Open superhero-7 opened 1 year ago
Thanks for the feedback! I have updated requirements.txt and deleted the GLIP install line (-e git+https://github.com/microsoft/GLIP.git@24ec0ddd8c61534ad5b17e4144864df7003dc7ef#egg=maskrcnn_benchmark). Please try "pip install -r requirements.txt". By the way, the torch version I use is 2.0.1. Please let me know if there are any problems.
I see. I noticed you did not use GLIP, so I finally gave up on installing it. By the way, PyTorch 1.12.1 also supports BLIP and CLIP. So far, I can use your code normally. Thanks for your nice work!
By the way, why not use Grounding DINO as a detector?
Karine-Huang/T2I-CompBench#4 We tried multi-modal models such as miniGPT4, mPlug-Owl, MultiModal-GPT, InternChat, and BLIP, but they may not perform well in spatial understanding. Therefore, we selected a more accurate and intuitive approach: object detection. UniDet suits the current task because its strong performance on standard object detection benchmarks, such as COCO and PASCAL VOC, makes it a good choice for tasks that require accurately detecting a wide range of objects. Other object detectors might also be able to accomplish this task.
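To illustrate why a detector makes spatial checks intuitive, here is a minimal sketch of reading a relation off two bounding boxes by comparing their centers. It is not the repository's actual evaluation code: the box format `(x_min, y_min, x_max, y_max)`, the function names, and the dominant-axis rule are all assumptions made for illustration.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in image coordinates,
# with y growing downward. Any detector's output would need converting to this.
Box = Tuple[float, float, float, float]


def center(box: Box) -> Tuple[float, float]:
    """Return the (x, y) center of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def spatial_relation(box_a: Box, box_b: Box) -> str:
    """Describe where box A sits relative to box B (hypothetical helper).

    The axis with the larger center offset decides the relation.
    """
    ax, ay = center(box_a)
    bx, by = center(box_b)
    dx, dy = ax - bx, ay - by
    if dx == 0 and dy == 0:
        return "same position"
    if abs(dx) >= abs(dy):
        return "left of" if dx < 0 else "right of"
    return "above" if dy < 0 else "below"


if __name__ == "__main__":
    dog = (0.0, 0.0, 10.0, 10.0)
    cat = (20.0, 0.0, 30.0, 10.0)
    print("dog is", spatial_relation(dog, cat), "cat")
```

A real metric would likely also account for detection confidence and box overlap; this sketch only shows the geometric comparison.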
Thank you! But I'm curious about recent developments, such as whether GroundingDINO would yield better results?
Thanks for the question! The strengths of recent models often lie in their ability to ground visual elements effectively. However, when it comes to challenges in spatial relationships, the key factor often involves fundamental spatial relationship understanding (such as distinguishing the basic position, left, right, etc.). While a powerful detector can contribute to overall results, it's important to note that the primary challenge in spatial relationships still lies in the model's understanding rather than its detection ability. Of course, with more delicate and complex spatial relationships, powerful detectors have the potential to enhance overall performance in visual tasks.
I wonder which PyTorch version should be used. I installed PyTorch 1.12.1 + cu113, but the GLIP install fails with the error: fatal error: THC/THC.h: No such file or directory. Any suggestions? Thanks in advance.
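For anyone hitting the same error, a quick way to confirm whether the header the build wants is present is to search the include directories directly. The sketch below is a hypothetical helper (the name `find_header` is made up for illustration) and uses only the standard library; which directories to search is up to the caller.

```python
import os
from typing import Iterable, Optional


def find_header(include_dirs: Iterable[str], header: str = "THC/THC.h") -> Optional[str]:
    """Return the full path of `header` in the first directory that has it.

    `header` uses forward slashes, as in a C/C++ #include line.
    Returns None when no directory contains the file.
    """
    parts = header.split("/")
    for directory in include_dirs:
        candidate = os.path.join(directory, *parts)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With PyTorch installed, the directories can be taken from `torch.utils.cpp_extension.include_paths()`, for example `find_header(include_paths())`. If that returns `None`, the installed PyTorch does not ship the header, so the extension will not compile against it without source changes or a different PyTorch version.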