Open wangzishuo029 opened 1 month ago
No, the performance is limited.
I am new to this area, so I want to ask: when testing on the LVIS dataset, how should the classes be set? Should the training vocabulary be kept, or changed to the LVIS classes?
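Conceptually, detectors like YOLO-World classify each region by matching its feature against text embeddings of whatever class names you supply at test time, so "setting the classes" for LVIS means encoding the LVIS category names instead of the training vocabulary. Here is a minimal NumPy sketch of that idea; the encoder, vocabularies, and shapes are all illustrative stand-ins, not YOLO-World's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim = 8

def encode_names(names):
    """Stand-in for a CLIP-style text encoder: one unit vector per name."""
    vecs = rng.normal(size=(len(names), embed_dim))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

train_vocab = ["person", "car", "dog"]        # e.g. pretraining (O365-style) classes
test_vocab = ["otter", "beignet", "person"]   # e.g. LVIS categories, incl. rare ones

text_train = encode_names(train_vocab)
text_test = encode_names(test_vocab)

# One region feature per detected box (normally produced by the image branch).
region = rng.normal(size=(embed_dim,))
region = region / np.linalg.norm(region)

# Classification is a similarity lookup against the text embeddings, so
# swapping the vocabulary changes the label space without retraining.
scores_train = text_train @ region
scores_test = text_test @ region
print("train-vocab label:", train_vocab[int(np.argmax(scores_train))])
print("test-vocab label:", test_vocab[int(np.argmax(scores_test))])
```

In this view, evaluating on LVIS amounts to re-encoding the LVIS class list as the prompt vocabulary; how well that transfers to unseen names is exactly the open-vocabulary question raised below.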
Recent works like YOLO-World and GroundingDINO mainly use Objects365 and GoldG for pretraining. These methods do not use a CLIP image encoder as the backbone (unlike some open-vocabulary detection methods such as CORA and F-VLM). But the vocabulary of the O365 dataset is still limited. So can YOLO-World detect objects beyond its pretraining data? Is YOLO-World truly an open-vocabulary detector?