AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

Question about 'lvis_v1_base_class_captions.json' #132

Closed Graysonggg closed 8 months ago

Graysonggg commented 8 months ago

Great job!

I ran into problems when trying to reproduce the fine-tuning of the segmentation stage. I could not find the following file: https://github.com/AILab-CVC/YOLO-World/blob/83601a1634276336ddcfd237ba7bbb5b79d86310/configs/segmentation/yolo_world_seg_l_dual_vlpan_2e-4_80e_8gpus_allmodules_finetune_lvis.py#L154

Is it the same as this? https://github.com/AILab-CVC/YOLO-World/blob/83601a1634276336ddcfd237ba7bbb5b79d86310/data/texts/lvis_v1_class_texts.json#L1

Thanks for helping me out.

Graysonggg commented 8 months ago

I have another question I would like your advice on. I used your pre-trained segmentation model, and the segmentation performance was not very good in my application scenario. Can I re-fine-tune the pre-trained segmentation head on my own dataset with COCO-format annotations to get better segmentation results? Am I on the right track?

wondervictor commented 8 months ago

Hi @Graysonggg, regarding your first question: they are not the same, and I'll upload the JSON for LVIS-base. LVIS-base contains only the base categories (common + frequent, c+f) of the LVIS dataset, and we use it to fine-tune YOLO-World so that it can be evaluated on detecting novel objects (the rare categories, r, in LVIS).
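For anyone who wants to reproduce the base-class text file before the official JSON is uploaded, a rough sketch of how such a file could be generated is below. It assumes the `frequency` field (`'r'`/`'c'`/`'f'`) in the official LVIS v1 annotation file and the list-of-name-lists layout used by `data/texts/lvis_v1_class_texts.json`; the file names are illustrative, and this is not the repository's official script.

```python
# Hypothetical sketch: build a base-class (c + f) text file from the LVIS v1
# annotations. The output layout (a JSON list of per-category name lists) is
# assumed to match data/texts/lvis_v1_class_texts.json; paths are placeholders.
import json

with open("lvis_v1_train.json") as f:          # official LVIS v1 annotation file
    categories = json.load(f)["categories"]

# Keep only base categories: frequency 'c' (common) and 'f' (frequent);
# rare ('r') categories are held out as novel classes.
base_texts = [
    [cat["name"].replace("_", " ")]
    for cat in sorted(categories, key=lambda c: c["id"])
    if cat["frequency"] in ("c", "f")
]

with open("lvis_v1_base_class_captions.json", "w") as f:
    json.dump(base_texts, f)
```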

wondervictor commented 8 months ago

As for your second question: indeed, the segmentation module in YOLO-World has not been fully tuned yet. You can certainly fine-tune the segmentation modules further on your own dataset, but I will need to provide some instructions for doing so, since it requires modifying some code in mmyolo.
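While waiting for those instructions, here is a rough idea of what the data side of such a fine-tuning config could look like (it does not cover the mmyolo code changes mentioned above). The `MultiModalDataset` wrapper and `class_text_path` field follow the pattern of the existing LVIS finetune config; the dataset paths, the custom class-text file, and the exact fields are placeholders, not a verified recipe.

```python
# Hypothetical mmengine config fragment for fine-tuning the segmentation head
# on a custom COCO-format dataset. Types mirror the existing LVIS finetune
# config; every path below is a placeholder for your own data.
_base_ = ('yolo_world_seg_l_dual_vlpan_2e-4_80e_8gpus_'
          'allmodules_finetune_lvis.py')

custom_train_dataset = dict(
    type='MultiModalDataset',
    dataset=dict(
        type='YOLOv5CocoDataset',           # COCO-format instance annotations
        data_root='data/my_dataset/',
        ann_file='annotations/instances_train.json',
        data_prefix=dict(img='images/train/'),
        filter_cfg=dict(filter_empty_gt=True, min_size=32)),
    class_text_path='data/texts/my_class_texts.json',  # your category names
    pipeline=_base_.train_pipeline)

train_dataloader = dict(dataset=custom_train_dataset)
```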

wondervictor commented 8 months ago

This issue will be closed since there is no further update related to the main topic. Thanks for your interest. If you have any questions about YOLO-World in the future, you're welcome to open a new issue.