AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

the config and weight of online demo #461

Open kaixin-bai opened 3 months ago

kaixin-bai commented 3 months ago

I am currently testing YOLO-World with the two commands below, but the performance and results differ from the online demo. Which config file and weights does the Hugging Face online demo use?

$ python3 demo/gradio_demo.py ./configs/pretrain/yolo_world_v2_x_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py ./weights/yolo_world_v2_x_obj365v1_goldg_cc3mlite_pretrain_1280ft-14996a36.pth
$ python3 demo/gradio_demo.py ./configs/pretrain/yolo_world_xl_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py ./weights/yolo_world_v2_xl_obj365v1_goldg_cc3mlite_pretrain.pth

[Screenshots: 2024-08-06 14-20-28 and 2024-08-06 14-20-47]

kaixin-bai commented 3 months ago

The ONNX file exported from my local Gradio web demo is only 499 bytes, while the one exported from the online demo is about 416 MB. Is something wrong?
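
A file of a few hundred bytes cannot hold the weights of a model whose normal export is hundreds of megabytes, so a size check before loading is a cheap way to catch a failed export. The snippet below is a minimal sketch of such a check, not part of YOLO-World: `looks_like_stub` is a hypothetical helper, the 1 MiB threshold is an arbitrary assumption, and the 499-byte file it creates is only a placeholder standing in for the export described above.

```python
import os
import tempfile

# Assumed threshold (hypothetical): a real detector export is many megabytes,
# so anything under 1 MiB is treated as suspicious here.
MIN_PLAUSIBLE_BYTES = 1 << 20


def looks_like_stub(path, min_bytes=MIN_PLAUSIBLE_BYTES):
    """Return True if the file at `path` is smaller than `min_bytes`."""
    return os.path.getsize(path) < min_bytes


# Self-contained demonstration: write a 499-byte placeholder file,
# mimicking the size reported in this thread.
with tempfile.NamedTemporaryFile(suffix=".onnx", delete=False) as f:
    f.write(b"\0" * 499)
    stub_path = f.name

stub_detected = looks_like_stub(stub_path)
print(stub_path, "looks like a stub:", stub_detected)

# Clean up the placeholder file.
os.remove(stub_path)
```

To use it on a real export, pass the path of the exported `.onnx` file to `looks_like_stub` instead of the placeholder.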

shiboyang commented 3 months ago

I ran into the same problem with a model exported using export_onnx.py.

JinhuiYE commented 2 months ago

Hi, did you figure this out? I ran into the same problem: my results do not match the online demo. I use this config: YOLO-World/configs/pretrain/yolo_world_v2_x_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py

CLIP: openai/clip-vit-base-patch32

Checkpoint: yolo_world_v2_x_obj365v1_goldg_cc3mlite_pretrain_1280ft-14996a36.pth

text prompt: Trash can, your hand, arm, floor, surface where the trash can was placed, tables, chairs
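
When comparing a local run against the online demo, it can help to list exactly which class phrases a prompt like the one above expands to, since several entries are multi-word phrases. The snippet below is a minimal sketch that assumes the prompt is treated as a comma-separated list of classes; that assumption is not confirmed in this thread, and `split_prompt` is a hypothetical helper rather than a YOLO-World function.

```python
def split_prompt(prompt):
    """Split a comma-separated prompt into stripped, non-empty class phrases."""
    return [part.strip() for part in prompt.split(",") if part.strip()]


# The text prompt quoted in the comment above.
prompt = (
    "Trash can, your hand, arm, floor, "
    "surface where the trash can was placed, tables, chairs"
)

classes = split_prompt(prompt)

# Print each phrase with its index so two runs can be compared side by side.
for index, name in enumerate(classes):
    print(index, repr(name))
```

Running the same listing on the prompt entered in the online demo makes it easier to confirm that both runs use an identical class list before comparing detections.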