AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0
4.61k stars 448 forks

Inference speed is low on LVIS minival zero-shot performance #312

Open WangYushan9264 opened 5 months ago

WangYushan9264 commented 5 months ago

I have reproduced your zero-shot test on LVIS minival. The accuracy matches, but the inference speed does not reach the reported ~50 FPS or ~15 FPS. I followed your evaluation instructions, using a single 4090 for inference. The speed was under 5 FPS, and I don't know why. I changed dist_test.sh to run on a single GPU and removed many of its options. My command is as follows:

./tools/dist_test.sh ./configs/pretrain/yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py ./pretrained_weights/yolo_world_v2_l_obj365v1_goldg_pretrain_1280ft-9babe3f6.pth 1

WangYushan9264 commented 5 months ago

I don't know whether it's related to the following warning. [image: screenshot of the warning]

WangYushan9264 commented 5 months ago

The image scale was set to 1280ft with the CLIP text encoder. Lowering it to 640ft helps, but the inference speed stays at about 11 FPS. So I am very curious about the evaluation settings you used to get >15 FPS on LVIS minival. Thank you.
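
For anyone comparing numbers in this thread: the FPS seen during a full evaluation run may include data loading, text encoding, and post-processing, so it can help to time the forward pass on its own. Below is a minimal sketch of such a timer, assuming nothing about YOLO-World itself. The names `measure_fps`, `step`, and `sync` are hypothetical and are not part of the repository; `step` stands for whatever closure runs one inference call in your setup.

```python
import time
from typing import Callable, Optional


def measure_fps(
    step: Callable[[], object],
    n_warmup: int = 10,
    n_iters: int = 50,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return iterations per second for `step`, excluding warm-up runs.

    `step` and `sync` are illustrative hooks (assumptions), not YOLO-World APIs.
    """
    # Warm-up iterations absorb one-time costs such as lazy initialization.
    for _ in range(n_warmup):
        step()
    if sync is not None:
        sync()  # drain any queued asynchronous work before starting the clock

    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    if sync is not None:
        sync()  # wait until the last iteration has actually finished
    elapsed = time.perf_counter() - start
    return n_iters / elapsed


if __name__ == "__main__":
    # Stand-in workload; replace with a closure that runs one forward pass.
    fps = measure_fps(lambda: time.sleep(0.01), n_warmup=2, n_iters=20)
    print(f"{fps:.1f} FPS")
```

With PyTorch on a GPU, passing `sync=torch.cuda.synchronize` keeps asynchronous kernel launches from making the loop look faster than it is. Comparing this number with the end-to-end rate of the test script gives a rough idea of how much time is spent outside the model.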

Favilludo commented 5 months ago

I am currently in the same situation. I'm using slightly adapted code from the video_demo.py file with the same LVIS data that is provided in the file itself. I'm getting around 3-4 FPS on a P100, whereas with the Ultralytics model I was at a constant 18-20 FPS on the same data.