Open WangYushan9264 opened 5 months ago
I don't know whether it's related to the following warning.
The image scale was set to 1280ft with the CLIP text encoder. Lowering it to 640ft helps, but inference speed stays at about 11 FPS. So I am very curious about your evaluation settings for >15 FPS inference speed on LVIS minival. Thank you.
I am currently in the same situation. I'm using slightly adapted code from the video_demo.py file with the same LVIS data that is provided in the file itself. I'm getting around 3-4 FPS on a P100, whereas with the ultralytics model I was at a constant 18-20 FPS with the same data.
I have reproduced your zero-shot test on LVIS minival. The results match, except that the inference speed does not reach ~50 FPS or even ~15 FPS. I followed your evaluation instructions, using a single 4090 for inference. The speed was below 5 FPS, and I don't know why. I changed dist_test.sh to run on a single GPU and removed many of its configs. My command is as follows:
./tools/dist_test.sh ./configs/pretrain/yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py ./pretrained_weights/yolo_world_v2_l_obj365v1_goldg_pretrain_1280ft-9babe3f6.pth 1
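Since the numbers reported in this thread (under 5, 3-4, and about 11 FPS) may have been measured in different ways, here is a minimal sketch of one way to time the model forward pass on its own, so the figures are at least comparable. It is not the repository's evaluation code. `measure_fps`, `step`, and `sync` are hypothetical names introduced for illustration, and it assumes that one call to `step` runs exactly one inference on data that is already loaded and preprocessed.

```python
import time
from typing import Callable, Optional


def measure_fps(
    step: Callable[[], object],
    warmup: int = 10,
    iters: int = 50,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return calls per second of `step`, excluding warm-up calls.

    `step` is a hypothetical callable that runs one inference.
    `sync` is an optional callable that blocks until queued device
    work has finished; without it, asynchronous GPU kernels may still
    be running when the timer is read.
    """
    if iters <= 0:
        raise ValueError("iters must be positive")

    # Warm-up calls absorb one-time costs (lazy init, caching, autotuning).
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start

    return iters / elapsed


if __name__ == "__main__":
    # Stand-in workload: replace the lambda with one model forward pass.
    fps = measure_fps(lambda: time.sleep(0.01), warmup=2, iters=20)
    print(f"{fps:.1f} FPS")
```

With PyTorch on a CUDA device, `torch.cuda.synchronize` can be passed as `sync`. A figure taken this way excludes data loading, text-encoder setup, and post-processing outside `step`, so it can be noticeably higher than the end-to-end throughput of a full evaluation run.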