AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

Reproduction problem of FPS #179

Open wennyuhey opened 6 months ago

wennyuhey commented 6 months ago

I exported the ONNX model of YOLO-World v2 using the Hugging Face demo, with the LVIS categories and with NMS removed by setting `postprocess_cfg=None`. The measured FPS of YOLO-World v2 Large is 22.8, lower than the 51.0 FPS reported in the paper.

How can I get the reported FPS? Can you provide the scripts to export the model and test the speed?

wondervictor commented 6 months ago

The FPS is measured on a V100 without the text encoder and without NMS post-processing. We do not use an ONNX model or FP16 to evaluate the inference speed.

wennyuhey commented 6 months ago

@wondervictor Thank you for your response. I’m still having trouble reproducing the FPS. I modified the tools/test.py script for speed testing as follows:

```python
with open("data/texts/lvis_v1_class_texts.json") as f:
    texts = json.load(f)
texts = [[x[0] for x in texts]]

runner.model.reparameterize(texts)

# start testing
import time
time_list = []
for i in range(200):
    fake_input = torch.zeros((1,3,640,640), device="cuda")
    start = time.time()
    runner.model.predict(fake_input, None)
    time_list.append(time.time() - start)

total_time = sum(time_list[100:])
total_iters = len(time_list[100:])
print(f"FPS: {total_iters/total_time}")
```

I have also removed the entire post-processing part in `YOLOWorldHead.predict_by_feat`, cutting the function off from line 582 onward. However, the resulting FPS is 29.9, still well below the reported 51.0. Is there anything else I need to modify to reproduce the reported FPS?
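
For reference, GPU timing loops like the one above are often written with an explicit synchronization point. CUDA work is queued asynchronously, so wall-clock timestamps taken around a single call may not reflect the time the device actually spent. The following is only a minimal sketch of that pattern, not the method used by the YOLO-World authors. The helper name `measure_fps` and its parameters are hypothetical, and the helper is kept framework-agnostic by accepting the synchronization function as an argument.

```python
import time


def measure_fps(step, warmup=100, iters=100, sync=None):
    """Return iterations per second for `step()` over `iters` timed calls.

    `step`  : zero-argument callable that runs one forward pass.
    `sync`  : optional zero-argument callable that blocks until all queued
              device work has finished (assumption: for PyTorch on CUDA this
              would be torch.cuda.synchronize).
    """
    # Warm-up iterations are executed but not timed.
    for _ in range(warmup):
        step()

    # Make sure warm-up work has finished before the clock starts.
    if sync is not None:
        sync()
    start = time.perf_counter()

    for _ in range(iters):
        step()

    # Wait for the timed work to finish before the clock stops.
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start

    return iters / elapsed
```

With the script above, this could be called as `measure_fps(lambda: runner.model.predict(fake_input, None), sync=torch.cuda.synchronize)`, ideally inside `torch.no_grad()` and with `fake_input` allocated once before the loop. Whether this closes the gap to 51.0 FPS is not established here, since the reported number was measured on a V100 and hardware differences also affect the result.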