AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

Inference Result Reproduction #340

Open nisyad-ms opened 4 months ago

nisyad-ms commented 4 months ago

Dear Authors,

To reproduce the inference results for yolo-world-v2-x @ 1280px on lvis-minival, which configuration did you use? Specifically, what are the values for 1) the NMS threshold and 2) top_k detections (and is top_k applied per image or per class)?

Thank you for your great work. :)

wondervictor commented 4 months ago

Hi @nisyad-ms, please use this config: configs/pretrain/yolo_world_v2_x_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py

nisyad-ms commented 4 months ago

Thanks @wondervictor.

I could not find the nms_thresh and top_k values in the config.

Context: I would like to reproduce the inference numbers in my own pipeline, purely as a sanity check. I use the demo code you provide to generate all the prediction boxes. I would like to know the nms_thresh and top_k (per image or per class) that you used to generate these results, so I can apply them before calculating AP.

wondervictor commented 4 months ago

The nms_thresh is 0.7 and top_k is 300.

nisyad-ms commented 4 months ago

Thanks! You beat me to it. I had to dig recursively through the base configs to find them :). I believe for Fixed AP you used top_k = 1000?

wondervictor commented 4 months ago

The top_k is set to 300 in the repo (this is the setting behind the evaluation results in the README). For Fixed AP, we report results with top_k=1000.
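
For anyone applying these values in their own pipeline, here is a minimal sketch of the post-processing step in plain NumPy. It is not the repository's code. It assumes NMS is run separately for each class and that top_k is a cap per image, which the thread does not state explicitly. The function names (`iou_one_to_many`, `nms`, `postprocess`) and the (x1, y1, x2, y2) box format are hypothetical choices made for illustration.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)


def nms(boxes, scores, iou_thresh):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        if rest.size == 0:
            break
        ious = iou_one_to_many(boxes[best], boxes[rest])
        # Drop every remaining box that overlaps the kept box too much.
        order = rest[ious <= iou_thresh]
    return np.array(keep, dtype=int)


def postprocess(boxes, scores, labels, nms_thresh=0.7, top_k=300):
    """Class-wise NMS, then keep the top_k highest-scoring boxes of the image.

    Assumption: top_k is a per-image cap; defaults are the values quoted
    in this thread (nms_thresh=0.7, top_k=300).
    """
    kept = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        kept.append(idx[nms(boxes[idx], scores[idx], nms_thresh)])
    keep = np.concatenate(kept) if kept else np.array([], dtype=int)
    # Sort survivors by score and truncate to the per-image cap.
    keep = keep[np.argsort(-scores[keep])][:top_k]
    return boxes[keep], scores[keep], labels[keep]
```

Under the same assumptions, the Fixed AP setting mentioned above would correspond to calling `postprocess(..., top_k=1000)` for each image.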