facebookresearch / Detic

Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".
Apache License 2.0

Open-vocabulary LVIS performance when using all ImageNet-21k classes? #39

Closed mjlm closed 2 years ago

mjlm commented 2 years ago

Hi,

Thanks for publishing Detic, very interesting work!

As far as I can tell, the LVIS numbers in the paper were all obtained using image-level data that only contains classes overlapping with LVIS (i.e. "IN-L", or CC captions containing LVIS classes).

What is the LVIS performance when training uses image-level data covering all 22,000 ImageNet-21k classes? Sorry if this is in the paper and I missed it!

Thanks, Matthias

xingyizhou commented 2 years ago

Hi,

Thank you for your interest! The results are not in the paper. I attached them below:

| Method | LVIS box mAP | LVIS box mAP rare | Objects365 box mAP | OpenImages box mAP50 |
|---|---|---|---|---|
| Box-Supervised | 45.0 | 39.2 | 19.1 | 46.2 |
| Detic w. IN-L | 46.7 | 45.1 | 21.2 | 53.0 |
| Detic w. IN-21K | 45.0 | 41.2 | 21.4 | 55.2 |

Detic w. IN-21K is worse than Detic w. IN-L on LVIS, as expected, because its training vocabulary is less focused on the LVIS classes. It still outperforms the Box-Supervised baseline on rare classes (41.2 vs. 39.2 box mAP rare).
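For a quick read of the numbers, here is a minimal sketch that computes each Detic variant's gain over the Box-Supervised baseline. The values are copied from the results above; the dictionary keys and the `delta_vs_baseline` helper are illustrative names and are not part of the Detic codebase.

```python
# Results copied from the table in this comment.
# Keys ("lvis", "lvis_rare", ...) are illustrative names, not Detic identifiers.
results = {
    "Box-Supervised":  {"lvis": 45.0, "lvis_rare": 39.2, "o365": 19.1, "oid_map50": 46.2},
    "Detic w. IN-L":   {"lvis": 46.7, "lvis_rare": 45.1, "o365": 21.2, "oid_map50": 53.0},
    "Detic w. IN-21K": {"lvis": 45.0, "lvis_rare": 41.2, "o365": 21.4, "oid_map50": 55.2},
}


def delta_vs_baseline(method, baseline="Box-Supervised"):
    """Return per-metric gain of `method` over `baseline`, rounded to 0.1."""
    return {
        metric: round(results[method][metric] - results[baseline][metric], 1)
        for metric in results[method]
    }


for method in ("Detic w. IN-L", "Detic w. IN-21K"):
    print(method, delta_vs_baseline(method))
```

Read this way, the IN-21K variant gives up the overall LVIS gain of IN-L but keeps a positive margin on rare classes and has the larger margin on the two cross-dataset columns.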

Best, Xingyi

mjlm commented 2 years ago

Thanks for sharing these results!