Open-vocabulary LVIS performance when using all ImageNet-21k classes?

facebookresearch / Detic

Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".

Apache License 2.0

Open-vocabulary LVIS performance when using all ImageNet-21k classes? #39

Status: Closed (closed by mjlm 2 years ago)

mjlm commented 2 years ago

Hi,

Thanks for publishing Detic, very interesting work!

As far as I can tell, the LVIS numbers in the paper were all obtained using image-level data that only contains classes overlapping with LVIS (i.e. "IN-L", or CC captions containing LVIS classes).

What is the LVIS performance when the image-level training data covers all 22,000 ImageNet-21k classes? Sorry if this is in the paper and I missed it!

Thanks, Matthias

xingyizhou commented 2 years ago

Hi,

Thank you for your interest! The results are not in the paper. I attached them below:

| Method | LVIS box mAP | LVIS box mAP rare | Objects365 box mAP | OpenImages box mAP50 |
|---|---|---|---|---|
| Box-Supervised | 45.0 | 39.2 | 19.1 | 46.2 |
| Detic w. IN-L | 46.7 | 45.1 | 21.2 | 53.0 |
| Detic w. IN-21K | 45.0 | 41.2 | 21.4 | 55.2 |

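Detic w. IN-21K is worse than Detic w. IN-L on LVIS (45.0 vs. 46.7 box mAP, 41.2 vs. 45.1 on rare classes), as expected, because its training vocabulary is less focused on LVIS classes. It still outperforms the Box-Supervised baseline on rare classes (41.2 vs. 39.2) while matching it on overall LVIS box mAP (45.0).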
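
For readers who want the gaps spelled out, here is a minimal sketch that computes each Detic variant's difference from the Box-Supervised baseline. The numbers are copied from the table above. The names `METRICS`, `RESULTS`, and `deltas` are illustrative only and are not part of the Detic codebase.

```python
# Numbers copied from the results table in this thread.
# All names here are hypothetical helpers, not Detic APIs.
METRICS = [
    "LVIS box mAP",
    "LVIS box mAP rare",
    "Objects365 box mAP",
    "OpenImages box mAP50",
]

RESULTS = {
    "Box-Supervised": [45.0, 39.2, 19.1, 46.2],
    "Detic w. IN-L": [46.7, 45.1, 21.2, 53.0],
    "Detic w. IN-21K": [45.0, 41.2, 21.4, 55.2],
}


def deltas(results, baseline="Box-Supervised"):
    """Return per-metric differences of every non-baseline row from the baseline."""
    base = results[baseline]
    return {
        name: [round(value - ref, 1) for value, ref in zip(values, base)]
        for name, values in results.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, diff in deltas(RESULTS).items():
        print(name)
        for metric, d in zip(METRICS, diff):
            print(f"  {metric}: {d:+.1f}")
```

Read this way, the IN-21K model keeps most of the gain on the two transfer datasets (Objects365 and OpenImages) and gives up part of the LVIS gain that IN-L provides.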

Best, Xingyi

mjlm commented 2 years ago

Thanks for sharing these results!