Which model checkpoint is the SOTA on LVIS?

Sense-X / Co-DETR

[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training

MIT License


Which model checkpoint is the SOTA on LVIS? #69

Closed by FrancoisPorcher 10 months ago

FrancoisPorcher commented 11 months ago

Hi CO-DETR team, thank you for your great work!

I am a little confused about which checkpoint I should use to reproduce the SOTA results on LVIS. The link you provided contains several folders, and each folder contains several models. Could you please clarify this?

Thanks!

TempleX98 commented 11 months ago

@FrancoisPorcher Hi, thank you for your interest in our work.

Our Co-DETR w/ ViT-L model achieves the SOTA results on LVIS. However, we currently have no plans to release this model. Instead, we have released a Swin-L model that achieves a comparable result of 64.5 box AP.
The pretraining and finetuning details for LVIS can be found in the appendix of our paper. These details may be helpful for your experiments.
We are organizing the LVIS training code and plan to release it in the near future. Please feel free to contact me if you have any further questions about the implementation.

FrancoisPorcher commented 11 months ago

Thank you for your answer! However, I am still confused about which model I should download. Is it the last one in this picture?

TempleX98 commented 11 months ago

Yes, co_dino_5scale_lsj_swin_large_16e_o365tolvis.pth is the model that achieves 64.5 box AP on LVIS.

FrancoisPorcher commented 11 months ago

Alright thanks!

And this model has been pretrained on O365 and fine-tuned on LVIS, right? But it has never been trained on COCO? I ask because some images from the COCO training set are in the LVIS validation set.

FrancoisPorcher commented 11 months ago

Also, why is the performance reported on LVIS val? Isn't it LVIS test that matters? Or maybe the API was broken on the test set? (That has already happened for COCO.)

FrancoisPorcher commented 11 months ago

I think the config file indicated for SOTA LVIS lsj is not the right one. The model name mentioned in the config file is different from the one in the Google Drive folder above. Do you have more information about that? Thanks!

TempleX98 commented 11 months ago

> Alright thanks!
>
> And this model has been pretrained on O365 and fine-tuned on LVIS, right? But it has never been trained on COCO? I ask because some images from the COCO training set are in the LVIS validation set.

Yes, it is only finetuned on LVIS. We also evaluate our model on the LVIS minival set, a subset of the LVIS val that excludes all COCO training images.
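
For readers who want to see what "excludes all COCO training images" can look like in practice, here is a minimal sketch. It is not the authors' script, and the official minival image list may be defined differently. It assumes an LVIS-style annotation dict in which each image record has a `coco_url` whose path names the COCO split (`train2017` or `val2017`); the function name and the toy data are hypothetical.

```python
from typing import Any


def drop_coco_train_images(lvis_val: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an LVIS-style dict without images from COCO train2017.

    Assumption: every image record has a `coco_url` that contains the
    COCO split name. Annotations of dropped images are dropped too.
    """
    kept_images = [
        img for img in lvis_val["images"]
        if "train2017" not in img.get("coco_url", "")
    ]
    kept_ids = {img["id"] for img in kept_images}
    kept_annotations = [
        ann for ann in lvis_val["annotations"] if ann["image_id"] in kept_ids
    ]
    # Other top-level keys (e.g. categories) are carried over unchanged.
    return {**lvis_val, "images": kept_images, "annotations": kept_annotations}


# Toy data for illustration only; the ids and URLs are made up.
toy = {
    "images": [
        {"id": 1, "coco_url": "http://example.invalid/train2017/000000000001.jpg"},
        {"id": 2, "coco_url": "http://example.invalid/val2017/000000000002.jpg"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 3},
        {"id": 11, "image_id": 2, "category_id": 3},
    ],
    "categories": [{"id": 3, "name": "example"}],
}
filtered = drop_coco_train_images(toy)
```

Filtering on the image list first and then on `image_id` keeps the annotations consistent with the surviving images, which matters when the result is fed to an evaluator.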

TempleX98 commented 11 months ago

> Also, why is the performance reported on LVIS val? Isn't it LVIS test that matters? Or maybe the API was broken on the test set? (That has already happened for COCO.)

We just follow the evaluation settings of previous frameworks, such as EVA and ViTDet, to enable clear performance comparisons.

TempleX98 commented 11 months ago

> I think the config file indicated for SOTA LVIS lsj is not the right one. The model name mentioned in the config file is different from the one in the Google Drive folder above. Do you have more information about that? Thanks!

Sorry, it is the config of the Swin-L model (Co-DETR + SwinL + O365 pretraining + LVIS finetuning). I will correct the config filename.

FrancoisPorcher commented 11 months ago

Okay, nice! I can change it myself for now, before you release the fix, since it's just the model path. However, is the "base" compatible, or do we have to change it as well?

FrancoisPorcher commented 11 months ago

Also, could you please give a little more information about the backbone? You mentioned a 304M-parameter backbone. Is it EVA02 directly, or is it something else? And did you use only the images of O365 for SSL, or the bounding boxes as well? Thanks a lot!!

TempleX98 commented 11 months ago

> Okay, nice! I can change it myself for now, before you release the fix, since it's just the model path. However, is the "base" compatible, or do we have to change it as well?

It is compatible.
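
For anyone making the same local change, a minimal sketch of swapping only the checkpoint path while leaving the base config reference alone is shown below. The thread does not say which config field holds the model path; this assumes it is MMDetection's `load_from` field. The helper name, the toy config dict, and the directory names are hypothetical.

```python
from copy import deepcopy


def with_checkpoint(cfg: dict, checkpoint_path: str, key: str = "load_from") -> dict:
    """Return a copy of cfg with only the checkpoint path replaced.

    Assumption: the path lives under `load_from`; pass a different
    `key` if the config stores it elsewhere.
    """
    new_cfg = deepcopy(cfg)
    new_cfg[key] = checkpoint_path
    return new_cfg


# Toy stand-in for a parsed config; both values are placeholders.
cfg = {
    "_base_": ["placeholder_base_config.py"],
    "load_from": "models/old_checkpoint_name.pth",
}
patched = with_checkpoint(
    cfg, "models/co_dino_5scale_lsj_swin_large_16e_o365tolvis.pth"
)
```

Working on a copy keeps the original config available for comparison, and the `_base_` entry is untouched, in line with the answer that it is compatible.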

TempleX98 commented 11 months ago

> Also, could you please give a little more information about the backbone? You mentioned a 304M-parameter backbone. Is it EVA02 directly, or is it something else? And did you use only the images of O365 for SSL, or the bounding boxes as well? Thanks a lot!!

It is EVA-02. O365 is used for detection pretraining (image+box).