Sense-X / Co-DETR

[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training
MIT License
950 stars, 100 forks

Which EVA-02 checkpoint did you use for SOTA LVIS? #107

Closed: FrancoisPorcher closed this issue 5 months ago

FrancoisPorcher commented 7 months ago

Hi Co-DETR team!

Could you give the exact link to the EVA-02 backbone you used before fine-tuning on LVIS, please? I am assuming it's one of the ones here, but I am not sure: https://github.com/baaivision/EVA/tree/master/EVA-02/det

Thanks!

TempleX98 commented 7 months ago

We first use eva02_L_pt_m38m_p14to16 to initialize the vision backbone. Then we train this Co-DETR model on the Objects365 dataset for intermediate fine-tuning. Finally, we fine-tune this model on the LVIS dataset.

FrancoisPorcher commented 7 months ago

I am a bit confused: what exactly do you call intermediate fine-tuning? Is it fine-tuning of the backbone (without the detector) with an MIM objective? Or is it starting with the eva02_L_pt_m38m_p14to16 backbone, initializing the detector from scratch, and training on Objects365 bounding boxes? Or something else?

FrancoisPorcher commented 7 months ago

Because from what I understand, eva02_L_pt_m38m_p14to16 has already seen Objects365 in the pretraining phase according to the EVA-02 paper (but only with the MIM objective, not the bounding box labels).

It would be great if you could clarify this; it's not easy to keep track of all the subtleties! Thanks

TempleX98 commented 7 months ago

I am sorry that my previous answer may have misled you. The first training phase is to train the whole detector (backbone+neck+encoder+decoder+aux branches) on the Objects365 dataset. Specifically, the ViT-L backbone is initialized using eva02_L_pt_m38m_p14to16, while the other components (neck+encoder+decoder+aux branches) are randomly initialized. The supervision signals are derived from the bounding box coordinates and labels.
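
The initialization split described above can be sketched as follows. This is only an illustration, not the Co-DETR code: the function `split_by_init_source` and the example parameter names are hypothetical, and it assumes the backbone's parameters sit under a `backbone.` prefix.

```python
def split_by_init_source(param_names, pretrained_prefix="backbone."):
    """Partition parameter names by how they would be initialized.

    Names under `pretrained_prefix` are taken from the pretrained checkpoint;
    everything else (neck, encoder, decoder, aux branches) starts random.
    """
    from_checkpoint, random_init = [], []
    for name in param_names:
        if name.startswith(pretrained_prefix):
            from_checkpoint.append(name)
        else:
            random_init.append(name)
    return from_checkpoint, random_init


# Hypothetical parameter names, for illustration only.
names = [
    "backbone.patch_embed.proj.weight",
    "backbone.blocks.0.attn.qkv.weight",
    "neck.convs.0.weight",
    "query_head.transformer.encoder.layers.0.weight",
]
loaded, fresh = split_by_init_source(names)
print(len(loaded), "from checkpoint;", len(fresh), "randomly initialized")
```

All parameters, whichever group they fall in, are then trained with the Objects365 box and label supervision.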

FrancoisPorcher commented 7 months ago

Yes, thank you, that makes a lot of sense now! And if we want to initialize from just the eva02_L_pt_m38m_p14to16.pt backbone, would you have any advice? The format is not .pth, it's .pt. I'm just wondering which script you used to load this backbone and train the detector from scratch.

TempleX98 commented 7 months ago

We just change the init_cfg in the backbone config. Here is an example:

backbone=dict(
    type='ViT',
    img_size=1536,
    pretrain_img_size=512,
    patch_size=16,
    embed_dim=1024,
    depth=24,
    num_heads=16,
    mlp_ratio=4*2/3,
    drop_path_rate=0.3,
    window_size=16,
    # Both index lists must be defined elsewhere in the same config file.
    window_block_indexes=window_block_indexes,
    residual_block_indexes=residual_block_indexes,
    qkv_bias=True,
    use_act_checkpoint=True,
    # Point the backbone at the EVA-02 pretrained checkpoint.
    init_cfg=dict(type='Pretrained', checkpoint='models/eva02_L_pt_m38m_p14to16.pt')),
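
On the `.pt` versus `.pth` question: both extensions are common conventions for files written with `torch.save`, so the extension by itself should not change how the file is read. What can differ between checkpoints is whether the weights are nested under a wrapper key. Below is a minimal sketch for checking that. The helper `unwrap_state_dict` and the wrapper keys it tries (`state_dict`, `model`, `module`) are assumptions for illustration, not part of Co-DETR or MMDetection.

```python
def unwrap_state_dict(checkpoint, wrapper_keys=("state_dict", "model", "module")):
    """Return the flat name -> tensor mapping from a loaded checkpoint object.

    Some checkpoints store the weights directly; others nest them under a
    wrapper key. The keys tried here are assumptions, not a documented list.
    """
    for key in wrapper_keys:
        inner = checkpoint.get(key)
        if isinstance(inner, dict):
            return inner
    return checkpoint


# Stand-ins for objects returned by torch.load; plain values replace tensors.
flat = {"patch_embed.proj.weight": 0, "blocks.0.norm1.weight": 0}
nested = {"module": flat}

print(sorted(unwrap_state_dict(nested)) == sorted(unwrap_state_dict(flat)))
```

With PyTorch installed, `torch.load(path, map_location='cpu')` would give the object to pass in. Printing a few of the resulting keys is a quick way to check whether they line up with the backbone's parameter names.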

FrancoisPorcher commented 7 months ago

Great, thanks! And one last question: for the LVIS SOTA you sent me the config file and the checkpoint, but I don't have access to the mask head, only the box head. Would you have it, please?

TempleX98 commented 7 months ago

Sure, I will provide you with the original model. However, I'm currently engaged in several other projects, so it might take a bit of time to organize the original model and code for you. I'll get it to you as soon as possible.

FrancoisPorcher commented 7 months ago

Okay, thanks! But it's just the weights and the config file for the mask head, no?

Thanks for the help

Cosmo1210 commented 5 months ago

Hello Co-DETR team!

I'm also interested in the instance segmentation results. Would you send me the mask head config and the weights, please?

Thanks a lot

TempleX98 commented 5 months ago

Please email zongzhuofan@gmail.com to obtain the seg model.

Cosmo1210 commented 5 months ago

I have sent an email, looking forward to your reply :)