Sense-X / Co-DETR

[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training

Will you provide the training config file for ViT-L (66.0 AP)? #28

zhangchbin opened this issue 1 year ago

zhangchbin commented 1 year ago

Thanks for your help.

HITerStudy commented 1 year ago

Is the number of encoder and decoder layers in Co-DINO for 66.0 AP equal to 6? There is no description of this part in the paper.

TempleX98 commented 1 year ago

Hi @zhangchbin, we have no plan to do this now, but we have updated the arXiv paper to release more details about this large model.

TempleX98 commented 1 year ago

@HITerStudy We find the performance saturates when using more than 6 encoder or decoder layers for larger models (e.g., Swin-L). So we use 6 layers by default.
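For context, a minimal sketch of where this layer count typically lives in an MMDetection-style Co-DINO config; the module types and surrounding keys below are illustrative assumptions and may not match the released configs:

```python
# Illustrative fragment in the MMDetection config style; module names and keys
# are placeholders, not copied from the released Co-DINO configs.
transformer = dict(
    encoder=dict(
        type='DetrTransformerEncoder',
        num_layers=6),           # >6 layers saturates for large backbones (per the reply above)
    decoder=dict(
        type='DinoTransformerDecoder',
        num_layers=6,            # same observation holds for the decoder
        return_intermediate=True))
```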

HITerStudy commented 1 year ago

> @HITerStudy We find the performance saturates when using more than 6 encoder or decoder layers for larger models (e.g., Swin-L). So we use 6 layers by default.

Thanks for your reply!

zhangchbin commented 1 year ago

@TempleX98 Hi, when I use the config `projects/configs/co_dino/co_dino_5scale_lsj_swin_large_3x_coco.py`, I get the following error from `Co-DETR/mmdet/datasets/builder.py`, line 80, in `build_dataset` (at `dataset = MultiImageMixDataset(**cp_cfg)`):

`TypeError: __init__() got an unexpected keyword argument 'filter_empty_gt'`

TempleX98 commented 1 year ago

@zhangchbin, I have fixed it
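For anyone hitting the same `TypeError` before pulling the fix, a hedged config-side workaround (not necessarily the fix committed to the repo): `MultiImageMixDataset` does not accept `filter_empty_gt`, so the key should sit on the wrapped dataset rather than on the wrapper, as in this sketch of the usual MMDetection layout:

```python
# Sketch of the standard MMDetection layout: filter_empty_gt belongs to the inner
# CocoDataset, not to the MultiImageMixDataset wrapper that raised the TypeError.
train_dataset = dict(
    type='MultiImageMixDataset',
    dataset=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_train2017.json',
        img_prefix='data/coco/train2017/',
        filter_empty_gt=False,   # keep it here, on the wrapped dataset
        pipeline=[]),            # loading/annotation transforms go here
    pipeline=[])                 # mixed-image transforms (e.g. the LSJ aug) go here
```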

zhangchbin commented 1 year ago

Amazing. However, training `co_dino_5scale_lsj_swin_large_3x_coco.py` with 8 GPUs will take about 12 days, which seems strange because the dataset appears to have twice as many samples per epoch (14786 iterations vs. 7000+):

`mmdet - INFO - Epoch [1][50/14786] lr: 2.000e-05, eta: 12 days, 17:17:46`

TempleX98 commented 1 year ago

The ETA is inaccurate at the beginning of training. You can use the DETR aug config if you want to accelerate training, as it is faster than the LSJ aug. Besides, you'd better use 16 GPUs (1 image per GPU) for Co-DINO w/ Swin-L training.
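As a reference for the iteration counts above, a minimal sketch of the per-GPU batch setting in an MMDetection-style config (the key names are the standard mmdet 2.x ones, assumed to apply here):

```python
# COCO train2017 has 118,287 images, so with 1 image per GPU:
#   8 GPUs  -> 118287 / 8  ≈ 14786 iterations per epoch (matches the log above)
#   16 GPUs -> 118287 / 16 ≈ 7393 iterations per epoch
data = dict(
    samples_per_gpu=1,   # images per GPU
    workers_per_gpu=2)   # dataloader workers per GPU
```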

HITerStudy commented 1 year ago

How is the TTA used for ViT-L (66.0 AP) implemented? Could you describe some details? Thank you.
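(Not an answer from the authors, but for reference: test-time augmentation in MMDetection is usually expressed as a multi-scale + flip test pipeline. A generic sketch follows; the actual scales and any extra steps behind the 66.0 AP result are not documented in this thread.)

```python
# Generic MMDetection-style TTA pipeline (multi-scale + horizontal flip).
# The image scales are placeholders, not the ones used for the ViT-L result.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=[(1333, 800), (2000, 1200)],  # placeholder scales
        flip=True,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])])]
```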

RicoJYang commented 1 year ago

> Hi @zhangchbin, we have no plan to do this now, but we have updated the arXiv paper to release more details about this large model.

For the ViT-L (66.0 AP) model, is the backbone the `eva02_L_pt_m38m_p14to16 | 304M | Merged-38M | 56` pretrained model released by EVA-02? Also, can the Objects365 and COCO training be completed with 8x 40GB A100 GPUs? Many thanks!

zimenglan-sysu-512 commented 1 year ago

> Hi @zhangchbin, we have no plan to do this now, but we have updated the arXiv paper to release more details about this large model.
>
> For the ViT-L (66.0 AP) model, is the backbone the `eva02_L_pt_m38m_p14to16 | 304M | Merged-38M | 56` pretrained model released by EVA-02? Also, can the Objects365 and COCO training be completed with 8x 40GB A100 GPUs? Many thanks!

Hi, can it run on V100 GPUs?