Atten4Vis / ConditionalDETR

This repository is an official implementation of the ICCV 2021 paper "Conditional DETR for Fast Training Convergence". (https://arxiv.org/abs/2108.06152)
Apache License 2.0

Can you provide the checkpoints of ConditionalDETR-R50 and -R101 trained with 108 epochs? #15

Closed truetone2022 closed 2 years ago

truetone2022 commented 2 years ago

I can't replicate the results of ConditionalDETR-R50 and -R101 trained with 108 epochs.

The replicated result of ConditionalDETR-R50 trained with 108 epochs:

image

The replicated result of ConditionalDETR-R101 trained with 108 epochs:

image
DeppMeng commented 2 years ago

Can you provide your full training config (e.g., batch size and number of GPUs used), so that we can help you locate the problem?

truetone2022 commented 2 years ago

The only difference is that I use 32 A100 GPUs to train the model.

truetone2022 commented 2 years ago

If possible, can you provide the checkpoints of these two models trained with 108 epochs? Thanks very much!

truetone2022 commented 2 years ago

If not, can you provide reproducible training configs for these two models trained with 108 epochs? Thanks very much!

DeppMeng commented 2 years ago

  1. We do not have a 108-epoch model for the released code at the moment, but we will work on reproducing it and will provide the checkpoint once we have it.
  2. About the 108-epoch training config: the only differences from the 50-epoch config are `--epochs 108` and `--lr_drop 80`, i.e., the total number of epochs is set to 108 and the learning rate is dropped at epoch 80 (see the example command after this list).
  3. Some thoughts on your results: we train our models with a total batch size of 16 or 8. You mentioned that you use 32 A100s, so I guess your batch size is much larger than 16. From my observation, a larger batch size lowers performance: I tried a batch size of 64 and the drop was non-negligible, and I have never tried anything larger. Two possible solutions: (a) lower your batch size to 16, or (b) if you insist on a larger batch size, raise the initial lr to a properly tuned value and add lr warm-up (a rough sketch follows this list). We do not have any experience with this; in general, for the AdamW optimizer the lr-batch-size scaling rule is not linear.
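
For reference, a 108-epoch run would then be launched with something like the command below. This is only a sketch adapted from the repo's 50-epoch training command: the COCO path and output directory are placeholders, and the single-node, 8-GPU setup corresponds to the default total batch size of 16, not the 32-GPU setup above.

```bash
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    --coco_path /path/to/coco \
    --output_dir output/conddetr_r50_epoch108 \
    --epochs 108 \
    --lr_drop 80
```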
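
If you do keep the large batch size, the warm-up plus lr-scaling idea in point 3 could look roughly like the sketch below. This is not the repo's code: `build_optimizer_and_scheduler`, the square-root scaling factor, and the 5-epoch warm-up are all hypothetical choices that would need tuning.

```python
import torch

def build_optimizer_and_scheduler(model, base_lr=1e-4, base_bs=16,
                                  batch_size=64, warmup_epochs=5, lr_drop=80):
    # Hypothetical sub-linear (sqrt) scaling of the lr with batch size;
    # for AdamW the right factor is not linear and must be found empirically.
    lr = base_lr * (batch_size / base_bs) ** 0.5
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-4)

    def lr_lambda(epoch):
        if epoch < warmup_epochs:
            # Linear warm-up over the first few epochs.
            return (epoch + 1) / warmup_epochs
        # Usual step drop (x0.1) at epoch `lr_drop`, as in the 108-epoch config.
        return 0.1 if epoch >= lr_drop else 1.0

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```
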
truetone2022 commented 2 years ago

Thanks for your helpful advice! You are so kind!