ashkamath / mdetr


Why is the model size on PhraseCut smaller than the others? #17

Closed zhenwwang closed 3 years ago

zhenwwang commented 3 years ago

It really confuses me that the PhraseCut checkpoint is smaller than the other released checkpoints. Why is that?

zhenwwang commented 3 years ago

Moreover, there are some mismatches between the .md and the paper for PhraseCut, e.g. the number of training epochs and whether to use EMA.

Should I follow the paper or the .md?

Thanks very much.

alcinos commented 3 years ago

Hello @Zavier-Wang

Thank you for your interest in MDETR.

The size is smaller because our checkpoints also contain the optimizer state. For PhraseCut, since we only fine-tune the segmentation head, the optimizer state covers just that head, while for the other pre-trained models it covers the full transformer + backbone + text_transformer, which makes those checkpoints larger overall.
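To make this concrete, here is a minimal sketch of how one could inspect what a checkpoint actually stores and how much of it is optimizer state. The filename and the "model"/"optimizer" keys are assumptions (DETR-style training scripts typically use these, but check your own file):

```python
import torch

# Hypothetical filename: substitute whichever checkpoint you downloaded.
ckpt = torch.load("phrasecut_checkpoint.pth", map_location="cpu")
print(list(ckpt.keys()))  # e.g. weights under "model", plus training state

def num_elements(obj):
    """Count tensor elements in a (possibly nested) dict of tensors."""
    if torch.is_tensor(obj):
        return obj.numel()
    if isinstance(obj, dict):
        return sum(num_elements(v) for v in obj.values())
    return 0

# The optimizer state (e.g. Adam moment buffers) is what inflates the
# file; comparing it against the plain weights makes the gap visible.
if "model" in ckpt:
    print("model elements:    ", num_elements(ckpt["model"]))
if "optimizer" in ckpt:
    print("optimizer elements:", num_elements(ckpt["optimizer"]))

# For inference only, the weights alone are enough (assumed "model" key):
torch.save({"model": ckpt["model"]}, "phrasecut_model_only.pth")
```

Stripping the training state this way is also a handy trick if you only need the weights for evaluation and want a smaller file.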

Number of epochs: Thanks for pointing out the discrepancy; we will update the paper accordingly. The correct numbers are given in the readme.

EMA: you shouldn't use EMA for the segmentation head (although I think the difference is really minimal). I'll update the readme.
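A short sketch of what "not using EMA" means in practice when loading a checkpoint. The "model" and "model_ema" keys are assumptions based on DETR-style checkpoints; adjust if your file is laid out differently:

```python
import torch

def load_mdetr_weights(path, use_ema=False):
    """Return a state dict from an MDETR checkpoint.

    Assumes the checkpoint stores plain weights under "model" and an
    EMA copy under "model_ema" (hypothetical keys; verify on your file).
    """
    ckpt = torch.load(path, map_location="cpu")
    if use_ema and ckpt.get("model_ema") is not None:
        return ckpt["model_ema"]
    return ckpt["model"]

# Per the advice above, evaluate segmentation with the plain weights:
state_dict = load_mdetr_weights("phrasecut_checkpoint.pth", use_ema=False)
# model.load_state_dict(state_dict)  # `model`: your instantiated MDETR net
```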

I believe I have answered your questions and as such I'm closing this. Feel free to reach out if you have further concerns.