ashkamath / mdetr


Why is the model size on PhraseCut smaller than the others? #17

Closed zhenwwang closed 3 years ago

zhenwwang commented 3 years ago

It really confuses me that the PhraseCut checkpoint is smaller than the other released checkpoints. Why is that?

zhenwwang commented 3 years ago

Moreover, there are some mismatches between the .md and the paper for PhraseCut, e.g. the number of training epochs and whether to use EMA.

Should I follow the paper or the .md?

Thanks very much.

alcinos commented 3 years ago

Hello @Zavier-Wang

Thank you for your interest in MDETR.

The size is smaller because our checkpoints also contain the optimizer state. For PhraseCut, since we only fine-tune the segmentation head, the optimizer state covers just that head, while for the other pre-trained models it covers the full transformer + backbone + text_transformer, which makes those checkpoints larger overall.
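To make this concrete, here is a minimal sketch of how one could inspect what a checkpoint actually stores and how much of it is optimizer state. The filename and the "model"/"optimizer" keys are assumptions (DETR-style training scripts typically use these, but check your own file):

```python
import torch

# Hypothetical filename: substitute whichever checkpoint you downloaded.
ckpt = torch.load("phrasecut_checkpoint.pth", map_location="cpu")
print(list(ckpt.keys()))  # e.g. weights under "model", plus training state

def num_elements(obj):
    """Count tensor elements in a (possibly nested) dict of tensors."""
    if torch.is_tensor(obj):
        return obj.numel()
    if isinstance(obj, dict):
        return sum(num_elements(v) for v in obj.values())
    return 0

# The optimizer state (e.g. Adam moment buffers) is what inflates the
# file; comparing it against the plain weights makes the gap visible.
if "model" in ckpt:
    print("model elements:    ", num_elements(ckpt["model"]))
if "optimizer" in ckpt:
    print("optimizer elements:", num_elements(ckpt["optimizer"]))

# For inference only, the weights alone are enough (assumed "model" key):
torch.save({"model": ckpt["model"]}, "phrasecut_model_only.pth")
```

Stripping the training state this way is also a handy trick if you only need the weights for evaluation and want a smaller file.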

Number of epochs: Thanks for pointing out the discrepancy; we will update the paper accordingly. The correct numbers are given in the readme.

EMA: you shouldn't use EMA for the segmentation head (although I think the difference is really minimal). I'll update the readme.
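A short sketch of what "not using EMA" means in practice when loading a checkpoint. The "model" and "model_ema" keys are assumptions based on DETR-style checkpoints; adjust if your file is laid out differently:

```python
import torch

def load_mdetr_weights(path, use_ema=False):
    """Return a state dict from an MDETR checkpoint.

    Assumes the checkpoint stores plain weights under "model" and an
    EMA copy under "model_ema" (hypothetical keys; verify on your file).
    """
    ckpt = torch.load(path, map_location="cpu")
    if use_ema and ckpt.get("model_ema") is not None:
        return ckpt["model_ema"]
    return ckpt["model"]

# Per the advice above, evaluate segmentation with the plain weights:
state_dict = load_mdetr_weights("phrasecut_checkpoint.pth", use_ema=False)
# model.load_state_dict(state_dict)  # `model`: your instantiated MDETR net
```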

I believe I have answered your questions and as such I'm closing this. Feel free to reach out if you have further concerns.