Closed · byminji closed this 1 year ago

Hi, thanks for releasing the code for your great work! Could you please share the pretrained PolyFormer weights? Also, how long did it take to pretrain and fine-tune the model in your environment?
Thanks :)
Hi @byminji,
Thanks for your interest in our work! We just released the model weights (both pre-trained and fine-tuned). With 8 A100 GPUs, pretraining PolyFormer-L takes about a week, and fine-tuning takes 2-3 days.
@huidingai Thanks for sharing the model weights! I have one more question about the fine-tuning process. Based on the run script for fine-tuning, it seems that you use the combined training set for refcoco/+/g, but there are different model weights for each dataset. What's the difference between each model (e.g., polyformer_b_refcoco.pt vs polyformer_b_refcoco+.pt)?
@byminji We used the validation set of each dataset (refcoco/+/g) to select the best checkpoint for that dataset.
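In pseudocode, the selection looks roughly like this (an illustrative sketch; `evaluate` and `checkpoints` are placeholders, not functions from the repo):

```python
# Per-dataset checkpoint selection: score each saved checkpoint on a
# dataset's validation split and keep the best one under its own name
# (e.g. polyformer_b_refcoco.pt vs polyformer_b_refcoco+.pt).
best = {}
for dataset in ["refcoco", "refcoco+", "refcocog"]:
    # evaluate() stands in for the repo's validation metric (e.g. mIoU).
    scores = {ckpt: evaluate(ckpt, split=f"{dataset}/val") for ckpt in checkpoints}
    best[dataset] = max(scores, key=scores.get)
```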
@huidingai I see. Thank you for your answers :)
Hello, I would like to express my sincere appreciation for your hard work and contributions to the project. It has been incredibly valuable and informative. If I don't have the same number of GPUs, does that mean I can't directly use the pre-trained models you provided? It seems that this would result in a mismatch between the loaded pre-trained model and the model I want to use.
Hi, you should still be able to use the pretrained models, but you will need to modify the training/evaluation scripts to match the number of GPUs you have.
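The weights themselves don't depend on the number of GPUs; only the distributed launch configuration does. A minimal loading sketch, assuming the released file is a standard PyTorch checkpoint (`build_polyformer` below is a hypothetical stand-in for the repo's actual model constructor):

```python
import torch

# Load the released checkpoint on CPU first; the weights are independent
# of how many GPUs were used during training.
ckpt = torch.load("polyformer_b_refcoco.pt", map_location="cpu")

# fairseq-style checkpoints often nest the weights under a "model" key.
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt

# Drop any "module." prefix left over from DistributedDataParallel.
state_dict = {k.removeprefix("module."): v for k, v in state_dict.items()}

model = build_polyformer()  # hypothetical constructor; use the repo's model class
model.load_state_dict(state_dict, strict=False)
model.to("cuda:0")          # a single GPU is enough for inference
```

If you also want to reproduce the training setup on fewer GPUs, the usual trick is to increase gradient accumulation by the same factor you reduced the GPU count (e.g. going from 8 GPUs to 2 means 4x the accumulation steps), so the effective batch size stays the same.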