facebookresearch / deit

Official DeiT repository
Apache License 2.0

Questions about hyper-parameters in the finetuning stage #152

Open falcon-xu opened 2 years ago

falcon-xu commented 2 years ago

Hi, this is wonderful work. I am trying to replicate the paper's fine-tuning results on the downstream datasets. As you know, it often takes a long time to tune hyper-parameters before a transformer-based model trains well, so it would be really helpful if you could share all the settings.

I saw some settings in #105 and #45, but they do not cover the other model sizes and datasets, such as DeiT-Ti and iNat.

Could you please summarize the settings for every model size on each of the fine-tuning datasets mentioned in the paper?

A table might be the clearest format, for example:

| model type | pretrained dataset | finetuned dataset | lr | bs | wd | sched | epochs | warmup | ... |
|------------|--------------------|-------------------|----|----|----|-------|--------|--------|-----|
| ViT-B      |                    |                   |    |    |    |       |        |        |     |
| ViT-L      |                    |                   |    |    |    |       |        |        |     |
| ...        |                    |                   |    |    |    |       |        |        |     |
| DeiT-Ti    |                    |                   |    |    |    |       |        |        |     |
| DeiT-S     |                    |                   |    |    |    |       |        |        |     |
| ...        |                    |                   |    |    |    |       |        |        |     |
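
For context, each column above roughly corresponds to a flag of this repo's `main.py`. Below is a minimal sketch of a fine-tuning invocation, assuming the flag names from the repo's training script; all values and paths are placeholders for illustration, not the paper's actual fine-tuning settings:

```bash
# Sketch of a fine-tuning run; values below are placeholders,
# NOT the hyper-parameters published in the paper.
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    --model deit_small_patch16_224 \
    --finetune /path/to/pretrained_checkpoint.pth \
    --data-set CIFAR \
    --data-path /path/to/cifar \
    --lr 1e-4 \
    --batch-size 256 \
    --weight-decay 1e-4 \
    --sched cosine \
    --epochs 100 \
    --warmup-epochs 5 \
    --output_dir /path/to/save
```

A filled-in table would then amount to one such command per (model, dataset) pair.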

I really appreciate it.

TouvronHugo commented 2 years ago

Hi @lostsword, Thank you for your suggestion. As soon as I have some time, I will complete this table and add it to the README. I'll keep you informed. Best, Hugo

falcon-xu commented 2 years ago

> Hi @lostsword, Thank you for your suggestion. As soon as I have some time, I will complete this table and add it to the README. I'll keep you informed. Best, Hugo

OK. Thanks a lot.

HashmatShadab commented 2 years ago

Hi!

This would help a lot! @TouvronHugo, any update on when the table might be available?