facebookresearch / deit

Official DeiT repository
Apache License 2.0

Are the hyperparameters for DeiT-T and for DeiT-S any different than DeiT-B? #201

Closed. Phuoc-Hoan-Le closed this issue 1 year ago

Phuoc-Hoan-Le commented 1 year ago

Are the hyperparameters for DeiT-T and DeiT-S different from those for DeiT-B? In DeiT 3 (https://arxiv.org/pdf/2204.07118.pdf), you mention that you use slightly different hyperparameters for the "small" and "tiny" models, but in DeiT 1 (https://arxiv.org/pdf/2012.12877.pdf) you don't mention any difference in hyperparameters between DeiT-T, DeiT-S, and DeiT-B.

So, to reproduce the results of DeiT 1 (https://arxiv.org/pdf/2012.12877.pdf), should the hyperparameters for DeiT-T, DeiT-S, and DeiT-B be the same?

TouvronHugo commented 1 year ago

Hi @CharlesLeeeee, yes, see the DeiT-III README for the detailed hyperparameters. There, we adapt the learning rate (lr) and the stochastic depth drop rate to the model size.

For DeiT 1 we use exactly the same hyperparameters for the different architectures, but this is not optimal.

Best regards,

Hugo
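
To illustrate the DeiT 1 part of the reply above, here is a minimal sketch of what "exactly the same hyperparameters for the different architectures" could look like as a config. The `TrainConfig` class and its field names are hypothetical and not taken from the repository. The numeric values (300 epochs, batch size 1024, base lr 5e-4 scaled by batch size / 512, weight decay 0.05, stochastic depth 0.1) are assumptions based on my reading of the DeiT 1 paper and should be checked against the paper and the repository defaults before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for one training recipe (not the repo's API)."""
    model: str
    # Values below are assumptions; verify them against the DeiT 1 paper.
    epochs: int = 300
    batch_size: int = 1024
    base_lr: float = 5e-4
    weight_decay: float = 0.05
    drop_path: float = 0.1  # stochastic depth drop rate

    def scaled_lr(self) -> float:
        # Linear scaling rule: lr = base_lr * batch_size / 512.
        return self.base_lr * self.batch_size / 512


# Model names assumed to match the DeiT 1 model identifiers.
MODELS = [
    "deit_tiny_patch16_224",
    "deit_small_patch16_224",
    "deit_base_patch16_224",
]

# DeiT 1 style: one shared recipe, only the architecture name changes.
configs = {name: TrainConfig(model=name) for name in MODELS}

for name, cfg in configs.items():
    print(f"{name}: lr={cfg.scaled_lr():.4g}, drop_path={cfg.drop_path}")
```

A DeiT-III style setup would instead override the learning rate and `drop_path` per model; the actual per-model values are listed in the DeiT-III README and are deliberately not guessed here.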