Hi @CharlesLeeeee, yes, see the DeiT-III README for the detailed hyperparameters. We adapt the learning rate (lr) and the stochastic depth drop rate to each model size.
For DeiT 1, we use exactly the same hyperparameters for all architectures, but this is not optimal.
Best regards,
Hugo
Are the hyperparameters for DeiT-T and DeiT-S any different from those for DeiT-B? In DeiT 3 (https://arxiv.org/pdf/2204.07118.pdf), you mention that you use slightly different hyperparameters for the "small" and "tiny" models, but in DeiT 1 (https://arxiv.org/pdf/2012.12877.pdf) you do not mention any differences in hyperparameters between DeiT-T, DeiT-S, and DeiT-B.
So, to reproduce the results reported in DeiT 1 (https://arxiv.org/pdf/2012.12877.pdf), are the hyperparameters for DeiT-T, DeiT-S, and DeiT-B the same?