I just checked the configuration file, and some of the training settings seem quite different from the original DeiT training recipe (e.g., batch size, learning rate scheduler, model EMA). What would the baseline result be for this configuration?