In the paper, the pretraining hyperparameters are specified as 90 epochs with 5k warmup steps. However, the code uses 500 warmup steps and about 65 epochs on ImageNet-1M -- the latter gives a more reasonable warmup-to-total-epochs ratio at batch size 4096. It is possible that the paper's numbers are only for finetuning and not pretraining. In any case, could you clarify which hyperparameters were used to pretrain the released checkpoints? TIA
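For context, here is a quick back-of-envelope sketch of the ratio I mean. It assumes the standard ImageNet-1k training split of ~1,281,167 images; the exact step counts will shift slightly depending on drop-last behavior, but the contrast between the two settings holds either way:

```python
# Back-of-envelope: what fraction of training is spent in warmup?
# Assumes ~1,281,167 ImageNet-1k training images (standard split).
IMAGES = 1_281_167
BATCH = 4096
steps_per_epoch = -(-IMAGES // BATCH)  # ceil division, ~313 steps/epoch

def warmup_fraction(warmup_steps: int, epochs: int) -> float:
    """Warmup steps as a fraction of total training steps."""
    return warmup_steps / (steps_per_epoch * epochs)

paper = warmup_fraction(5_000, 90)  # paper: 5k warmup, 90 epochs
code = warmup_fraction(500, 65)     # code: 500 warmup, ~65 epochs
print(f"paper: {paper:.1%}, code: {code:.1%}")
```

The paper's setting spends roughly 18% of all steps in warmup, while the code's setting spends about 2.5%, which is much closer to typical large-batch recipes.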