Closed — BraveDrXuTF closed this issue 11 months ago
Hello! For scaling experiments, most hyperparameters should be kept fixed, and the rest should have a clear relationship with your dataset size. The cyclic learning rate scheduler takes two parameters, steps_per_epoch and epochs, to control the learning rate automatically. So if the number of epochs is kept the same, the learning rate is the same at any given epoch across different data sizes; the difference is that we train more batches at each learning rate within the cycle.
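To illustrate the point, here is a minimal sketch (not the paper's actual scheduler) of a schedule parameterized by steps_per_epoch and epochs, assuming a single cosine cycle over the whole run; max_lr and min_lr are hypothetical values. Because the schedule depends only on the fraction of training completed, the learning rate at the start of any given epoch is identical regardless of dataset size:

```python
import math

def cyclic_lr(step, steps_per_epoch, epochs, max_lr=0.1, min_lr=0.001):
    # Fraction of training completed; depends only on epoch progress,
    # not on the absolute number of batches.
    progress = step / (steps_per_epoch * epochs)
    # One cosine cycle from max_lr down to min_lr (illustrative shape only).
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Small vs. large dataset: same number of epochs, different steps_per_epoch.
small = [cyclic_lr(e * 100, steps_per_epoch=100, epochs=10) for e in range(10)]
large = [cyclic_lr(e * 1000, steps_per_epoch=1000, epochs=10) for e in range(10)]
print(small == large)  # True: LR at each epoch boundary matches
```

The larger dataset simply takes more optimizer steps between any two points of the cycle, which is the "train more batches at each learning rate" behavior described above.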
Got it, thanks.
Thank you for your excellent work! I want to ask about the training settings of the scaling experiments in the paper. Should we keep the same strategy for different training data sizes? I mean, if you use a cyclic strategy for each iteration, the learning rate at each step would not be the same even if the cyclic strategy is the same for all experiments (because the data sizes are different).