HaoZhongkai / GNOT


About scaling experiments #2

Closed BraveDrXuTF closed 11 months ago

BraveDrXuTF commented 11 months ago

Thank you for your excellent work! I want to ask about the training settings of the scaling experiments in the paper. Should we keep the same strategy for every training-data size? I mean, if you use a cyclic strategy stepped at each iteration, the learning rate at a given step would not be the same across experiments even if the cyclic strategy itself is identical, because the number of iterations per epoch depends on the data size.

HaoZhongkai commented 11 months ago

Hello! For the scaling experiments, most hyperparameters should be kept fixed, and the rest should have a clear relationship to the dataset size. The cyclic learning-rate scheduler takes two parameters, steps_per_epoch and epochs, to control the learning rate automatically. So if the number of epochs is kept the same, the learning rate at any given epoch is the same across different data sizes; the difference is that we train more batches at each learning rate in the cycle.
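
To make the point concrete, here is a minimal sketch assuming a PyTorch-style OneCycleLR scheduler (the repo may wire its scheduler differently); the helper `lr_at_epoch` and the dataset sizes are hypothetical, purely for illustration:

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR

def lr_at_epoch(n_samples, batch_size=32, epochs=100, epoch_query=50):
    """Return the LR that OneCycleLR yields at the start of `epoch_query`.

    Hypothetical helper: steps_per_epoch grows with the dataset size,
    while the number of epochs is held fixed across experiments.
    """
    steps_per_epoch = n_samples // batch_size
    opt = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=1e-3)
    sched = OneCycleLR(opt, max_lr=1e-3,
                       epochs=epochs, steps_per_epoch=steps_per_epoch)
    # Step the scheduler once per batch, i.e. the per-iteration
    # ("each iter") cycling strategy discussed above.
    for _ in range(epoch_query * steps_per_epoch):
        opt.step()
        sched.step()
    return sched.get_last_lr()[0]

# The LR at epoch 50 is the same for both dataset sizes, even though the
# larger dataset has taken 10x more optimizer steps to get there.
print(lr_at_epoch(1_000))
print(lr_at_epoch(10_000))
```

Because OneCycleLR computes the learning rate from the fraction of total steps completed, and total steps scale with steps_per_epoch in the same proportion as the steps taken, the epoch-vs-LR curve lines up across data sizes.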

BraveDrXuTF commented 11 months ago

Got it, thanks.