Changes the "warmup steps" parameter from int to float. For values greater than 1, it behaves the same way as before: warmup steps / gradient accumulation steps. For values between 0 and 1 (inclusive), it is treated as a fraction: warmup steps * total steps (total steps already compensates for gradient accumulation). Any other value (i.e. a negative one) is set to zero. Also updated the mouseover text to reflect this change.
I also verified that warmup works correctly with different gradient accumulation values, though I came across an issue related to changing that value while training is running (see issue #557). Some examples of the behavior, tested with 100 total steps:
Warmup set to 50 = 50 steps of warmup, 50 steps of scheduler
Warmup set to 0.4 = 40 steps of warmup, 60 steps of scheduler
Warmup set to -0.3 = 0 steps of warmup, 100 steps of scheduler
Warmup set to -20 = 0 steps of warmup, 100 steps of scheduler
Warmup set to 1 = 100 steps of warmup, 0 steps of scheduler
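The rule above can be sketched as a small function that reproduces the examples. This is a minimal illustration, not the code from this PR: the name `split_warmup_steps`, its parameters, and the use of `round()` to turn the result into a whole step count are all assumptions.

```python
def split_warmup_steps(warmup: float, total_steps: int, grad_accum_steps: int = 1) -> tuple[int, int]:
    """Return (warmup_steps, scheduler_steps) for a float warmup setting.

    Hypothetical helper sketching the rule described above; the rounding
    behaviour is an assumption, not necessarily what the PR does.
    """
    if warmup > 1:
        # Absolute step count, divided by gradient accumulation steps as before.
        warmup_steps = round(warmup / grad_accum_steps)
    elif warmup >= 0:
        # Fraction of the run; total_steps is assumed to already account
        # for gradient accumulation, so no further division is needed.
        warmup_steps = round(warmup * total_steps)
    else:
        # Negative values disable warmup entirely.
        warmup_steps = 0
    # The remaining steps go to the regular scheduler.
    return warmup_steps, total_steps - warmup_steps
```

For example, `split_warmup_steps(0.4, 100)` returns `(40, 60)`, matching the second example above. Note that 1 falls into the fractional branch, which is why a warmup of 1 means the whole run is warmup rather than a single step.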