warner-benjamin closed 1 week ago
Instead of defining def _linear_schedule and def _cosine_schedule in scheduler.py, you could likely leverage the existing LinearScheduler and CosineAnnealingScheduler in composer.
LinearScheduler doesn't allow you to pass in a starting time: it requires the state and assumes the schedule starts from 0, which is why WarmupStableDecayScheduler implements custom linear decay too.
Since I defined _linear_schedule, I went ahead and renamed _cosine_schedule to match, so they'd be swappable for cosine/linear decay/warmup.
This PR adds a flexible implementation of the multi-stage infinite scheduler from the Stable LM 2 1.6B Technical Report. The default of t_cosine="0.25dur" should match the StableLM2 schedule.
This version allows optional linear/cosine warmup, cosine decay of any length into inverse square root decay, followed by optional linear/cosine cooldown.
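As a rough picture of the stage order described here, the sketch below computes a learning-rate multiplier with linear warmup, cosine decay, inverse square root decay, and linear cooldown. It is an assumption-laden illustration only: the function name, the alpha_c and alpha_f parameters, the step-based arguments, and the exact inverse square root form are all hypothetical, and the sketch is not claimed to reproduce this PR's implementation or the report's formula.

```python
import math


def infinite_multiplier(step, total, warmup=100, t_cosine=0.25,
                        t_cooldown=0.1, alpha_c=0.5, alpha_f=0.0):
    """Hypothetical multi-stage LR multiplier.

    Assumes 0 < warmup < cosine_end < cooldown_start < total.
    alpha_c: multiplier reached at the end of the cosine stage (assumed).
    alpha_f: multiplier reached at the end of the cooldown (assumed).
    """
    cosine_end = int(t_cosine * total)
    cooldown_start = total - int(t_cooldown * total)

    def rsqrt(s):
        # Inverse square root decay, scaled to equal alpha_c at cosine_end.
        return alpha_c * math.sqrt(cosine_end / s)

    if step < warmup:
        # Stage 1: linear warmup from 0 to 1.
        return step / warmup
    if step < cosine_end:
        # Stage 2: cosine decay from 1 to alpha_c.
        frac = (step - warmup) / (cosine_end - warmup)
        return alpha_c + (1.0 - alpha_c) * 0.5 * (1.0 + math.cos(math.pi * frac))
    if step < cooldown_start:
        # Stage 3: inverse square root decay.
        return rsqrt(step)
    # Stage 4: linear cooldown from the current value to alpha_f.
    frac = min((step - cooldown_start) / (total - cooldown_start), 1.0)
    start = rsqrt(cooldown_start)
    return start + (alpha_f - start) * frac


for s in (0, 50, 100, 250, 900, 1000):
    print(s, round(infinite_multiplier(s, total=1000), 4))
```

Scaling the inverse square root stage so it equals alpha_c at the handoff keeps the multiplier continuous across the cosine boundary, and the cooldown starts from whatever value the third stage has reached.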