AnswerDotAI / bert24

Apache License 2.0

Add CosineInverseSqrtScheduler #66

Closed: warner-benjamin closed this 1 week ago

warner-benjamin commented 2 weeks ago

This PR adds a flexible implementation of the multi-stage infinite scheduler from the Stable LM 2 1.6B Technical Report. The default of t_cosine="0.25dur" should match the Stable LM 2 schedule.

This version allows an optional linear or cosine warmup, then a cosine decay of any length that transitions into inverse square root decay, followed by an optional linear or cosine cooldown.
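For readers who want to see the stage ordering concretely, here is a minimal pure-Python sketch of such a multi-stage schedule. It is not the code from this PR and does not use composer: the function name, the step-based arguments, the `cosine_floor` value, and the choice to scale the inverse square root stage so it is continuous with the end of the cosine stage are all assumptions made for illustration, and only a linear warmup and cooldown are shown.

```python
import math


def lr_multiplier(step, total=1000, warmup=100, cosine_end=250,
                  cooldown=100, cosine_floor=0.5):
    """Learning-rate multiplier at `step` (hypothetical names and defaults)."""
    # Stage 1: optional linear warmup from 0 to 1 (skipped when warmup == 0).
    if step < warmup:
        return step / warmup

    # Stage 2: half-cosine decay from 1 down to cosine_floor.
    if step < cosine_end:
        frac = (step - warmup) / (cosine_end - warmup)
        return cosine_floor + (1.0 - cosine_floor) * 0.5 * (1.0 + math.cos(math.pi * frac))

    # Stage 3: inverse square root decay, scaled (by assumption) so that it
    # equals cosine_floor exactly at cosine_end.
    def inv_sqrt(s):
        return cosine_floor * math.sqrt(cosine_end / s)

    cooldown_start = total - cooldown
    if cooldown == 0 or step < cooldown_start:
        return inv_sqrt(step)

    # Stage 4: optional linear cooldown from the current value down to 0.
    frac = min(1.0, (step - cooldown_start) / cooldown)
    return inv_sqrt(cooldown_start) * (1.0 - frac)
```

With these illustrative defaults the multiplier rises to 1 at step 100, reaches `cosine_floor` at step 250, decays as an inverse square root until step 900, and then falls linearly to 0 at step 1000.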

warner-benjamin commented 1 week ago

> Instead of defining def _linear_schedule and def _cosine_schedule in scheduler.py, you could likely leverage the already existing LinearScheduler and CosineAnnealingScheduler in composer.

LinearScheduler doesn't allow you to pass in a starting time: it requires the state and assumes the schedule starts from 0, which is why WarmupStableDecayScheduler implements custom linear decay too.

Since I defined _linear_schedule, I went ahead and renamed _cosine_schedule to match so they'd be swappable for cosine/linear decay/warmup.
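To illustrate the point about start times and swappability, here is a rough sketch of two segment helpers that share one signature and take an explicit start time. The names `linear_segment` and `cosine_segment` and their arguments are hypothetical: they are not the signatures of `_linear_schedule` or `_cosine_schedule` in this PR, nor any composer API.

```python
import math


def _progress(t, t_start, t_end):
    """Fraction of the segment completed at time t, clamped to [0, 1]."""
    return min(max((t - t_start) / (t_end - t_start), 0.0), 1.0)


def linear_segment(t, t_start, t_end, y_start, y_end):
    """Linear interpolation from y_start to y_end over [t_start, t_end]."""
    return y_start + (y_end - y_start) * _progress(t, t_start, t_end)


def cosine_segment(t, t_start, t_end, y_start, y_end):
    """Half-cosine interpolation from y_start to y_end over [t_start, t_end]."""
    frac = _progress(t, t_start, t_end)
    return y_end + (y_start - y_end) * 0.5 * (1.0 + math.cos(math.pi * frac))


# Because the signatures match, either function can serve as a warmup or a
# cooldown, and the segment can start at any point in training.
cooldown = linear_segment  # or cosine_segment
value = cooldown(950, t_start=900, t_end=1000, y_start=0.3, y_end=0.0)
```

A helper that assumes the segment starts at 0 could not express the cooldown above, which begins at step 900.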