google-deepmind / optax

Optax is a gradient processing and optimization library for JAX.
https://optax.readthedocs.io
Apache License 2.0
1.63k stars, 174 forks

Inconsistencies in schedules API #835

Open fabianp opened 6 months ago

fabianp commented 6 months ago

In the documentation:

  1. In the API reference https://optax.readthedocs.io/en/latest/api/optimizer_schedules.html there's a section "Schedules with warm-up". I would consider optax.cosine_onecycle_schedule to have warm-up, yet it's not in this section. My recommendation would be to remove the section "Schedules with warm-up", put optax.warmup_cosine_decay_schedule in the cosine decay schedule section, and put optax.warmup_exponential_decay_schedule in the exponential decay section.

Abhinavcode13 commented 1 month ago

I'm ready to pick this up! @fabianp

fabianp commented 1 month ago

excellent @Abhinavcode13 ! which task would you like to pick up? I would suggest picking one of the remaining 3 and starting with that. Smaller PRs are better for everyone :-)

Abhinavcode13 commented 1 month ago

Sure

Abhinavcode13 commented 1 month ago

  • [ ] For most schedules, the total number of steps is specified through the transition_steps parameter, but in some cases (e.g., optax.cosine_decay_schedule, optax.warmup_cosine_decay_schedule but confusingly not optax.cosine_onecycle_schedule) it's called decay_steps instead.
  • [ ] The name sgdr_schedule is not descriptive of what the schedule actually does.
  • [ ] Most schedules with warm-up, like linear_onecycle_schedule and cosine_onecycle_schedule, specify the length of the warm-up phase using the parameter pct_start, but warmup_cosine_decay_schedule instead specifies it through the parameter warmup_steps.
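
To make the first and third items concrete, here is a minimal sketch of how the two warm-up conventions relate. The names `warmup_steps_from_pct` and `build_comparable_schedules` are hypothetical and not part of optax. The optax keyword arguments are written as I understand the current signatures, so treat them as assumptions and check them against the installed version.

```python
def warmup_steps_from_pct(total_steps: int, pct_start: float) -> int:
    """Convert a one-cycle style warm-up fraction into a step count."""
    if not 0.0 <= pct_start <= 1.0:
        raise ValueError("pct_start must lie in [0, 1]")
    return int(round(total_steps * pct_start))


def build_comparable_schedules(total_steps=1000, peak_value=1e-3, pct_start=0.3):
    """Build two schedules whose warm-up phases have the same length.

    The schedules are not numerically identical (they start from different
    values and warm up with different shapes); the point is only the naming.
    """
    # Imported lazily so the helper above can be used without optax installed.
    import optax

    # Total length is `transition_steps`, warm-up length is the fraction `pct_start`.
    onecycle = optax.cosine_onecycle_schedule(
        transition_steps=total_steps,
        peak_value=peak_value,
        pct_start=pct_start,
    )
    # Total length is `decay_steps` (assumed to include the warm-up),
    # warm-up length is the absolute count `warmup_steps`.
    warmup_cosine = optax.warmup_cosine_decay_schedule(
        init_value=0.0,
        peak_value=peak_value,
        warmup_steps=warmup_steps_from_pct(total_steps, pct_start),
        decay_steps=total_steps,
    )
    return onecycle, warmup_cosine
```

With `total_steps=1000` and `pct_start=0.3`, the helper gives a warm-up of 300 steps, which is what would have to be passed as `warmup_steps` to get a warm-up phase of the same length.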

In the documentation:

  5. In the API reference https://optax.readthedocs.io/en/latest/api/optimizer_schedules.html there's a section "Schedules with warm-up". I would consider optax.cosine_onecycle_schedule to have warm-up, yet it's not in this section. My recommendation would be to remove the section "Schedules with warm-up", put optax.warmup_cosine_decay_schedule in the cosine decay schedule section, and put optax.warmup_exponential_decay_schedule in the exponential decay section.

FYI: I would look up the second one first.
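
On the second item, a rough sketch of what sgdr_schedule appears to do may help when choosing a more descriptive name. As far as I can tell, it chains several warmup_cosine_decay_schedule cycles and restarts at the cumulative sum of each cycle's decay_steps. The helper `restart_boundaries` below is hypothetical and only mirrors that bookkeeping in plain Python. It is an assumption about the behavior, not the library's code.

```python
from itertools import accumulate


def restart_boundaries(cosine_kwargs):
    """Return the steps at which a new warm-up + cosine cycle would begin."""
    lengths = [kwargs["decay_steps"] for kwargs in cosine_kwargs]
    # Cumulative cycle lengths; the last value marks the end of the whole
    # schedule rather than a restart, so it is dropped.
    return list(accumulate(lengths))[:-1]


# One dict of warmup_cosine_decay_schedule-style keyword arguments per cycle.
cycles = [
    dict(init_value=0.0, peak_value=1e-3, warmup_steps=10, decay_steps=100),
    dict(init_value=0.0, peak_value=5e-4, warmup_steps=10, decay_steps=200),
    dict(init_value=0.0, peak_value=2.5e-4, warmup_steps=10, decay_steps=400),
]
print(restart_boundaries(cycles))  # prints [100, 300]
```

If I read the signature correctly, the same list of dicts is what `optax.sgdr_schedule(cycles)` takes, so the name could reflect "cosine decay with warm restarts" rather than the SGDR acronym.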