Noble-Lab / casanovo

De Novo Mass Spectrometry Peptide Sequencing with a Transformer Model
https://casanovo.readthedocs.io
Apache License 2.0

Add option to change learning rate scheduler #249

Closed: justin-a-sanders closed this pull request 7 months ago

justin-a-sanders commented 1 year ago

Added a config option to choose between the current cosine learning rate schedule, a linearly decreasing schedule, or no schedule (a constant learning rate). Also made it easier to add new learning rate schedule options in the future by separating the linear warmup out into its own schedule.
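For illustration, the general shape of such an option could look like the sketch below. This is not the actual PR diff; the names lr_schedule, warmup_iters, and max_iters are placeholders, and how the warm-up and decay schedules are composed is exactly what the rest of this thread digs into.

```python
import torch


def build_lr_scheduler(optimizer, lr_schedule, warmup_iters, max_iters):
    """Combine a linear warm-up with the decay schedule named in the config."""
    warmup = torch.optim.lr_scheduler.LinearLR(
        optimizer, start_factor=1e-8, end_factor=1.0, total_iters=warmup_iters
    )
    if lr_schedule == "constant":
        # Warm up, then keep the learning rate constant.
        return warmup
    if lr_schedule == "cosine":
        decay = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_iters)
    elif lr_schedule == "linear":
        decay = torch.optim.lr_scheduler.LinearLR(
            optimizer, start_factor=1.0, end_factor=0.0, total_iters=max_iters
        )
    else:
        raise ValueError(f"Unknown learning rate schedule: {lr_schedule}")
    # Compose warm-up and decay; the comments below discuss whether chaining
    # the two schedulers actually preserves the warm-up.
    return torch.optim.lr_scheduler.ChainedScheduler([warmup, decay])
```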

bittremieux commented 1 year ago

Can you properly format the files using black (I see single quotes) so that that check passes? The unit test failure is weird: some infinite loop when installing dependencies. 🤔

Can you also add some unit tests to make sure the learning rate schedule is set correctly based on the config?
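For example, a test along these lines would cover it. This is sketched against the hypothetical build_lr_scheduler helper from the sketch above, not the actual Casanovo API; the real test would exercise whatever function the PR adds.

```python
import pytest
import torch

# `build_lr_scheduler` is the hypothetical helper from the earlier sketch;
# a real test would import the equivalent function from casanovo.


def _dummy_optimizer():
    return torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=1e-3)


@pytest.mark.parametrize(
    "schedule, expected",
    [
        ("cosine", torch.optim.lr_scheduler.ChainedScheduler),
        ("linear", torch.optim.lr_scheduler.ChainedScheduler),
        ("constant", torch.optim.lr_scheduler.LinearLR),
    ],
)
def test_lr_scheduler_from_config(schedule, expected):
    scheduler = build_lr_scheduler(
        _dummy_optimizer(), schedule, warmup_iters=10, max_iters=100
    )
    assert isinstance(scheduler, expected)


def test_unknown_lr_schedule_raises():
    with pytest.raises(ValueError):
        build_lr_scheduler(_dummy_optimizer(), "bogus", warmup_iters=10, max_iters=100)
```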

codecov[bot] commented 1 year ago

Codecov Report

Merging #249 (7358564) into dev (86630e3) will increase coverage by 0.14%. Report is 3 commits behind head on dev. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev     #249      +/-   ##
==========================================
+ Coverage   89.26%   89.41%   +0.14%     
==========================================
  Files          12       12              
  Lines         904      907       +3     
==========================================
+ Hits          807      811       +4     
+ Misses         97       96       -1     
| Files | Coverage Δ |
| --- | --- |
| casanovo/config.py | 92.68% <ø> (ø) |
| casanovo/denovo/model.py | 97.85% <100.00%> (+0.38%) ⬆️ |
| casanovo/denovo/model_runner.py | 87.50% <ø> (ø) |


melihyilmaz commented 1 year ago

When I tested the new branch locally and tracked the learning rate with print(self.lr_schedulers().state_dict()), I saw that for the cosine and linear options the decay schedule overrides the warm-up period, i.e. there is no warm-up, and the linear warm-up isn't even linear. The constant option works as intended.

melihyilmaz commented 7 months ago

I created this Colab notebook to experiment with the implementation in this PR by plotting the learning rate over training steps from a toy example.

The cosine scheduler implementation in this PR doesn't have a warm-up despite being chained with the linear warm-up (compare with the main branch version), and I haven't been able to figure out why yet. @justin-a-sanders it'd be great if you could take a look at the notebook.
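Roughly, the toy setup in the notebook looks like the sketch below (a simplified re-creation with arbitrary step counts, not the exact notebook code): track the learning rate per step for a dummy optimizer and plot it. With ChainedScheduler the warm-up ramp does not show up in the resulting curve.

```python
import matplotlib.pyplot as plt
import torch

# Arbitrary toy values; the real warm-up/decay lengths come from the config.
warmup_iters, max_iters = 100, 1000
optimizer = torch.optim.Adam([torch.nn.Parameter(torch.zeros(1))], lr=5e-4)

warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-8, end_factor=1.0, total_iters=warmup_iters
)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_iters)
# Both schedulers are stepped on every iteration when chained.
scheduler = torch.optim.lr_scheduler.ChainedScheduler([warmup, cosine])

lrs = []
for _ in range(max_iters):
    optimizer.step()
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])

plt.plot(lrs)
plt.xlabel("training step")
plt.ylabel("learning rate")
plt.show()
```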

melihyilmaz commented 7 months ago

While tinkering with the notebook, I got the warm-up to work with our cosine scheduler by using torch.optim.lr_scheduler.SequentialLR(optimizer, schedulers=lr_schedulers, milestones=[warmup_iters]) instead of torch.optim.lr_scheduler.ChainedScheduler(lr_schedulers) for the final_scheduler, but the periodicity seems to be off now (see the last cell in the notebook).
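In self-contained form (arbitrary step counts, same toy setup as the sketch above), the swap is just the final composition step: SequentialLR runs the warm-up for warmup_iters steps and only then hands control to the cosine schedule, instead of stepping both schedulers on every iteration.

```python
import torch

warmup_iters, max_iters = 100, 1000
optimizer = torch.optim.Adam([torch.nn.Parameter(torch.zeros(1))], lr=5e-4)
lr_schedulers = [
    torch.optim.lr_scheduler.LinearLR(
        optimizer, start_factor=1e-8, end_factor=1.0, total_iters=warmup_iters
    ),
    torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_iters),
]
# Switch from warm-up to cosine decay at `warmup_iters` steps.
final_scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=lr_schedulers, milestones=[warmup_iters]
)
```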

justin-a-sanders commented 7 months ago

> While tinkering with the notebook, I got the warm-up to work with our cosine scheduler by using torch.optim.lr_scheduler.SequentialLR(optimizer, schedulers=lr_schedulers, milestones=[warmup_iters]) instead of torch.optim.lr_scheduler.ChainedScheduler(lr_schedulers) for the final_scheduler, but the periodicity seems to be off now (see the last cell in the notebook).

I took a quick look at this. To be used in a chained learning rate schedule, the scheduler needs to be "chainable", meaning it is a function of only the current learning rate and the epoch (and not of some initial value). Unless I'm missing something, I don't think a cosine LR schedule is easily compatible with that paradigm. I put a hacky example of how you can make it work in the notebook by dividing by the previous factor each epoch, but this either breaks after one cycle (division by zero) or requires slightly shifting the cosine so it is never exactly zero. I'm not sure the flexibility of easily swapping in different schedules is worth the hack if you have already found that warmup + cosine gives the best results.
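Roughly, the kind of "chainable" cosine described above could be sketched like this. This is a simplified illustration, not the notebook code; MultiplicativeLR is one way to apply a per-step factor to the current learning rate, and the floor is the "shift away from zero" mentioned above.

```python
import math

import torch


def cosine_factor(step, period, floor=1e-8):
    """Cosine factor in (0, 1]; `floor` keeps it from hitting exactly zero."""
    return max(0.5 * (1.0 + math.cos(math.pi * step / period)), floor)


def chainable_cosine(optimizer, period):
    # MultiplicativeLR multiplies the *current* learning rate by the returned
    # factor at every step, so dividing by the previous step's factor recovers
    # the cosine shape. If the factor ever reached exactly zero, the next step
    # would divide by zero, hence the `floor` shift.
    return torch.optim.lr_scheduler.MultiplicativeLR(
        optimizer,
        lr_lambda=lambda step: cosine_factor(step, period)
        / cosine_factor(step - 1, period),
    )
```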

bittremieux commented 7 months ago

Superseded by #294.