Nixtla / neuralforecast

Scalable and user-friendly neural 🧠 forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

[FEAT] Add option to support user defined learning rate scheduler for NeuralForecast Models #998

Closed. JQGoh closed this 4 months ago

JQGoh commented 4 months ago

Rationale

JQGoh commented 4 months ago

@jmoralez Please review. I hope this will be useful to the community too.

elephaint commented 4 months ago

@JQGoh Thanks for your work, the PR looks good to me. @cchallu @jmoralez I'm happy to add this as a feature to neuralforecast, wdyt?

JQGoh commented 4 months ago

@jmoralez @cchallu If you have some time, I would appreciate your review on this. Since new models have recently been added to neuralforecast, I would love to get this merged (subject to review/approval) so that the newly implemented models can also adopt these arguments.

jmoralez commented 4 months ago

Can you please add warnings if the user provides lr_scheduler_kwargs but doesn't provide lr_scheduler (that the kwargs will be ignored)? We got a similar request for the optimizer by the sktime folks.

JQGoh commented 4 months ago

> Can you please add warnings if the user provides lr_scheduler_kwargs but doesn't provide lr_scheduler (that the kwargs will be ignored)? We got a similar request for the optimizer by the sktime folks.

@jmoralez Sounds good to me. I will add a warning for that case.

If you have a link/reference to the request from the sktime folks, I can mention it in this PR too. Thanks
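
Roughly, the warning could look like the sketch below. This is only an illustration; the helper name and where it is called from are assumptions, not the final implementation:

```python
import warnings


def _warn_on_ignored_scheduler_kwargs(lr_scheduler, lr_scheduler_kwargs):
    """Hypothetical helper: warn when lr_scheduler_kwargs is passed without lr_scheduler."""
    if lr_scheduler is None and lr_scheduler_kwargs is not None:
        warnings.warn(
            "lr_scheduler_kwargs will be ignored because lr_scheduler was not provided."
        )
```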

jmoralez commented 4 months ago

https://github.com/sktime/sktime/pull/6235#issue-2216009123

BrunoBelucci commented 4 months ago

I was facing the same issue when I wanted to use a custom scheduler, and your solution is pretty much what I have. The only thing I am missing is that I think we should also be able to change the other arguments of the lr_scheduler_config, like the frequency or the interval (see https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#configure-optimizers for the list of other parameters). My solution was in fact to add two new kwargs: lr_scheduler_kwargs, as you have, and lr_scheduler_config, as expected by Lightning, only adding the actual scheduler to it in configure_optimizers with lr_scheduler_config['scheduler'] = self.lr_scheduler(optimizer=optimizer, **lr_scheduler_kwargs). I think that if someone is bothering to manually change the lr_scheduler, it makes sense to give them full control over it.
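
A minimal sketch of this idea, assuming hypothetical model attributes self.optimizer, self.optimizer_kwargs, self.lr_scheduler, self.lr_scheduler_kwargs, and self.lr_scheduler_config (this is not the PR's actual code):

```python
def configure_optimizers(self):
    # Build the optimizer from the user-supplied class and kwargs (assumed attributes).
    optimizer = self.optimizer(params=self.parameters(), **self.optimizer_kwargs)

    # Start from the user-supplied Lightning scheduler config (frequency,
    # interval, monitor, ...) and inject the scheduler instance, as described above.
    lr_scheduler_config = dict(self.lr_scheduler_config or {})
    lr_scheduler_config["scheduler"] = self.lr_scheduler(
        optimizer=optimizer, **self.lr_scheduler_kwargs
    )
    return {"optimizer": optimizer, "lr_scheduler": lr_scheduler_config}
```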

JQGoh commented 4 months ago

> I was facing the same issue when I wanted to use a custom scheduler, and your solution is pretty much what I have. The only thing I am missing is that I think we should also be able to change the other arguments of the lr_scheduler_config, like the frequency or the interval (see https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#configure-optimizers for the list of other parameters). My solution was in fact to add two new kwargs: lr_scheduler_kwargs, as you have, and lr_scheduler_config, as expected by Lightning, only adding the actual scheduler to it in configure_optimizers with lr_scheduler_config['scheduler'] = self.lr_scheduler(optimizer=optimizer, **lr_scheduler_kwargs). I think that if someone is bothering to manually change the lr_scheduler, it makes sense to give them full control over it.

@BrunoBelucci Thanks for your suggestion. I lean toward giving users more freedom in customizing the behaviour, including the frequency.

On top of the prepared PR, I think I may need an additional argument called lr_scheduler_config that represents the set of arguments detailed in https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#configure-optimizers, except that it should ignore the "scheduler" key (since we will specify the scheduler separately, which also ensures that lr_scheduler uses the same optimizer). If the Nixtla team agrees that lr_scheduler_config is a nice-to-have feature, I can prepare this in the next PR. cc: @jmoralez
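
As a rough illustration of ignoring the "scheduler" key, a hypothetical helper might look like this (the name and behaviour are assumptions for discussion, not part of the PR):

```python
import warnings


def _sanitize_lr_scheduler_config(lr_scheduler_config):
    """Hypothetical helper: drop any user-supplied 'scheduler' entry.

    The scheduler instance is built internally from lr_scheduler and
    lr_scheduler_kwargs, so it is guaranteed to wrap the model's own optimizer.
    """
    config = dict(lr_scheduler_config or {})
    if config.pop("scheduler", None) is not None:
        warnings.warn(
            "The 'scheduler' key in lr_scheduler_config is ignored; "
            "pass the scheduler class via lr_scheduler instead."
        )
    return config
```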

jmoralez commented 4 months ago

I'm not really a fan of adding so many arguments to all models. Would it be possible to instead provide a function that overrides the configure_optimizers method? Either by calling it or monkey patching.

BrunoBelucci commented 4 months ago

Honestly, I think that in this case, the cleanest solution is to add one more argument. We already have arguments for the trainer, optimizer, and scheduler. Introducing a different approach for configuring the scheduler, which is part of the same system, could confuse users and require them to write additional boilerplate code to achieve their goals. Here are some scenarios to further develop my point:

1. Using the CosineAnnealingLR scheduler: the default scheduler configuration is fine, because this scheduler is not affected by it.

2. Using the OneCycleLR scheduler: the default configuration is fine if the user knows that the default is {'frequency': 1, 'interval': 'step'}. They would then only need to calculate the number of steps to pass as total_steps in lr_scheduler_kwargs. If they do not know this, however, they would need to check the code (since it is not documented) to make sure the model trains as intended.

3. Using the ReduceLROnPlateau scheduler: here the user needs to change the scheduler configuration, because typically we want to detect a plateau per epoch rather than per step. The user also has to specify the "monitor" metric they want. Without the ability to pass the scheduler configuration directly, users would need one mechanism to pass the scheduler and its kwargs and a different one (such as calling another function or monkey patching) to overwrite the scheduler configuration (see the sketch below).

I see no reason for users to have to handle each scenario above differently. Additionally, this approach would allow us to document the default behavior clearly.
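
To make scenario 3 concrete, usage under the proposed API might look roughly like this. lr_scheduler and lr_scheduler_kwargs follow this PR, but lr_scheduler_config is only a proposal at this point, NHITS is just an example model, and the monitor name "train_loss" is an assumption about how the model logs its loss:

```python
import torch
from neuralforecast.models import NHITS

# Sketch only: lr_scheduler_config does not exist in neuralforecast;
# it is the argument proposed in this discussion.
model = NHITS(
    h=12,
    input_size=24,
    max_steps=100,
    lr_scheduler=torch.optim.lr_scheduler.ReduceLROnPlateau,
    lr_scheduler_kwargs={"mode": "min", "patience": 2},
    lr_scheduler_config={"interval": "epoch", "monitor": "train_loss"},  # proposed
)
```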

jmoralez commented 4 months ago

What I mean is that we currently have two arguments (for the optimizer), this adds two more (for the scheduler) and you're proposing adding a fifth, when all of these are going to be used in the same method (configure_optimizers) and they all could've been just one where the user passes a callable that takes the model parameters and returns what pytorch lightning expects from the configure_optimizers method.

Given that sktime is already using the optimizer arguments, it'd be a bad experience to deprecate them and introduce a new one that takes a function, so I think we should move forward with adding more arguments rather than deprecating something we recently introduced. I just wish we'd gone with the single-argument approach from the start.
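
For comparison, the single-argument alternative described here would let the user pass a callable like the one sketched below. This is purely illustrative: no such argument exists today, and "train_loss" is an assumed metric name.

```python
import torch


def my_configure_optimizers(params):
    # Hypothetical user-supplied callable: receives the model parameters and
    # returns the structure Lightning expects from configure_optimizers.
    optimizer = torch.optim.Adam(params, lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=2)
    return {
        "optimizer": optimizer,
        "lr_scheduler": {
            "scheduler": scheduler,
            "interval": "epoch",
            "monitor": "train_loss",  # assumed metric name
        },
    }
```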

JQGoh commented 4 months ago

> What I mean is that we currently have two arguments (for the optimizer), this adds two more (for the scheduler) and you're proposing adding a fifth, when all of these are going to be used in the same method (configure_optimizers) and they all could've been just one where the user passes a callable that takes the model parameters and returns what pytorch lightning expects from the configure_optimizers method.
>
> Given that sktime is already using the optimizer arguments, it'd be a bad experience to deprecate them and introduce a new one that takes a function, so I think we should move forward with adding more arguments rather than deprecating something we recently introduced. I just wish we'd gone with the single-argument approach from the start.

@jmoralez I also wish I had implemented it differently; I did not consider the option of modifying the default configure_optimizers behavior.

By the way, shall we reconsider this implementation and revert this PR?

I implemented the option of modifying the configure_optimizers behavior via set_configure_optimizers at the BaseModel level; please check the work in https://github.com/Nixtla/neuralforecast/pull/1015

MLfreakPy commented 2 months ago

Thank you so much for the implementation of a custom lr scheduler!

@BrunoBelucci : Could you explain how to implement a ReduceLROnPlateau scheduler? I think some of your earlier discussion revolved around it, but I couldn't make it work yet. Currently I am passing the lr_scheduler and lr_scheduler_kwargs parameters, but I get the error below when using ReduceLROnPlateau. Other schedulers, like OneCycleLR, work perfectly though!

MisconfigurationException('The lr scheduler dict must include a monitor when a ReduceLROnPlateau scheduler is used. For example: {"optimizer": optimizer, "lr_scheduler": {"scheduler": scheduler, "monitor": "your_loss"}}').

JQGoh commented 2 months ago

> Thank you so much for the implementation of a custom lr scheduler!
>
> @BrunoBelucci : Could you explain how to implement a ReduceLROnPlateau scheduler? I think some of your earlier discussion revolved around it, but I couldn't make it work yet. Currently I am passing the lr_scheduler and lr_scheduler_kwargs parameters, but I get the error below when using ReduceLROnPlateau. Other schedulers, like OneCycleLR, work perfectly though!
>
> MisconfigurationException('The lr scheduler dict must include a monitor when a ReduceLROnPlateau scheduler is used. For example: {"optimizer": optimizer, "lr_scheduler": {"scheduler": scheduler, "monitor": "your_loss"}}').

@MLfreakPy Could you open a new issue and continue the discussion of this topic there? You could link this PR as a reference. Since this PR has already served its original purpose, a separate issue will make it easier for other users to follow the topic. Thanks for your interest; when I have more time, I will definitely look into the use of ReduceLROnPlateau!