asteroid-team / asteroid

The PyTorch-based audio source separation toolkit for researchers
https://asteroid-team.github.io/

X-UMX: lr_scheduler_step() missing 1 required positional argument: 'metric' #664

Closed DavidDiazGuerra closed 10 months ago

DavidDiazGuerra commented 1 year ago

Hello,

I'm trying to use the recipe for training the X-UMX model on the musdb18 dataset, but the following exception is raised after the validation of the first epoch finishes:

Traceback (most recent call last):
  File "/opt/tuni/worktmp/repos/asteroid/egs/musdb18/X-UMX/train.py", line 498, in <module>
    main(arg_dic, plain_args)
  File "/opt/tuni/worktmp/repos/asteroid/egs/musdb18/X-UMX/train.py", line 465, in main
    trainer.fit(system)
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 520, in fit
    call._call_and_handle_interrupt(
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 42, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 92, in launch
    return function(*args, **kwargs)
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 559, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 935, in _run
    results = self._run_stage()
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 978, in _run_stage
    self.fit_loop.run()
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 202, in run
    self.on_advance_end()
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 377, in on_advance_end
    self.epoch_loop.update_lr_schedulers("epoch", update_plateau_schedulers=not self.restarting)
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 300, in update_lr_schedulers
    self._update_learning_rates(interval=interval, update_plateau_schedulers=update_plateau_schedulers)
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 350, in _update_learning_rates
    call._call_lightning_module_hook(
  File "/worktmp/anaconda3/envs/py39torch20/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 142, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
TypeError: lr_scheduler_step() missing 1 required positional argument: 'metric'

I am using PyTorch 2.0.0 and Lightning 2.0.2. In order to make the model work with PyTorch 2.0.0, I've made some minor changes to the X-UMX model (see #662) and reduced the batch size so that the model fits on my local GPU, but I don't think I've made any changes that should affect the Lightning training.

Maybe this is related to using recent versions of PyTorch and Lightning? I have no experience working with Lightning.

Best regards, David

DavidDiazGuerra commented 1 year ago

Hi again,

The problem disappeared after downgrading Lightning to 1.9.4. I guess there are some changes in version 2.0 that break the X-UMX training routine of Asteroid.
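For anyone hitting the same error, pinning the package below 2.0 (assuming a pip-managed environment) is enough as a workaround:

pip install "pytorch-lightning==1.9.4"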

Best, David

LeonieBorne commented 1 year ago

Hello 👋

I had the same problem. It seems that the function definition is def lr_scheduler_step(self, scheduler, metric) in version 2.0, instead of def lr_scheduler_step(self, scheduler, optimizer_idx, metric) in previous versions (see the Lightning documentation on custom learning rate schedulers).
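A minimal sketch of a module using the new signature (this is a generic LightningModule, not Asteroid's actual System class; the linear model, optimizer, and plateau scheduler below are placeholders):

import torch
import pytorch_lightning as pl

class ExampleSystem(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = torch.nn.Linear(10, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.model(x), y)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=10)
        return {
            "optimizer": optimizer,
            "lr_scheduler": {"scheduler": scheduler, "monitor": "val_loss"},
        }

    # Lightning >= 2.0 calls this hook with (scheduler, metric); keeping the old
    # (scheduler, optimizer_idx, metric) signature triggers the TypeError above.
    def lr_scheduler_step(self, scheduler, metric):
        if metric is None:
            scheduler.step()
        else:
            scheduler.step(metric)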

@mpariente Should we change this and update the requirements to pytorch-lightning>=2.0.0?

mpariente commented 1 year ago

I'm not sure this code has an easy backward-compatible fix. So we should

DavidDiazGuerra commented 10 months ago

I'm closing this issue since it has been fixed by #682 and the 0.7.0 release.