QData / spacetimeformer

Multivariate Time Series Forecasting with efficient Transformers. Code for the paper "Long-Range Transformers for Dynamic Spatiotemporal Forecasting."
https://arxiv.org/abs/2109.12218
MIT License

Error while resuming training from saved checkpoint #83

Open DeepakLabh opened 1 year ago

DeepakLabh commented 1 year ago

Passing ckpt_path to PyTorch Lightning's trainer.fit() method raises the error below. The failing line is trainer.fit(forecaster, datamodule=data_module, ckpt_path='best.ckpt.ckpt'). The intent is to resume training from a saved checkpoint.

Restoring states from the checkpoint path at best.ckpt.ckpt

==================================================================
  | Name            | Type            | Params
0 | spacetimeformer | Spacetimeformer | 4.5 M

4.5 M     Trainable params
0         Non-trainable params
4.5 M     Total params
18.191    Total estimated model params size (MB)
Restored all states from the checkpoint file at best.ckpt.ckpt
Epoch 0:  75%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 105/140 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train_vol.py", line 457, in <module>
    trainer.fit(forecaster, datamodule=data_module, ckpt_path='best.ckpt.ckpt')
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
    self._call_and_handle_interrupt(
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 722, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1237, in _run
    results = self._run_stage()
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1324, in _run_stage
    return self._run_train()
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1354, in _run_train
    self.fit_loop.run()
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 205, in run
    self.on_advance_end()
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 297, in on_advance_end
    self.trainer._call_callback_hooks("on_train_epoch_end")
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1637, in _call_callback_hooks
    fn(self, self.lightning_module, *args, **kwargs)
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/callbacks/early_stopping.py", line 179, in on_train_epoch_end
    self._run_early_stopping_check(trainer)
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/callbacks/early_stopping.py", line 190, in _run_early_stopping_check
    if trainer.fast_dev_run or not self._validate_condition_metric(  # disable early_stopping with fast_dev_run
  File "/home/deepak.l/venv_spacetimeformer_13_sep/lib/python3.8/site-packages/pytorch_lightning/callbacks/early_stopping.py", line 145, in _validate_condition_metric
    raise RuntimeError(error_msg)
RuntimeError: Early stopping conditioned on metric val/loss which is not available. Pass in or modify your EarlyStopping callback to use any of the following: ``
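
The traceback ends in the EarlyStopping callback's metric check, and the list of available metrics in the message is empty. A minimal pure-Python sketch of that kind of check follows. It is a simplified stand-in, not Lightning's actual code: the function name mirrors the traceback, but its signature, the `logged` dictionary, and the value 0.42 are hypothetical. It also assumes that no validation pass has logged val/loss yet when the check runs after resuming, which the empty list suggests but does not prove.

```python
from typing import Dict


def validate_condition_metric(
    logged: Dict[str, float], monitor: str, strict: bool = True
) -> bool:
    """Simplified stand-in for the check that raised the error above.

    Returns True if the monitored metric has been logged. If it is missing,
    raises RuntimeError when strict is True, otherwise returns False.
    """
    if monitor in logged:
        return True
    msg = (
        f"Early stopping conditioned on metric `{monitor}` which is not available."
        f" Available metrics: `{'`, `'.join(logged)}`"
    )
    if strict:
        raise RuntimeError(msg)
    return False


# Assumption: right after resuming, nothing has been logged yet,
# so the dictionary of metrics seen by the callback is empty.
metrics_after_resume: Dict[str, float] = {}
try:
    validate_condition_metric(metrics_after_resume, "val/loss")
    outcome = "ok"
except RuntimeError as err:
    outcome = f"error: {err}"

# Once a validation pass has logged the metric (0.42 is a placeholder value),
# the same check passes.
metrics_after_validation = {"val/loss": 0.42}
passes = validate_condition_metric(metrics_after_validation, "val/loss")
```

If this reading is right, two options on the real callback may be worth trying: EarlyStopping accepts check_on_train_epoch_end, which can be set to False so the check runs after validation instead of at the end of the training epoch, and strict, which can be set to False so a missing metric does not raise. Whether either resolves this particular resume case is untested here.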