Ma-Weijian closed this issue 1 year ago
Also, PyTorch Lightning 1.5 and later raises an error because it cannot infer the batch size in the self.log call of the segmentation class.
The error log looks like this:
Traceback (most recent call last):
File "/root/UV-Net/segmentation.py", line 117, in <module>
trainer.fit(model, train_loader, val_loader)
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
self._call_and_handle_interrupt(
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 724, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1237, in _run
results = self._run_stage()
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1324, in _run_stage
return self._run_train()
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1346, in _run_train
self._run_sanity_check()
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1414, in _run_sanity_check
val_loop.run()
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 153, in advance
dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 127, in advance
output = self._evaluation_step(**kwargs)
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 222, in _evaluation_step
output = self.trainer._call_strategy_hook("validation_step", *kwargs.values())
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1766, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 344, in validation_step
return self.model.validation_step(*args, **kwargs)
File "/root/UV-Net/uvnet/models.py", line 326, in validation_step
self.log("val_loss", loss, on_step=False, on_epoch=True)
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 417, in log
results.log(
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py", line 493, in log
batch_size = self._extract_batch_size(self[key], batch_size, meta)
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py", line 428, in _extract_batch_size
batch_size = extract_batch_size(self.batch)
File "/usr/local/miniconda3/envs/uvnet/lib/python3.9/site-packages/pytorch_lightning/utilities/data.py", line 81, in extract_batch_size
raise MisconfigurationException(error_msg)
pytorch_lightning.utilities.exceptions.MisconfigurationException: We could not infer the batch_size from the batch. Either simplify its structure or provide the batch_size as `self.log(..., batch_size=batch_size)`.
The code may need to be updated to match the newer API, for example by passing batch_size to self.log as the error message suggests.
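One possible way to follow the hint in the error message is sketched below. It is only a sketch under assumptions: guess_batch_size is a hypothetical helper (not part of UV-Net or PyTorch Lightning), and it assumes the batch is a nested dict/list in which some object, such as the batched graph, exposes an integer batch_size attribute.

```python
from collections.abc import Mapping, Sequence


def guess_batch_size(batch):
    """Best-effort batch size lookup for a nested batch (hypothetical helper).

    Returns an int, or None when nothing usable is found.
    """
    # Assumption: some object in the batch exposes an integer `batch_size`.
    size = getattr(batch, "batch_size", None)
    if isinstance(size, int) and not isinstance(size, bool):
        return size
    # Dict-like batches: return the first value that yields a size.
    if isinstance(batch, Mapping):
        for value in batch.values():
            found = guess_batch_size(value)
            if found is not None:
                return found
        return None
    # Strings are sequences too, but carry no batch information.
    if isinstance(batch, (str, bytes)):
        return None
    # Lists and tuples: look inside each element.
    if isinstance(batch, Sequence):
        for item in batch:
            found = guess_batch_size(item)
            if found is not None:
                return found
    return None


# Possible use inside validation_step (the call from the traceback,
# with batch_size added):
#     self.log("val_loss", loss, on_step=False, on_epoch=True,
#              batch_size=guess_batch_size(batch))
```

If the helper returns None, the call behaves as before, because None is the default for batch_size and Lightning then tries to infer the value itself.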
Thanks for reporting this. I think an easier and more stable fix would be for us to pin the exact version numbers of PyTorch Lightning and torchmetrics in the environment.yml file. That way we can guarantee that the code keeps working when the APIs change in the future. I will fix this.
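As a complement to pinning, a small check like the following could report which versions are actually installed and flag the affected range. It is a sketch only: the function names are hypothetical, the 1.5 boundary is taken from the report above, and the distribution names pytorch-lightning and torchmetrics are assumed to match the installed packages.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.6.0' into (1, 6, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def needs_explicit_batch_size(lightning_version):
    """True for PyTorch Lightning 1.5 and later, per the report above."""
    return parse_version(lightning_version) >= (1, 5)


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print what the current environment resolved to.
    for name in ("pytorch-lightning", "torchmetrics"):
        print(name, installed_version(name))
```

Running this in a freshly created environment shows whether the solver picked the versions the environment file was meant to fix.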
Hi, I added version numbers for PyTorch Lightning and torchmetrics to the environment.yml file. I was able to create a new environment from scratch and run the code without any issues. Can you check whether my fixes in the package_versions branch work for you?
Hi Autodesk AI Lab,
Anaconda selects torchmetrics 0.9.0 for me, and as the title shows, that API no longer exists in newer versions of torchmetrics.
I believe torchmetrics has a newer API that replaces the old one, and I would like to know which one it is.
Any ideas would help.
Thanks and best regards.
Anderson Ma