I am encountering an error when training nanodet-plus-m-1.5x_416 with a subset of the COCO dataset. The process fails during the training stage, raising a MisconfigurationException related to the CosineAnnealingLR scheduler.
Environment Details
Package            Version
Cython             (not installed)
matplotlib         3.6.2
numpy              1.22.2
omegaconf          2.3.0
onnx               1.13.0
onnx-simplifier    0.4.35
opencv-python      4.8.1.78
pyaml              23.9.7
pycocotools        2.0+nv0.7.1
pytorch-lightning  1.9.5
tabulate           0.9.0
tensorboard        2.9.0
termcolor          2.4.0
torch              1.14.0a0+44dac51
torchmetrics       1.2.1
torchvision        0.15.0a0
tqdm               4.64.1
Steps to Reproduce
python train.py /path/to/yaml
Error
root@0c6cc8c2df08:/nanodet# python tools/train.py dataset/nanodet-plus-m-1.5x_416.yml
NOTE! Installing ujson may make loading annotations faster.
[NanoDet][12-14 13:06:22]INFO:Setting up data...
Loading annotations into memory...
Done (t=1.07s)
Creating index...
index created!
Loading annotations into memory...
Done (t=0.04s)
Creating index...
index created!
[NanoDet][12-14 13:06:23]INFO:Creating model...
model size is 1.5x
init weights...
=> loading pretrained model https://download.pytorch.org/models/shufflenetv2_x1_5-3c479a10.pth
Finish initialize NanoDet-Plus Head.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Traceback (most recent call last):
  File "tools/train.py", line 155, in <module>
    main(args)
  File "tools/train.py", line 150, in main
    trainer.fit(task, train_dataloader, val_dataloader, ckpt_path=model_resume_path)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1093, in _run
    self.strategy.setup(self)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/strategies/single_device.py", line 74, in setup
    super().setup(trainer)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/strategies/strategy.py", line 154, in setup
    self.setup_optimizers(trainer)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/strategies/strategy.py", line 142, in setup_optimizers
    self.optimizers, self.lr_scheduler_configs, self.optimizer_frequencies = _init_optimizers_and_lr_schedulers(
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/optimizer.py", line 195, in _init_optimizers_and_lr_schedulers
    _validate_scheduler_api(lr_scheduler_configs, model)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/optimizer.py", line 354, in _validate_scheduler_api
    raise MisconfigurationException(
lightning_fabric.utilities.exceptions.MisconfigurationException: The provided lr scheduler `CosineAnnealingLR` doesn't follow PyTorch's LRScheduler API. You should override the `LightningModule.lr_scheduler_step` hook with your own logic if you are using a custom LR scheduler.