RangiLyu / nanodet

NanoDet-Plus⚡Super fast and lightweight anchor-free object detection model. 🔥Only 980 KB(int8) / 1.8MB (fp16) and run 97FPS on cellphone🔥

Error Training NanoDet-Plus with COCO Subset, PyTorch Lightning #545

Open · samuel-wj-chapman opened this issue 11 months ago

samuel-wj-chapman commented 11 months ago

Issue Description

I am encountering an error when training nanodet-plus-m-1.5x_416 on a subset of the COCO dataset (4 vehicle classes). Training never gets past setup: trainer.fit() fails while pytorch-lightning sets up the optimizers, raising a MisconfigurationException claiming that the CosineAnnealingLR scheduler does not follow PyTorch's LRScheduler API.
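
For context, the lr_schedule block in my config (pasted in full below) should resolve to a stock PyTorch scheduler rather than anything custom, which is what makes the error surprising. A minimal sketch of the objects I expect to be built (illustrative names, not nanodet's actual code):

```python
import torch

# Stand-in parameters; in nanodet these come from the NanoDetPlus model.
params = [torch.nn.Parameter(torch.zeros(1))]

# schedule.optimizer from the config below.
optimizer = torch.optim.AdamW(params, lr=0.001, weight_decay=0.05)

# schedule.lr_schedule from the config below: a stock torch scheduler.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=300, eta_min=0.00005
)
```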

Environment Details

Cython: not installed
matplotlib: 3.6.2
numpy: 1.22.2
omegaconf: 2.3.0
onnx: 1.13.0
onnx-simplifier: 0.4.35
opencv-python: 4.8.1.78
pyaml: 23.9.7
pycocotools: 2.0+nv0.7.1
pytorch-lightning: 1.9.5
tabulate: 0.9.0
tensorboard: 2.9.0
termcolor: 2.4.0
torch: 1.14.0a0+44dac51
torchmetrics: 1.2.1
torchvision: 0.15.0a0
tqdm: 4.64.1

Steps to Reproduce

python tools/train.py dataset/nanodet-plus-m-1.5x_416.yml

Error

root@0c6cc8c2df08:/nanodet# python tools/train.py dataset/nanodet-plus-m-1.5x_416.yml 
NOTE! Installing ujson may make loading annotations faster.
[NanoDet][12-14 13:06:22]INFO:Setting up data...
Loading annotations into memory...
Done (t=1.07s)
Creating index...
index created!
Loading annotations into memory...
Done (t=0.04s)
Creating index...
index created!
[NanoDet][12-14 13:06:23]INFO:Creating model...
model size is  1.5x
init weights...
=> loading pretrained model https://download.pytorch.org/models/shufflenetv2_x1_5-3c479a10.pth
Finish initialize NanoDet-Plus Head.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Traceback (most recent call last):
  File "tools/train.py", line 155, in <module>
    main(args)
  File "tools/train.py", line 150, in main
    trainer.fit(task, train_dataloader, val_dataloader, ckpt_path=model_resume_path)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1093, in _run
    self.strategy.setup(self)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/strategies/single_device.py", line 74, in setup
    super().setup(trainer)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/strategies/strategy.py", line 154, in setup
    self.setup_optimizers(trainer)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/strategies/strategy.py", line 142, in setup_optimizers
    self.optimizers, self.lr_scheduler_configs, self.optimizer_frequencies = _init_optimizers_and_lr_schedulers(
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/optimizer.py", line 195, in _init_optimizers_and_lr_schedulers
    _validate_scheduler_api(lr_scheduler_configs, model)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/optimizer.py", line 354, in _validate_scheduler_api
    raise MisconfigurationException(
lightning_fabric.utilities.exceptions.MisconfigurationException: The provided lr scheduler `CosineAnnealingLR` doesn't follow PyTorch's LRScheduler API. You should override the `LightningModule.lr_scheduler_step` hook with your own logic if you are using a custom LR scheduler.
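
The raise comes from _validate_scheduler_api in pytorch_lightning/core/optimizer.py. If I understand Lightning 1.9.x correctly, for torch versions it considers older than 2.0 it checks isinstance against torch.optim.lr_scheduler._LRScheduler, and in this container's torch build (1.14.0a0) CosineAnnealingLR may no longer derive from that private base class. A small probe I can run in the same environment (this reproduces my assumption about the check, not Lightning's exact code):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

optimizer = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=1e-3)
scheduler = CosineAnnealingLR(optimizer, T_max=300, eta_min=0.00005)

# Lightning 1.9.x appears to require this for torch < 2.0; if the first line
# prints False while the second prints True, the version check is the culprit.
print(isinstance(scheduler, torch.optim.lr_scheduler._LRScheduler))
print(hasattr(torch.optim.lr_scheduler, "LRScheduler"))
print(torch.__version__)
```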

YAML training config (dataset/nanodet-plus-m-1.5x_416.yml)

save_dir: workspace/nanodet-plus-m-1.5x_416
model:
  weight_averager:
    name: ExpMovingAverager
    decay: 0.9998
  arch:
    name: NanoDetPlus
    detach_epoch: 10
    backbone:
      name: ShuffleNetV2
      model_size: 1.5x
      out_stages: [2,3,4]
      activation: LeakyReLU
    fpn:
      name: GhostPAN
      in_channels: [176, 352, 704]
      out_channels: 128
      kernel_size: 5
      num_extra_level: 1
      use_depthwise: True
      activation: LeakyReLU
    head:
      name: NanoDetPlusHead
      num_classes: 4
      input_channel: 128
      feat_channels: 128
      stacked_convs: 2
      kernel_size: 5
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
      norm_cfg:
        type: BN
      loss:
        loss_qfl:
          name: QualityFocalLoss
          use_sigmoid: True
          beta: 2.0
          loss_weight: 1.0
        loss_dfl:
          name: DistributionFocalLoss
          loss_weight: 0.25
        loss_bbox:
          name: GIoULoss
          loss_weight: 2.0
    # Auxiliary head, only use in training time.
    aux_head:
      name: SimpleConvHead
      num_classes: 4
      input_channel: 256
      feat_channels: 256
      stacked_convs: 4
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 7
data:
  train:
    name: CocoDataset
    img_path: dataset/train2017
    ann_path: dataset/annotations/vehicls_instances_train2017.json
    input_size: [416,416] #[w,h]
    keep_ratio: False
    pipeline:
      perspective: 0.0
      scale: [0.6, 1.4]
      stretch: [[0.8, 1.2], [0.8, 1.2]]
      rotation: 0
      shear: 0
      translate: 0.2
      flip: 0.5
      brightness: 0.2
      contrast: [0.6, 1.4]
      saturation: [0.5, 1.2]
      normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
  val:
    name: CocoDataset
    img_path: dataset/val2017
    ann_path: dataset/annotations/vehicls_instances_val2017.json
    input_size: [416,416] #[w,h]
    keep_ratio: False
    pipeline:
      normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
device:
  gpu_ids: [0]
  workers_per_gpu: 10
  batchsize_per_gpu: 96
  precision: 32 # set to 16 to use AMP training
schedule:
#  resume:
#  load_model:
  optimizer:
    name: AdamW
    lr: 0.001
    weight_decay: 0.05
  warmup:
    name: linear
    steps: 500
    ratio: 0.0001
  total_epochs: 300
  lr_schedule:
    name: CosineAnnealingLR
    T_max: 300
    eta_min: 0.00005
  val_intervals: 10
grad_clip: 35
evaluator:
  name: CocoDetectionEvaluator
  save_key: mAP
log:
  interval: 50

class_names: ['car', 'motorcycle', 'bus', 'truck']
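
For completeness, the error message itself points at overriding LightningModule.lr_scheduler_step. A rough sketch of what that override would look like on nanodet's TrainingTask (presumably nanodet/trainer/task.py), using the pytorch-lightning 1.9.x hook signature; I have not verified that this is a proper fix rather than a workaround for the version check noted above:

```python
from pytorch_lightning import LightningModule


class TrainingTask(LightningModule):
    # ... existing nanodet task code ...

    def lr_scheduler_step(self, scheduler, optimizer_idx, metric):
        # Hook suggested by the MisconfigurationException. CosineAnnealingLR
        # takes no metric, so step unconditionally; pass the metric through
        # for ReduceLROnPlateau-style schedulers.
        if metric is None:
            scheduler.step()
        else:
            scheduler.step(metric)
```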