RangiLyu / nanodet

NanoDet-Plus⚡Super fast and lightweight anchor-free object detection model. 🔥Only 980 KB(int8) / 1.8MB (fp16) and run 97FPS on cellphone🔥
Apache License 2.0

During the entire training the validation MAP remains 0 #491

Open mylifeasazucchini opened 1 year ago

mylifeasazucchini commented 1 year ago

Hi! I am fairly new to object detection models. I have been trying to train a single-class object detector using the custom XML dataset config file, but the mAP remains 0 throughout training.

At epoch 0: INFO:Train|Epoch1/300|Iter0(1/5)| mem:4.18G| lr:1.00e-07| loss_qfl:0.0729| loss_bbox:0.0000| loss_dfl:0.0000| aux_loss_qfl:0.0658| aux_loss_bbox:0.0000| aux_loss_dfl:0.0000|

At epoch 300: INFO:NanoDet:Val|Epoch300/300|Iter1500(1/2)| mem:3.72G| lr:5.00e-05| loss_qfl:0.0001| loss_bbox:0.0000| loss_dfl:0.0000| aux_loss_qfl:0.0000| aux_loss_bbox:0.0000| aux_loss_dfl:0.0000|
[NanoDet][01-26 09:42:45]INFO:Val_metrics: {'mAP': 0, 'AP_50': 0, 'AP_75': 0, 'AP_small': 0, 'AP_m': 0, 'AP_l': 0}

The main question is: why does my bbox loss remain 0 throughout?

This is my config file for the XML dataset (in case it helps):

model:
  weight_averager:
    name: ExpMovingAverager
    decay: 0.9998
  arch:
    name: NanoDetPlus
    detach_epoch: 10
    backbone:
      name: ShuffleNetV2
      model_size: 1.0x
      out_stages: [2,3,4]
      activation: LeakyReLU
    fpn:
      name: GhostPAN
      in_channels: [116, 232, 464]
      out_channels: 96
      kernel_size: 5
      num_extra_level: 1
      use_depthwise: True
      activation: LeakyReLU
    head:
      name: NanoDetPlusHead
      num_classes: 1
      input_channel: 96
      feat_channels: 96
      stacked_convs: 2
      kernel_size: 5
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 1
      norm_cfg:
        type: BN
      loss:
        loss_qfl:
          name: QualityFocalLoss
          use_sigmoid: True
          beta: 2.0
          loss_weight: 1.0
        loss_dfl:
          name: DistributionFocalLoss
          loss_weight: 0.25
        loss_bbox:
          name: GIoULoss
          loss_weight: 2.0
    # Auxiliary head, only use in training time.
    aux_head:
      name: SimpleConvHead
      num_classes: 1
      input_channel: 192
      feat_channels: 192
      stacked_convs: 4
      strides: [8, 16, 32, 64]
      activation: LeakyReLU
      reg_max: 1

class_names: &class_names ['Whiteboard']  #Please fill in the category names (not include background category)
data:
  train:
    name: XMLDataset
    class_names: *class_names
    img_path: wboard_data/train  #Please fill in train image path
    ann_path: wboard_data/train  #Please fill in train xml path
    input_size: [320,320] #[w,h]
    keep_ratio: True
    pipeline:
      perspective: 0.0
      scale: [0.6, 1.4]
      stretch: [[1, 1], [1, 1]]
      rotation: 0
      shear: 0
      translate: 0.2
      flip: 0.5
      brightness: 0.2
      contrast: [0.8, 1.2]
      saturation: [0.8, 1.2]
      normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
  val:
    name: XMLDataset
    class_names: *class_names
    img_path: wboard_data/valid #Please fill in val image path
    ann_path: wboard_data/valid #Please fill in val xml path
    input_size: [320,320] #[w,h]
    keep_ratio: True
    pipeline:
      normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
device:
  gpu_ids: [0] # Set like [0, 1, 2, 3] if you have multi-GPUs
  workers_per_gpu: 8
  batchsize_per_gpu: 32
schedule:
#  resume:
#  load_model: YOUR_MODEL_PATH
  optimizer:
    name: AdamW
    lr: 0.001
    weight_decay: 0.05
  warmup:
    name: linear
    steps: 500
    ratio: 0.0001
  total_epochs: 300
  lr_schedule:
    name: CosineAnnealingLR
    T_max: 300
    eta_min: 0.00005
  val_intervals: 10
grad_clip: 35
evaluator:
  name: CocoDetectionEvaluator
  save_key: mAP

log:
  interval: 10
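
As a sanity check on the schedule section, the first logged learning rate (lr:1.00e-07) can be reproduced from the warmup settings. The sketch below assumes a linear warmup that scales the base rate by k = 1 - (1 - step/steps) * (1 - ratio); that formula and the function name `linear_warmup_lr` are illustrative assumptions, not something stated in this thread. The 5 iterations per epoch are read from "Iter0(1/5)" in the training log.

```python
def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, ratio: float) -> float:
    """Learning rate at `step` under an assumed linear warmup rule."""
    if step >= warmup_steps:
        return base_lr
    # k grows linearly from `ratio` (step 0) to 1.0 (step == warmup_steps).
    k = 1.0 - (1.0 - step / warmup_steps) * (1.0 - ratio)
    return base_lr * k


# Values copied from the schedule section of the config.
base_lr, warmup_steps, ratio = 0.001, 500, 0.0001
# Read from "Iter0(1/5)" in the training log.
iters_per_epoch = 5

print(f"lr at step 0: {linear_warmup_lr(0, base_lr, warmup_steps, ratio):.2e}")
print(f"warmup lasts {warmup_steps / iters_per_epoch:.0f} epochs")
```

Under these assumptions the step-0 value matches the log, and 500 warmup steps at 5 iterations per epoch cover the first 100 of the 300 epochs. That is worth knowing when reading early logs, but it does not by itself explain a bbox loss of exactly zero.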

Would appreciate any help! Thxxx

cansik commented 1 year ago

I experienced the same problem with my custom YOLO dataset reader (#487), which is based on the XML reader you are using. There seems to be a bug at xml_dataset.py#L119 that leads to labels without a matching category. When a category is added, its id is set to index + 1, but the bounding box annotation uses only cat_id (without the +1).

So I would suggest fixing this by adding +1 to the category id:

ann = {
    "image_id": idx + 1,
    "bbox": coco_box,
    "category_id": cat_id + 1,
    "iscrowd": 0,
    "id": ann_id,
    "area": coco_box[2] * coco_box[3],
}
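
To make the mismatch concrete, here is a minimal, self-contained sketch. It does not call NanoDet; the helper `make_ann` and the box values are hypothetical and only mimic the structure described above (categories registered with id = index + 1, annotations carrying a `category_id`).

```python
class_names = ["Whiteboard"]

# Categories are registered with 1-based ids (index + 1), as described above.
categories = [{"id": i + 1, "name": name} for i, name in enumerate(class_names)]
valid_ids = {c["id"] for c in categories}


def make_ann(cat_index: int, offset: int) -> dict:
    """Build a minimal annotation; offset is 0 (before the fix) or 1 (after)."""
    coco_box = [10, 20, 100, 50]  # hypothetical [x, y, w, h]
    return {
        "image_id": 1,
        "bbox": coco_box,
        "category_id": cat_index + offset,
        "iscrowd": 0,
        "id": 1,
        "area": coco_box[2] * coco_box[3],
    }


cat_index = class_names.index("Whiteboard")
before = make_ann(cat_index, offset=0)
after = make_ann(cat_index, offset=1)

print(before["category_id"] in valid_ids)  # False: id 0 matches no category
print(after["category_id"] in valid_ids)   # True: id 1 matches "Whiteboard"
```

With a single class, the unfixed annotation points at category id 0, which is never registered, so an evaluator keyed on category ids would find no ground truth for the only class.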

mylifeasazucchini commented 1 year ago

Hi @cansik

Thanks for taking the time to look into my issue and reply :) Unfortunately, the suggested change didn't fix it for me: the training logs and validation metrics are still the same (i.e., 0 throughout).

cansik commented 1 year ago

I am not sure how VOC XML numbers the categories, but YOLO starts at 0 and I guess MS COCO starts at 1, which is why that change fixed my problem. There could be something else going on.

I would suggest you debug the loading of the labels. Since the xml loader is based on COCO, I would add a breakpoint here, step through and see if the labels are correctly loaded: https://github.com/RangiLyu/nanodet/blob/main/nanodet/data/dataset/coco.py#L69
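
Before stepping through the loader, a quick offline audit of the annotation files can narrow things down. The sketch below uses only the Python standard library and assumes Pascal VOC-style files (`<object><name>...</name></object>`); the function name `count_voc_labels` is hypothetical. It counts how many objects carry a name that matches `class_names` exactly. A bbox loss of exactly 0 would be consistent with no ground-truth boxes reaching the head, for example if object names differ from the configured class name in case or spelling, though that is only one possible cause.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_voc_labels(ann_dir, class_names):
    """Count <object><name> entries that do / do not match class_names exactly."""
    known, unknown = Counter(), Counter()
    for xml_file in sorted(Path(ann_dir).glob("*.xml")):
        root = ET.parse(xml_file).getroot()
        for obj in root.iter("object"):
            name = (obj.findtext("name") or "").strip()
            # Exact, case-sensitive comparison against the configured names.
            (known if name in class_names else unknown)[name] += 1
    return known, unknown


# Example usage with the paths and class name from the config above:
# known, unknown = count_voc_labels("wboard_data/train", ["Whiteboard"])
# print("matched:", dict(known))
# print("unmatched:", dict(unknown))
```

If the matched count is empty or much smaller than expected, the label names in the XML files are the first thing to fix; if it looks right, the breakpoint in coco.py is the next step.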

phonver commented 1 year ago

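I converted my dataset from VOC to COCO format, and in the results only two categories have a non-zero mAP; all the others are 0. Could this be a format problem?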