After 10 epoch error - Githubissues

TD-wzw commented 3 years ago

Loading and preparing results...
Traceback (most recent call last):
  File "tools/train.py", line 92, in <module>
    main(args)
  File "tools/train.py", line 87, in main
    trainer.run(train_dataloader, val_dataloader, evaluator)
  File "/disk_2t_02/wangzhiwei/nanodet/nanodet/trainer/trainer.py", line 143, in run
    eval_results = evaluator.evaluate(results, self.cfg.save_dir, epoch, self.logger, rank=self.rank)
  File "/disk_2t_02/wangzhiwei/nanodet/nanodet/evaluator/coco_detection.py", line 55, in evaluate
    coco_dets = self.coco_api.loadRes(json_path)
  File "/home/lhw/anaconda3/envs/pytorch1.6/lib/python3.8/site-packages/pycocotools/coco.py", line 328, in loadRes
    if 'caption' in anns[0]:
IndexError: list index out of range

RangiLyu commented 3 years ago

It seems that the results.json file is empty. Can you upload your results.json for me to find out what's going wrong?

TD-wzw commented 3 years ago

results-1.txt

TD-wzw commented 3 years ago

Why is it empty?

RangiLyu commented 3 years ago

It's weird. Maybe the model detected nothing in the val dataset.
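
For context: the IndexError in the traceback comes from pycocotools indexing anns[0] on an empty list, which is what happens when the results file contains no detections. A minimal sketch of a guard, assuming the results file is a JSON list of detection dicts; the helper name has_detections is hypothetical and not part of nanodet:

```python
import json


def has_detections(json_path):
    """Return True if the results file holds a non-empty JSON list."""
    with open(json_path) as f:
        dets = json.load(f)
    # An empty list is what makes loadRes fail on anns[0].
    return isinstance(dets, list) and len(dets) > 0
```

Calling this before coco_api.loadRes(json_path) and skipping evaluation when it returns False would avoid the crash, although it does not explain why the model produced no detections.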

TD-wzw commented 3 years ago

Thank you. Let me check again.

ztt0810 commented 3 years ago

I also encountered the same problem, did you solve it?

TD-wzw commented 3 years ago

Take a look at your config file

ztt0810 commented 3 years ago

#Config File example
save_dir: workspace/nanodet_m
model:
  arch:
    name: GFL
    backbone:
      name: ShuffleNetV2
      model_size: 1.0x
      out_stages: [2,3,4]
      activation: LeakyReLU
    fpn:
      name: PAN
      in_channels: [116, 232, 464]
      out_channels: 96
      start_level: 0
      num_outs: 3
    head:
      name: NanoDetHead
      num_classes: 80
      input_channel: 96
      feat_channels: 96
      stacked_convs: 2
      share_cls_reg: True
      octave_base_scale: 5
      scales_per_octave: 1
      strides: [8, 16, 32]
      reg_max: 7
      norm_cfg:
        type: BN
      loss:
        loss_qfl:
          name: QualityFocalLoss
          use_sigmoid: True
          beta: 2.0
          loss_weight: 1.0
        loss_dfl:
          name: DistributionFocalLoss
          loss_weight: 0.25
        loss_bbox:
          name: GIoULoss
          loss_weight: 2.0
data:
  train:
    name: coco
    img_path: ./dataset/train/img
    ann_path: ./dataset/coco_annotations/train_annotations.json
    input_size: [320,320] #[w,h]
    keep_ratio: True
    pipeline:
      perspective: 0.0
      scale: [0.6, 1.4]
      stretch: [[1, 1], [1, 1]]
      rotation: 0
      shear: 0
      translate: 0.2
      flip: 0.5
      brightness: 0.2
      contrast: [0.8, 1.2]
      saturation: [0.8, 1.2]
      normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
  val:
    name: coco
    img_path: ./dataset/val/img
    ann_path: ./dataset/coco_annotations/val_annotations.json
    input_size: [320,320] #[w,h]
    keep_ratio: True
    pipeline:
      normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
device:
  gpu_ids: [0]
  workers_per_gpu: 12
  batchsize_per_gpu: 1
schedule:
  resume:
  load_model: ./trained_models/model_last_100.pth
  optimizer:
    name: SGD
    lr: 0.14
    momentum: 0.9
    weight_decay: 0.0001
  warmup:
    name: linear
    steps: 300
    ratio: 0.1
  total_epochs: 160
  lr_schedule:
    name: MultiStepLR
    milestones: [130,160,150,155]
    gamma: 0.1
  val_intervals: 10
evaluator:
  name: CocoDetectionEvaluator
  save_key: mAP

log:
  interval: 10

#class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
#              'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant',
#              'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog',
#              'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
#              'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
#              'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat',
#              'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket',
#              'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
#              'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
#              'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
#              'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop',
#              'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave',
#              'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock',
#              'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush']

class_names: ['face', 'mask', 'face_mask']

TD-wzw commented 3 years ago

Check that your val path is correct and that you can read the data

TD-wzw commented 3 years ago

I had made a small mistake with my val set before.

ztt0810 commented 3 years ago

I am sure that every picture in val is loaded during validation, but a new problem has appeared. When I run test.py with the command python tools/test.py --config config/nanodet-m.yml --task test, the results.json file is still empty.

TD-wzw commented 3 years ago

I have not encountered this problem yet.

ztt0810 commented 3 years ago

Okay, thanks a lot.

TD-wzw commented 3 years ago

Are you training on a single GPU?

ztt0810 commented 3 years ago

Yes, I use Colab.

TD-wzw commented 3 years ago

Everything is all right

TD-wzw commented 3 years ago

The problem may still be in the validation set

ztt0810 commented 3 years ago

Okay, thank you. I will check it again.

TD-wzw commented 3 years ago

My pleasure

dada1437903138 commented 3 years ago

The problem may still be in the validation set

I ran into the same problem and checked my val data configuration; nothing seems wrong. How did you fix it, bro?

dada1437903138 commented 3 years ago

Okay, thank you. I will check it again.

Have you fixed this problem?

ztt0810 commented 3 years ago

Maybe you can check 'num_classes' in your config file; it should be the same as the number of categories in your dataset.
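
A minimal sketch of that check, assuming the config has already been loaded into a dict (for example with PyYAML's yaml.safe_load) and the COCO annotation file into another dict. The function name check_class_config is hypothetical; the key paths follow the config posted earlier in this thread:

```python
def check_class_config(cfg, coco_ann):
    """Return a list of mismatch messages; an empty list means consistent."""
    problems = []
    num_classes = cfg["model"]["arch"]["head"]["num_classes"]
    class_names = cfg["class_names"]
    n_categories = len(coco_ann["categories"])

    # The head must predict exactly one score per named class.
    if num_classes != len(class_names):
        problems.append(
            f"num_classes={num_classes} but class_names has "
            f"{len(class_names)} entries"
        )
    # The dataset must define the same number of categories.
    if num_classes != n_categories:
        problems.append(
            f"num_classes={num_classes} but the annotation file has "
            f"{n_categories} categories"
        )
    return problems
```

In the config posted above, num_classes is 80 while class_names lists three entries, so the first check would report a mismatch.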

carry-xz commented 3 years ago

Check whether num_classes equals the number of entries in class_names.
Check whether the validation set path is correct and whether there are annotation errors.
If the training set is small, train for several more epochs before expecting detection results; until then, the results are empty.
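
The path and annotation checks can be sketched roughly as follows, assuming a standard COCO-style annotation dict with images, annotations and categories lists. The function name find_annotation_problems is hypothetical, and the particular checks are only a starting point, not an exhaustive validator:

```python
import os


def find_annotation_problems(coco_ann, img_dir):
    """Return a list of messages describing path and annotation problems."""
    problems = []
    image_ids = set()
    for img in coco_ann.get("images", []):
        image_ids.add(img["id"])
        # Every listed image should exist under the configured img_path.
        if not os.path.isfile(os.path.join(img_dir, img["file_name"])):
            problems.append(f"missing image file: {img['file_name']}")

    category_ids = {c["id"] for c in coco_ann.get("categories", [])}
    for ann in coco_ann.get("annotations", []):
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id")
        # COCO boxes are [x, y, width, height]; zero-size boxes are suspect.
        _, _, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann['id']}: non-positive box size")
    return problems
```

Running this once against both the train and val annotation files, with img_dir set to the matching img_path from the config, gives a quick sanity check before training.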

ankandrew commented 3 years ago

If you get this error or zero loss while training, double-check the consistency of *_annotations.json. For example, if you download and export the open-images-v6 dataset with fiftyone, make sure to:

Add the iscrowd attribute to each detection.
Delete unused classes from the categories field of *_annotations.json.
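
Both steps can be sketched as below, assuming the annotation file has been loaded into a dict with annotations and categories lists. The function name clean_coco_annotations is hypothetical, and defaulting iscrowd to 0 (not a crowd) is an assumption that should be checked against the actual data:

```python
import copy


def clean_coco_annotations(coco_ann):
    """Return a copy with iscrowd filled in and unused categories removed."""
    cleaned = copy.deepcopy(coco_ann)

    # Assumption: a detection without iscrowd is an ordinary, non-crowd box.
    for ann in cleaned["annotations"]:
        ann.setdefault("iscrowd", 0)

    # Keep only categories that at least one annotation refers to.
    used = {ann["category_id"] for ann in cleaned["annotations"]}
    cleaned["categories"] = [
        c for c in cleaned["categories"] if c["id"] in used
    ]
    return cleaned
```

The sketch does not renumber category ids, so any gaps left by removed categories remain in place.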

duwangthefirst commented 3 years ago

I may have fixed it. In my config file, CocoDetectionEvaluator is used:

evaluator:
  name: CocoDetectionEvaluator
  save_key: mAP

If your dataset is not good enough or your number of classes is too large, your model may not converge even after 10 epochs.

Here is my struggle to solve the problem:

Case 1: when I train the model on my custom dataset, with a train set of size 32 and a val set of size 32, the same exception is thrown (I'm using a small dataset to test whether the env is properly configured):

if 'caption' in anns[0]:
IndexError: list index out of range

Case 2: however, after I changed the dataset to a train set of size 20000 and a val set of size 1000, the exception went away.

Both case 1 and case 2 used batch_size=32 and were trained for 1 epoch and then validated for 1 epoch. Case 2 worked fine.

So the conclusion is: don't use the built-in validation process until you are sure that your model has actually learned something (maybe after 20 or more epochs; I just don't have the patience).

If you just want to avoid the exception, delete this from your config file:

evaluator:
  name: CocoDetectionEvaluator
  save_key: mAP

Ratansairohith commented 3 years ago

Hey. I am also facing a similar issue while evaluating the trained model. It throws this error:

TypeError: Object of type Tensor is not JSON serializable

Can anyone help me solve this error? Thanks in advance.
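
One common cause of this TypeError is that detection values are still tensors when they are passed to json.dump. A minimal sketch of a converter, assuming the offending values expose a tolist() method, as torch.Tensor and NumPy arrays do. The helper name to_jsonable is hypothetical and not part of nanodet, and whether this matches the failing code path here is not confirmed in the thread:

```python
import json


def to_jsonable(obj):
    """Recursively convert tensor-like values into plain Python types."""
    # torch.Tensor and numpy.ndarray both provide tolist().
    if hasattr(obj, "tolist"):
        return obj.tolist()
    if isinstance(obj, dict):
        return {k: to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    return obj
```

Passing the results through to_jsonable before serializing, for example json.dumps(to_jsonable(results)), leaves values that are already plain Python types unchanged.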

RangiLyu / nanodet

After 10 epoch error #120