david8862 / keras-YOLOv3-model-set

end-to-end YOLOv4/v3/v2 object detection pipeline, implemented on tf.keras with different technologies
MIT License
640 stars 222 forks

Both loss and val_loss are nan during YOLO training #173

Open Goes2021 opened 3 years ago

Goes2021 commented 3 years ago

Hi,

I am trying to train a ZeroCost YOLO model on my own dataset, but the mAP comes back as 0.0000 and both loss and val_loss are nan. I also tried the exact same dataset in the ZeroCost Detectron model, and there it worked properly.

Seen labels: {'type_1': 594}
Given labels: ['type_1']
Overlap labels: {'type_1'}
(13, 13)
Epoch 1/30

Epoch 00001: val_loss did not improve from inf

type_1: 0.0000
mAP: 0.0000
mAP did not improve from 0.
Epoch 2/30

Epoch 00002: val_loss did not improve from inf

type_1: 0.0000
mAP: 0.0000
mAP did not improve from 0.

Does anyone know the problem and the solution? Thanks!

david8862 commented 3 years ago

what is your training config?

Goes2021 commented 3 years ago

What exactly do you mean? I generated my images and xml files in ImageJ and just followed all the steps in the ZeroCost YOLO notebook.

david8862 commented 3 years ago

> What exactly do you mean? I generated my images and xml files in ImageJ and just followed all the steps in the ZeroCost YOLO notebook.

I mean: which --model_type, --anchors_path, and other options did you use when running train.py of this repo, as mentioned in the README:

# python train.py --model_type=yolo3_mobilenet_lite --anchors_path=configs/yolo3_anchors.txt --annotation_file=trainval.txt --classes_path=configs/voc_classes.txt --eval_online --save_eval_checkpoint
Goes2021 commented 3 years ago

I am sorry, I didn't make that clear before: I am using YOLOv2. I just ran the "path to training images" cell with the right paths selected, and after that the "start training" cell.
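A common cause of nan loss in YOLO training is a bad annotation file: boxes with zero or negative width/height, negative coordinates, or malformed fields. Below is a minimal sanity-check sketch (not from this thread) that assumes the annotation line format used by this repo's train.py, i.e. `image_path x_min,y_min,x_max,y_max,class_id ...`; the function name `check_annotation_line` is hypothetical.

```python
# Sketch: flag annotation boxes that commonly produce nan loss.
# Assumed line format (this repo): image_path x_min,y_min,x_max,y_max,class_id ...

def check_annotation_line(line):
    """Return a list of problems found in one annotation line."""
    problems = []
    parts = line.strip().split()
    if len(parts) < 2:
        problems.append("no boxes on line")
        return problems
    for box in parts[1:]:
        fields = box.split(",")
        if len(fields) != 5:
            problems.append(f"malformed box: {box}")
            continue
        try:
            x_min, y_min, x_max, y_max, _cls = map(int, fields)
        except ValueError:
            problems.append(f"non-integer field in box: {box}")
            continue
        if x_max <= x_min or y_max <= y_min:
            problems.append(f"degenerate box (zero/negative size): {box}")
        if min(x_min, y_min) < 0:
            problems.append(f"negative coordinate: {box}")
    return problems


if __name__ == "__main__":
    # Example with one degenerate box (x_min == x_max):
    sample = "images/cell_01.jpg 10,10,10,50,0"
    for problem in check_annotation_line(sample):
        print(problem)
```

Running this over every line of the annotation file (e.g. the `trainval.txt` from the README example) before training should surface any boxes worth fixing or dropping.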