Training is interrupted by itself

fizyr / keras-retinanet

Keras implementation of RetinaNet object detection.

Apache License 2.0


Training is interrupted by itself #1545

Closed popocry closed 2 years ago

popocry commented 3 years ago

I am using my own training set. I set 100 epochs, but training stops by itself at epoch 7, and no errors are generated. I have trained many times, and it is always interrupted at around epoch 7-8.

This is the command I run: python bin/train.py --weights snapshots/resnet50_coco_best_v2.1.0.h5 --compute-val-loss --weighted-average --image-min-side 480 --image-max-side 800 --batch-size 5 --steps 1172 --epochs 100 csv CSV/train.csv CSV/class.csv --val-annotations CSV/val.csv

nikhilcms commented 3 years ago

I am also using my own training dataset, set to 50 epochs, but it stopped at epoch 7 by itself without any warning or error. Please let us know how we can fix this issue.

ewaty commented 3 years ago

Same for me. It's not epoch 7 (16, I think), but it trains for a bit and then crashes without any error. It's not a problem when I turn off validation.

nikhilcms commented 3 years ago

Actually, it depends on the training/validation loss of each epoch. To solve this issue, change the patience parameter in train.py to the number of epochs you have set (I think the default is 6 or 7). Hope this helps.
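For anyone wondering why a patience setting makes training end silently, here is a minimal pure-Python sketch of patience-based early stopping, the kind of rule that Keras's `EarlyStopping` callback applies through its `patience` and `min_delta` parameters. The function name `stopping_epoch` and the metric values are made up for illustration, and whether train.py wires its callback exactly this way is an assumption based on the comment above.

```python
def stopping_epoch(metric_per_epoch, patience, min_delta=0.0):
    """Return the 1-based epoch at which patience-based early stopping
    would halt training, or None if training runs to completion.

    Hypothetical helper for illustration; assumes a higher metric is better.
    """
    best = float("-inf")
    wait = 0  # consecutive epochs without a sufficient improvement
    for epoch, value in enumerate(metric_per_epoch, start=1):
        if value > best + min_delta:
            # Improvement: remember the new best value and reset the counter.
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs: stop without an error.
                return epoch
    return None


# Made-up validation metric that plateaus after the second epoch.
history = [0.30, 0.42, 0.41, 0.42, 0.40, 0.42, 0.41, 0.39]

print(stopping_epoch(history, patience=5))    # 7
print(stopping_epoch(history, patience=100))  # None
```

With a patience of 5, the run ends at epoch 7 because the metric last improved at epoch 2. Raising patience to the total number of epochs effectively disables the stop, which matches the workaround described above.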

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to the lack of recent activity. It will be closed if no further activity occurs. Thank you for your contributions.