OlafenwaMoses / ImageAI

A Python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities.
https://www.genxr.co/#products
MIT License
8.58k stars 2.19k forks

loss showing NAN #495

Open bkazour opened 4 years ago

bkazour commented 4 years ago

Hello, I was training a custom object detection model, and training went fine for 15 epochs. I stopped, ran some inference tests, and resumed training from that model. However, as the log below shows, the loss suddenly turned into NaN in the 4th epoch. Has anyone experienced a similar issue, and do you have any suggestions I can try?

trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory=folderName)
trainer.setTrainConfig(object_names_array=["label", "badge"], batch_size=8, num_experiments=200, train_from_pretrained_model=os.path.join(execution_path, "inference/model15.h5"))
trainer.trainModel()

Training Image 9432/9432 [==============================] - 6030s 639ms/step - loss: 3.3162 - yolo_layer_1_loss: 0.4524 - yolo_layer_2_loss: 1.0795 - yolo_layer_3_loss: 1.7843 - val_loss: 4.4249 - val_yolo_layer_1_loss: 0.7384 - val_yolo_layer_2_loss: 1.3280 - val_yolo_layer_3_loss: 2.3586
Epoch 2/200
9432/9432 [==============================] - 5999s 636ms/step - loss: 3.2442 - yolo_layer_1_loss: 0.4121 - yolo_layer_2_loss: 1.0521 - yolo_layer_3_loss: 1.7800 - val_loss: 4.1125 - val_yolo_layer_1_loss: 0.6567 - val_yolo_layer_2_loss: 1.3438 - val_yolo_layer_3_loss: 2.1121
Epoch 3/200
9432/9432 [==============================] - 5947s 630ms/step - loss: 3.2971 - yolo_layer_1_loss: 0.4377 - yolo_layer_2_loss: 1.0666 - yolo_layer_3_loss: 1.7928 - val_loss: 4.1430 - val_yolo_layer_1_loss: 0.8216 - val_yolo_layer_2_loss: 1.2789 - val_yolo_layer_3_loss: 2.0424
Epoch 4/200
1545/9432 [===>..........................] - ETA: 1:15:46 - loss: nan - yolo_layer_1_loss: nan - yolo_layer_2_loss: nan - yolo_layer_3_loss: nan
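To pin down exactly where the loss first became NaN, a saved copy of a log like this one can be scanned with a small script. This is only a rough sketch, not part of ImageAI: the function name `first_nan` and the regular expressions are hypothetical, and it assumes the log was saved as text with one "Epoch N/M" header or one progress entry per line.

```python
import math
import re

# "name: value" pairs such as "loss: 3.3162" or "val_loss: nan".
METRIC_RE = re.compile(r"(\w+): (nan|[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")
# Epoch headers such as "Epoch 4/200".
EPOCH_RE = re.compile(r"Epoch (\d+)/\d+")
# Progress counters such as "1545/9432 [===>....]".
STEP_RE = re.compile(r"(\d+)/\d+ \[")


def first_nan(log_text):
    """Return (epoch, step, metric_names) for the first line that reports
    a NaN metric, or None if no NaN appears in the log."""
    epoch = None
    for line in log_text.splitlines():
        header = EPOCH_RE.search(line)
        if header:
            epoch = int(header.group(1))
        metrics = {name: float(value) for name, value in METRIC_RE.findall(line)}
        bad = [name for name, value in metrics.items() if math.isnan(value)]
        if bad:
            step = STEP_RE.search(line)
            return epoch, (int(step.group(1)) if step else None), bad
    return None
```

Knowing the step can help narrow the search to the batches read around that point, although by itself it does not explain the cause.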

saksham1211 commented 4 years ago

I am also facing this same issue. Have you found any answers related to this?

saksham1211 commented 4 years ago

> I am also facing this same issue. Have you found any answers related to this?

Kindly check the path of the model file you are loading in your program. Correcting it solved the issue for me.
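One way to act on this suggestion is to verify the path before passing it to the trainer. The helper below is a minimal sketch using only the standard library; `resolve_model_path` is a hypothetical name, not an ImageAI function, and this thread does not confirm that a wrong path is what causes the NaN loss.

```python
import os


def resolve_model_path(base_dir, relative_path):
    """Return the absolute path of a model file, raising an error if the
    file is missing or empty."""
    path = os.path.abspath(os.path.join(base_dir, relative_path))
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Pretrained model not found: {path}")
    if os.path.getsize(path) == 0:
        raise ValueError(f"Pretrained model file is empty: {path}")
    return path


# Possible usage with the training script from the original post
# (execution_path and trainer are defined there, not here):
# model_path = resolve_model_path(execution_path, "inference/model15.h5")
# trainer.setTrainConfig(object_names_array=["label", "badge"], batch_size=8,
#                        num_experiments=200,
#                        train_from_pretrained_model=model_path)
```

Printing the resolved absolute path also makes it easy to see whether the script is picking up the checkpoint you expect, for example model15.h5 and not an older file with a similar name.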