Open bkazour opened 4 years ago
I am also facing this same issue. Have you found any answers related to this?
Check the path of the model you are loading in your program. Correcting it solved the issue for me.
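A minimal sketch of the path check suggested above, assuming the model is a single weights file on disk. The helper name `check_model_path` is hypothetical and not part of ImageAI; it only verifies that the file exists before the path is handed to the trainer.

```python
import os


def check_model_path(model_path):
    """Return the absolute path if it points at an existing file, else raise.

    Hypothetical helper (not an ImageAI function): it fails early with a
    clear message instead of letting training start from a wrong path.
    """
    abs_path = os.path.abspath(model_path)
    if not os.path.isfile(abs_path):
        raise FileNotFoundError(f"Pretrained model not found: {abs_path}")
    return abs_path
```

The returned path could then be passed as `train_from_pretrained_model`, so a mistyped or relative path shows up before any training time is spent.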
Hello, I was training a custom object detection model, and training went fine for 15 epochs. I stopped, ran some inference tests, and then resumed training from that model. However, as the log below shows, the loss suddenly turned into NaN in the 4th epoch. Has anyone experienced a similar issue, and do you have any suggestions I can try?
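import os
from imageai.Detection.Custom import DetectionModelTrainer

# execution_path and folderName are defined elsewhere in the original script
trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory=folderName)
trainer.setTrainConfig(
    object_names_array=["label", "badge"],
    batch_size=8,
    num_experiments=200,
    # resume from the weights saved after the first 15 epochs
    train_from_pretrained_model=os.path.join(execution_path, "inference/model15.h5"),
)
trainer.trainModel()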
9432/9432 [==============================] - 6030s 639ms/step - loss: 3.3162 - yolo_layer_1_loss: 0.4524 - yolo_layer_2_loss: 1.0795 - yolo_layer_3_loss: 1.7843 - val_loss: 4.4249 - val_yolo_layer_1_loss: 0.7384 - val_yolo_layer_2_loss: 1.3280 - val_yolo_layer_3_loss: 2.3586
Epoch 2/200
9432/9432 [==============================] - 5999s 636ms/step - loss: 3.2442 - yolo_layer_1_loss: 0.4121 - yolo_layer_2_loss: 1.0521 - yolo_layer_3_loss: 1.7800 - val_loss: 4.1125 - val_yolo_layer_1_loss: 0.6567 - val_yolo_layer_2_loss: 1.3438 - val_yolo_layer_3_loss: 2.1121
Epoch 3/200
9432/9432 [==============================] - 5947s 630ms/step - loss: 3.2971 - yolo_layer_1_loss: 0.4377 - yolo_layer_2_loss: 1.0666 - yolo_layer_3_loss: 1.7928 - val_loss: 4.1430 - val_yolo_layer_1_loss: 0.8216 - val_yolo_layer_2_loss: 1.2789 - val_yolo_layer_3_loss: 2.0424
Epoch 4/200
1545/9432 [===>..........................] - ETA: 1:15:46 - loss: nan - yolo_layer_1_loss: nan - yolo_layer_2_loss: nan - yolo_layer_3_loss: nan
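One commonly reported cause of a loss turning into NaN partway through an epoch is a degenerate bounding box (zero or negative width or height) in a single annotation file. Whether that applies to this dataset is not confirmed anywhere in this thread. The sketch below assumes Pascal VOC style XML annotations with `object`, `name` and `bndbox` elements; `find_bad_boxes` is a hypothetical helper, not an ImageAI function.

```python
import xml.etree.ElementTree as ET


def find_bad_boxes(xml_text):
    """List boxes whose width or height is not positive.

    Assumes Pascal VOC style XML where every <bndbox> carries numeric
    <xmin>, <ymin>, <xmax>, <ymax> children (an assumption about the dataset).
    Returns tuples of (name, xmin, ymin, xmax, ymax).
    """
    root = ET.fromstring(xml_text)
    bad = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        if box is None:
            continue  # object without a box: nothing to check here
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        if xmax <= xmin or ymax <= ymin:
            bad.append((obj.findtext("name"), xmin, ymin, xmax, ymax))
    return bad


# Made-up annotation for illustration: the second box has zero width.
sample = """
<annotation>
  <object><name>label</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>80</ymax></bndbox>
  </object>
  <object><name>badge</name>
    <bndbox><xmin>50</xmin><ymin>60</ymin><xmax>50</xmax><ymax>90</ymax></bndbox>
  </object>
</annotation>
"""
print(find_bad_boxes(sample))
```

Running the same check over every annotation file in the training and validation folders would show whether any box needs to be fixed or dropped before resuming.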