divamgupta / image-segmentation-keras

Implementation of Segnet, FCN, UNet, PSPNet and other models in Keras.
https://divamgupta.com/image-segmentation/2019/06/06/deep-learning-semantic-segmentation-keras.html
MIT License

Issues with Saving Trained Model #366

Closed: varungupta31 closed this issue 2 years ago

varungupta31 commented 2 years ago

I'm using TF 2.3.0 (as I don't have a CUDA 11 GPU) and training the fcn_32_vgg model with this script:

import tensorflow
from tensorflow.keras import utils
from keras_segmentation.models.fcn import fcn_32_vgg

model = fcn_32_vgg(n_classes=2, input_height=256, input_width=512)

model.train(
    train_images="/home2/varungupta/image-segmentation-keras/datats_sub/train/images/",
    train_annotations="/home2/varungupta/image-segmentation-keras/datats_sub/train/annotations/",
    checkpoints_path="logs/",
    epochs=3,
)

During training, I get the following logs:

Epoch 1/3
2022-04-14 23:45:31.284196: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2022-04-14 23:45:32.014281: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
 2/32 [>.............................] - ETA: 3s - loss: 0.0179 - accuracy: 0.5396
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0722s vs `on_train_batch_end` time: 0.1597s). Check your callbacks.
32/32 [==============================] - ETA: 0s - loss: 0.0193 - accuracy: 0.3511
Epoch 00001: saving model to logs/.00001
32/32 [==============================] - 24s 762ms/step - loss: 0.0193 - accuracy: 0.3511
Epoch 2/3
32/32 [==============================] - ETA: 0s - loss: 0.0099 - accuracy: 0.0387
Epoch 00002: saving model to logs/.00002
32/32 [==============================] - 23s 727ms/step - loss: 0.0099 - accuracy: 0.0387
Epoch 3/3
32/32 [==============================] - ETA: 0s - loss: 6.9894e-04 - accuracy: 0.0275
Epoch 00003: saving model to logs/.00003
32/32 [==============================] - 23s 730ms/step - loss: 6.9894e-04 - accuracy: 0.0275

However, my checkpoints_path directory contains only two files: checkpoint and _config.json. The checkpoint file contains

model_checkpoint_path: ".00003"
all_model_checkpoint_paths: ".00003"

while the JSON file contains:

{"model_class": "fcn_32_vgg", "n_classes": 2, "input_height": 256, "input_width": 512, "output_height": 288, "output_width": 544}
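The epoch lines in the log show where the weights went: logs/.00001, logs/.00002 and logs/.00003, i.e. the epoch suffix appended directly to checkpoints_path="logs/". Names that start with a dot are hidden by a plain `ls`, so the files may well be there. The checkpoint file pointing at ".00003" suggests TensorFlow-format weight files such as .00003.index and .00003.data-00000-of-00001; that is an inference from the log, not something confirmed in this thread. Below is a minimal standard-library sketch that lists everything under the prefix, dotfiles included (`ls -a logs/` does the same from a shell). `filter_checkpoint_files` and `list_checkpoint_files` are hypothetical helper names.

```python
import os


def filter_checkpoint_files(names, prefix):
    """Hypothetical helper: keep names starting with prefix, dotfiles included."""
    return sorted(name for name in names if name.startswith(prefix))


def list_checkpoint_files(checkpoints_path):
    """Hypothetical helper: list the files written under checkpoints_path.

    For checkpoints_path="logs/" the directory is "logs" and the prefix is
    empty, so every file in logs/ is returned, e.g. ".00003.index".
    """
    directory = os.path.dirname(checkpoints_path) or "."
    prefix = os.path.basename(checkpoints_path)
    # Unlike a plain `ls`, os.listdir() does not skip names starting with ".".
    return filter_checkpoint_files(os.listdir(directory), prefix)
```

Usage: `print(list_checkpoint_files("logs/"))` from the directory the training script was run in.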

Where is my saved model? How can I save my trained model for later inference and Grad-CAM?

I tried changing line 56 of /image-segmentation-keras/keras_segmentation/train.py to

self.model.save(self.checkpoints_path+'Model_'+str(epoch))

But that didn't help. Kindly help me save the model.

@divamgupta

LaBiXiaoChai commented 2 years ago

self.model.save(self.checkpoints_path+'Model_'+str(epoch)) is not the line that saves the model. The author saves it with ModelCheckpoint(), at line 184 of /image-segmentation-keras/keras_segmentation/train.py. I set the checkpoint parameter as --checkpoints_path="checkpoint", and it saved the checkpoint files in the root directory. Maybe you can try that.
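The suggestion works because of how the file names are built. This is a minimal sketch, assuming (from the log and the file listing in this issue) that weights are saved to checkpoints_path plus "." plus a five-digit epoch number, and the config to checkpoints_path plus "_config.json". `checkpoint_paths` is a hypothetical helper, not part of keras_segmentation.

```python
def checkpoint_paths(checkpoints_path, epoch):
    """Hypothetical helper reproducing the names seen in the training log.

    Assumes weights go to checkpoints_path + "." + 5-digit epoch and the
    config to checkpoints_path + "_config.json".
    """
    weights = "{}.{:05d}".format(checkpoints_path, epoch)
    config = checkpoints_path + "_config.json"
    return weights, config


# A trailing "/" turns every weight file into a dotfile; a plain prefix does not.
for prefix in ("logs/", "checkpoint", "logs/fcn_32_vgg"):
    print(checkpoint_paths(prefix, 3))
```

With "logs/" the weights land at logs/.00003, which `ls` hides. With a prefix such as "checkpoint" or the hypothetical "logs/fcn_32_vgg", they get visible names like checkpoint.00003.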

divamgupta commented 2 years ago

You might need to create a concrete TensorFlow function and save that. Otherwise, this codebase currently saves only the weights: to use the model, create the model object and call load_weights.
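Both options from the reply above can be sketched as follows. This is untested against this repository and makes some assumptions. `load_trained_fcn_32_vgg` (a hypothetical helper) rebuilds the model from the values in _config.json and restores the newest weights via `tf.train.latest_checkpoint`, which reads the checkpoint state file shown earlier in the issue. `export_saved_model` (also hypothetical) wraps the model in a `tf.function` with a fixed input signature and saves its concrete function as a SavedModel serving signature. It assumes channels-last RGB input of shape (batch, height, width, 3).

```python
import json
import os


def build_kwargs(config):
    """Constructor arguments for fcn_32_vgg, taken from <checkpoints_path>_config.json."""
    return {
        "n_classes": config["n_classes"],
        "input_height": config["input_height"],
        "input_width": config["input_width"],
    }


def load_trained_fcn_32_vgg(checkpoints_path):
    """Hypothetical helper: rebuild the model and load the newest weights."""
    # Imported here so build_kwargs() works without TensorFlow installed.
    import tensorflow as tf
    from keras_segmentation.models.fcn import fcn_32_vgg

    with open(checkpoints_path + "_config.json") as f:
        config = json.load(f)
    model = fcn_32_vgg(**build_kwargs(config))

    # Reads the "checkpoint" state file, e.g. logs/checkpoint -> "logs/.00003".
    latest = tf.train.latest_checkpoint(os.path.dirname(checkpoints_path) or ".")
    if latest is None:
        raise FileNotFoundError("no checkpoint found for " + checkpoints_path)
    model.load_weights(latest)
    return model


def export_saved_model(model, export_dir, input_height, input_width):
    """Hypothetical helper: save a concrete function as a serving signature."""
    import tensorflow as tf

    # Assumption: channels-last float32 RGB images.
    @tf.function(input_signature=[
        tf.TensorSpec([None, input_height, input_width, 3], tf.float32)])
    def serve(images):
        return model(images, training=False)

    tf.saved_model.save(model, export_dir,
                        signatures=serve.get_concrete_function())
```

Usage: `model = load_trained_fcn_32_vgg("logs/")`. The returned Keras model can be used directly for inference or Grad-CAM. Optionally call `export_saved_model(model, "exported_fcn", 256, 512)`, where "exported_fcn" is a hypothetical path, and reload it with `tf.saved_model.load("exported_fcn").signatures["serving_default"]`.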