JihongJu / keras-fcn

A playable implementation of Fully Convolutional Networks with Keras.
MIT License

Trying to Train VGG16 Model for localizing Text from natural images. Used Dataset MSRA-TD500 #20

Open nikstar802 opened 7 years ago

nikstar802 commented 7 years ago

Hi, first of all, I want to say this library is awesome.

I am trying to localize text in natural images. I am training on a single image from the MSRA-TD500 dataset using the VGG16 network you provide, but unfortunately the model is not converging as expected.

As a sanity check, I just want to train my network on a single image and test on that same image, but even that is not working.

I am using the Adam optimizer with categorical crossentropy as the loss function, and 2 classes to separate text and non-text areas.

For pre-processing, I am subtracting the mean pixel value from the original image and then dividing the image by the standard deviation.
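A minimal sketch of that pre-processing (illustrative only; whether the mean/std are per image or computed over the whole dataset is my assumption here):

```python
import numpy as np

def standardize(image):
    """Subtract the mean pixel value and divide by the standard deviation."""
    image = image.astype(np.float32)
    # Small epsilon guards against division by zero on flat images.
    return (image - image.mean()) / (image.std() + 1e-7)
```

With that pre-processing in place, this is how it is getting trained: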

Epoch 1/10
1/1 [==============================] - 64s - loss: 0.7233 - acc: 0.4443
Epoch 2/10
1/1 [==============================] - 51s - loss: 3.2022 - acc: 0.8014
Epoch 3/10
1/1 [==============================] - 52s - loss: 3.2022 - acc: 0.8014
Epoch 4/10
1/1 [==============================] - 52s - loss: 3.2022 - acc: 0.8014
Epoch 5/10
1/1 [==============================] - 52s - loss: 3.2022 - acc: 0.8014
Epoch 6/10
1/1 [==============================] - 51s - loss: 3.2022 - acc: 0.8014
Epoch 7/10
1/1 [==============================] - 52s - loss: 3.2022 - acc: 0.8014
Epoch 8/10
1/1 [==============================] - 51s - loss: 3.2022 - acc: 0.8014
Epoch 9/10
1/1 [==============================] - 51s - loss: 3.2022 - acc: 0.8014
Epoch 10/10
1/1 [==============================] - 51s - loss: 3.2021 - acc: 0.8014

Can you suggest anything for this issue? Thanks.

JihongJu commented 7 years ago

Hi @nikstar802, I'm glad you like it. Given the training log, it seems the loss explodes after a few updates. Did you observe a sudden loss explosion or gradual loss growth?

nikstar802 commented 7 years ago

Hi, thanks for the reply.

Actually, the loss is exploding suddenly: from the second epoch onward the loss increases, and it then becomes constant after the third epoch. I tried with the simplest image possible: I created my own image with a uniform blue background and put a few text items in the foreground with large font sizes.
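Something along these lines (a purely illustrative PIL sketch; the exact text, colors, and font are my own stand-ins):

```python
from PIL import Image, ImageDraw, ImageFont

# Uniform blue background with a few large text items in the foreground.
img = Image.new("RGB", (500, 500), color=(30, 60, 200))
draw = ImageDraw.Draw(img)
try:
    # The font file name is an assumption; any large TrueType font works.
    font = ImageFont.truetype("DejaVuSans-Bold.ttf", 72)
except IOError:
    font = ImageFont.load_default()
draw.text((60, 120), "HELLO", fill="white", font=font)
draw.text((60, 280), "WORLD", fill="white", font=font)
img.save("synthetic_text.png")
```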

But I am unable to understand why the network is not training properly. Is it something related to weight initialization? I am keeping the weights as 'None' before starting the training. Or maybe it is something related to the loss function? I tried SGD, RMSprop, and Adam, but nothing seems to work.

JihongJu commented 7 years ago

@nikstar802 Please first make sure you are working with the newest master branch, because I previously forgot to include the "softmax" activation, which caused sudden weight/loss explosions.

Other than that, it is also possible that:

  1. Your learning rate is too large (see the sketch below for trying a smaller one).
  2. Your dataset is imbalanced, so the model learns to predict zeros everywhere.
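For example, a smaller learning rate can be tried like this (a sketch, not code from this repo; `model` is assumed to be the FCN built with this repo):

```python
from keras.optimizers import Adam

# Assuming `model` is the FCN built with this repo;
# a smaller learning rate often tames early loss explosions.
model.compile(optimizer=Adam(lr=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```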
nikstar802 commented 7 years ago

Hi, thanks for the reply. I was already using the latest master branch. I had changed the activation from softmax to sigmoid to experiment with the model; I changed it back to softmax, and here are the results now.

Epoch 1/100
Epoch 00000: val_loss improved from inf to 6.38728, saving model to /tmp/fcn_vgg16_weights.h5
1/1 [==============================] - 98s - loss: 0.8112 - acc: 0.4567 - val_loss: 6.3873 - val_acc: 0.6039
Epoch 2/100
Epoch 00001: val_loss did not improve
1/1 [==============================] - 66s - loss: 6.3873 - acc: 0.6039 - val_loss: 6.3874 - val_acc: 0.6039
Epoch 3/100
Epoch 00002: val_loss did not improve
1/1 [==============================] - 67s - loss: 6.3874 - acc: 0.6039 - val_loss: 6.3875 - val_acc: 0.6039
Epoch 4/100
Epoch 00003: val_loss did not improve
1/1 [==============================] - 66s - loss: 6.3875 - acc: 0.6039 - val_loss: 6.3876 - val_acc: 0.6039
Epoch 5/100
Epoch 00004: val_loss did not improve
1/1 [==============================] - 74s - loss: 6.3876 - acc: 0.6039 - val_loss: 6.3878 - val_acc: 0.6039
Epoch 6/100
Epoch 00005: val_loss did not improve
1/1 [==============================] - 71s - loss: 6.3878 - acc: 0.6039 - val_loss: 6.3879 - val_acc: 0.6039
Epoch 7/100
Epoch 00006: val_loss did not improve
1/1 [==============================] - 74s - loss: 6.3879 - acc: 0.6039 - val_loss: 6.3880 - val_acc: 0.6039
Epoch 8/100
Epoch 00007: val_loss did not improve
1/1 [==============================] - 72s - loss: 6.3880 - acc: 0.6039 - val_loss: 6.3881 - val_acc: 0.6039
Epoch 9/100
Epoch 00008: val_loss did not improve
1/1 [==============================] - 71s - loss: 6.3881 - acc: 0.6039 - val_loss: 6.3883 - val_acc: 0.6039
Epoch 10/100
Epoch 00009: val_loss did not improve
1/1 [==============================] - 71s - loss: 6.3883 - acc: 0.6039 - val_loss: 6.3884 - val_acc: 0.6039
Epoch 11/100
Epoch 00010: val_loss did not improve
1/1 [==============================] - 71s - loss: 6.3884 - acc: 0.6039 - val_loss: 6.3885 - val_acc: 0.6039
Epoch 12/100
Epoch 00011: val_loss did not improve
1/1 [==============================] - 69s - loss: 6.3885 - acc: 0.6039 - val_loss: 6.3886 - val_acc: 0.6039
Epoch 13/100
Epoch 00012: val_loss did not improve
1/1 [==============================] - 73s - loss: 6.3886 - acc: 0.6039 - val_loss: 6.3886 - val_acc: 0.6039
Epoch 14/100
Epoch 00013: val_loss did not improve
1/1 [==============================] - 73s - loss: 6.3886 - acc: 0.6039 - val_loss: 6.3886 - val_acc: 0.6039
Epoch 15/100
Epoch 00014: val_loss did not improve
1/1 [==============================] - 66s - loss: 6.3886 - acc: 0.6039 - val_loss: 6.3887 - val_acc: 0.6039
Epoch 16/100
Epoch 00015: val_loss did not improve
1/1 [==============================] - 65s - loss: 6.3887 - acc: 0.6039 - val_loss: 6.3887 - val_acc: 0.6039
Epoch 17/100
Epoch 00016: val_loss did not improve
1/1 [==============================] - 65s - loss: 6.3887 - acc: 0.6039 - val_loss: 6.3887 - val_acc: 0.6039
Epoch 18/100
Epoch 00017: val_loss did not improve
1/1 [==============================] - 65s - loss: 6.3887 - acc: 0.6039 - val_loss: 6.3887 - val_acc: 0.6039
Epoch 19/100
Epoch 00018: val_loss did not improve
1/1 [==============================] - 65s - loss: 6.3887 - acc: 0.6039 - val_loss: 6.3887 - val_acc: 0.6039
Epoch 20/100
Epoch 00019: val_loss did not improve
1/1 [==============================] - 66s - loss: 6.3887 - acc: 0.6039 - val_loss: 6.3887 - val_acc: 0.6039
Epoch 21/100
Epoch 00020: val_loss did not improve
1/1 [==============================] - 65s - loss: 6.3887 - acc: 0.6039 - val_loss: 6.3887 - val_acc: 0.6039
Epoch 22/100
Epoch 00021: val_loss did not improve
1/1 [==============================] - 65s - loss: 6.3887 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 23/100
Epoch 00022: val_loss did not improve
1/1 [==============================] - 66s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 24/100
Epoch 00023: val_loss did not improve
1/1 [==============================] - 68s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 25/100
Epoch 00024: val_loss did not improve
1/1 [==============================] - 66s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 26/100
Epoch 00025: val_loss did not improve
1/1 [==============================] - 67s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 27/100
Epoch 00026: val_loss did not improve
1/1 [==============================] - 68s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 28/100
Epoch 00027: val_loss did not improve
1/1 [==============================] - 66s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 29/100
Epoch 00028: val_loss did not improve
1/1 [==============================] - 67s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 30/100
Epoch 00029: val_loss did not improve
1/1 [==============================] - 76s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 31/100
Epoch 00030: val_loss did not improve
1/1 [==============================] - 70s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039
Epoch 32/100
Epoch 00031: val_loss did not improve
1/1 [==============================] - 70s - loss: 6.3888 - acc: 0.6039 - val_loss: 6.3888 - val_acc: 0.6039

I have a few questions:

  1. Will the model learn from just a single image? (Here I am using the same image for validation as well.)
  2. Whatever I train, when I use model.predict I always get a blank 500x500 image with no objects in it. Am I doing something wrong?
  3. What about weight initialization? You are using 'imagenet'; I am not initializing with anything, just 'None'. Is that a problem?
  4. Learning rate: I am using a learning rate of 0.001 with the Adam or RMSprop optimizer.
  5. Is text extraction not possible with this VGG16 model?

... Thanks

JihongJu commented 7 years ago

@nikstar802

  1. If you have only one image, it is very difficult to train a VGG net from scratch.
  2. What is the ratio of textual to non-textual pixels? Imbalanced training samples can be one reason for predicting a blank image; a quick check is sketched after this list.
  3. See 1. Pre-trained models won't help either if only one image is provided.
  4. I used Adam with lr=1e-4 for the voc2011 dataset.
  5. It is possible, but a good choice of hyperparameters is required.
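For point 2, a quick way to check the class balance (an illustrative snippet; `mask` is assumed to be a binary text/non-text label array):

```python
import numpy as np

def text_pixel_ratio(mask):
    """Fraction of pixels labeled as text in a binary segmentation mask."""
    mask = np.asarray(mask)
    return float((mask > 0).sum()) / mask.size
```

If this ratio is very small, the trivial all-background prediction already gets a high accuracy, which would be consistent with the constant accuracy in your log.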
nikstar802 commented 7 years ago

Hi, I realized that my ratio of textual to non-textual pixels is too low; that might be the issue, because I am resizing large training images to 500x500, and during resizing the features might get disrupted.

So now I am randomly cropping my training images into 500x500 segments. I crop each image 20 times, so I get 20 sub-images of 500x500 from a single training image. This is the training set I am feeding in.
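Roughly like this (an illustrative sketch; the function and variable names are mine, not from the repo):

```python
import numpy as np

def random_crops(image, mask, size=500, n_crops=20):
    """Cut n_crops random size x size patches from an image and its label mask."""
    h, w = image.shape[:2]
    crops = []
    for _ in range(n_crops):
        top = np.random.randint(0, h - size + 1)
        left = np.random.randint(0, w - size + 1)
        crops.append((image[top:top + size, left:left + size],
                      mask[top:top + size, left:left + size]))
    return crops
```

Here is my model summary.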


Layer (type)                      Output Shape            Param #      Connected to
====================================================================================
input_1 (InputLayer)              (None, 500, 500, 3)     0
block1_conv1 (Conv2D)             (None, 500, 500, 64)    1792         input_1[0][0]
block1_conv2 (Conv2D)             (None, 500, 500, 64)    36928        block1_conv1[0][0]
block1_pool (MaxPooling2D)        (None, 250, 250, 64)    0            block1_conv2[0][0]
block2_conv1 (Conv2D)             (None, 250, 250, 128)   73856        block1_pool[0][0]
block2_conv2 (Conv2D)             (None, 250, 250, 128)   147584       block2_conv1[0][0]
block2_pool (MaxPooling2D)        (None, 125, 125, 128)   0            block2_conv2[0][0]
block3_conv1 (Conv2D)             (None, 125, 125, 256)   295168       block2_pool[0][0]
block3_conv2 (Conv2D)             (None, 125, 125, 256)   590080       block3_conv1[0][0]
block3_conv3 (Conv2D)             (None, 125, 125, 256)   590080       block3_conv2[0][0]
block3_pool (MaxPooling2D)        (None, 63, 63, 256)     0            block3_conv3[0][0]
block4_conv1 (Conv2D)             (None, 63, 63, 512)     1180160      block3_pool[0][0]
block4_conv2 (Conv2D)             (None, 63, 63, 512)     2359808      block4_conv1[0][0]
block4_conv3 (Conv2D)             (None, 63, 63, 512)     2359808      block4_conv2[0][0]
block4_pool (MaxPooling2D)        (None, 32, 32, 512)     0            block4_conv3[0][0]
block5_conv1 (Conv2D)             (None, 32, 32, 512)     2359808      block4_pool[0][0]
block5_conv2 (Conv2D)             (None, 32, 32, 512)     2359808      block5_conv1[0][0]
block5_conv3 (Conv2D)             (None, 32, 32, 512)     2359808      block5_conv2[0][0]
block5_pool (MaxPooling2D)        (None, 16, 16, 512)     0            block5_conv3[0][0]
block5_fc6 (Conv2D)               (None, 16, 16, 4096)    102764544    block5_pool[0][0]
dropout_1 (Dropout)               (None, 16, 16, 4096)    0            block5_fc6[0][0]
block5_fc7 (Conv2D)               (None, 16, 16, 4096)    16781312     dropout_1[0][0]
dropout_2 (Dropout)               (None, 16, 16, 4096)    0            block5_fc7[0][0]
score_feat1 (Conv2D)              (None, 16, 16, 1)       4097         dropout_2[0][0]
score_feat2 (Conv2D)              (None, 32, 32, 1)       513          block4_pool[0][0]
upscore_feat1 (BilinearUpSamplin  (None, 32, 32, 1)       0            score_feat1[0][0]
scale_feat2 (Lambda)              (None, 32, 32, 1)       0            score_feat2[0][0]
add_1 (Add)                       (None, 32, 32, 1)       0            upscore_feat1[0][0]
                                                                       scale_feat2[0][0]
score_feat3 (Conv2D)              (None, 63, 63, 1)       257          block3_pool[0][0]
upscore_feat2 (BilinearUpSamplin  (None, 63, 63, 1)       0            add_1[0][0]
scale_feat3 (Lambda)              (None, 63, 63, 1)       0            score_feat3[0][0]
add_2 (Add)                       (None, 63, 63, 1)       0            upscore_feat2[0][0]
                                                                       scale_feat3[0][0]
upscore_feat3 (BilinearUpSamplin  (None, 500, 500, 1)     0            add_2[0][0]
activation_1 (Activation)         (None, 500, 500, 1)     0            upscore_feat3[0][0]
====================================================================================
Total params: 134,265,411
Trainable params: 134,265,411
Non-trainable params: 0


Now, with this model, even one epoch is not completing; my system hangs. Kindly let me know what best I can do to validate that my model works on at least a single image, so that I can go ahead and arrange a GPU for training.

... Thanks Nikunj

JihongJu commented 7 years ago

@nikstar802 It is generally not recommended to train with a single image. You can feed multiple images or patches in one iteration instead of using a single image for multiple iterations. Debugging a training process is tricky because many things can go wrong, and lack of data is always one of them. This post may be useful for hints. In general, you only know whether it works when you train with the largest dataset you can get. As a proof of concept, you could subsample the images and use a smaller model, e.g. AlexNet.
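For instance, instead of fitting one image for many epochs, something along these lines could feed random patches per iteration (a hypothetical sketch; `images` and `masks` are assumed to be numpy arrays of prepared patches and label masks, and the `fit_generator` call is illustrative):

```python
import numpy as np

def patch_generator(images, masks, batch_size=4):
    """Yield random (patch batch, mask batch) pairs indefinitely."""
    n = len(images)
    while True:
        idx = np.random.randint(0, n, size=batch_size)
        yield images[idx], masks[idx]

# Hypothetical usage with a compiled Keras model:
# model.fit_generator(patch_generator(images, masks),
#                     steps_per_epoch=50, epochs=10)
```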