Validation accuracy doesn't increase while retraining the model

YuehWu1994 commented 7 years ago

Hello sir, this project is a great reference for my own research. However, I ran into a problem when retraining the model. After retraining your original model without any new data, the validation accuracy did not increase at all (log below). Do you know what might cause this? Thank you.

(By the way, I didn't modify your code except for printing a NumPy array.)

    rex@Zeus:~/Desktop/python_hr/retrained_5CNN$ python trackgesture.py
    Using TensorFlow backend.

    What would you like to do ?
    1- Use pretrained model for gesture recognition & layer visualization
    2- Train the model (you will require image samples for training under .\imgfolder)
    3- Visualize feature maps of different layers of trained model
    2

    conv2d_1 (Conv2D)            (None, 32, 198, 198)      320
    activation_1 (Activation)    (None, 32, 198, 198)      0
    conv2d_2 (Conv2D)            (None, 32, 196, 196)      9248
    activation_2 (Activation)    (None, 32, 196, 196)      0
    max_pooling2d_1 (MaxPooling2 (None, 32, 98, 98)        0
    dropout_1 (Dropout)          (None, 32, 98, 98)        0
    flatten_1 (Flatten)          (None, 307328)            0
    dense_1 (Dense)              (None, 128)               39338112
    activation_3 (Activation)    (None, 128)               0
    dropout_2 (Dropout)          (None, 128)               0
    dense_2 (Dense)              (None, 5)                 645
    activation_4 (Activation)    (None, 5)                 0
    Total params: 39,348,325
    Trainable params: 39,348,325
    Non-trainable params: 0

    (4015, 40000)
    Press any key
    samples_per_class - 803
    total_image - 4015
    Train on 3212 samples, validate on 803 samples
    Epoch 1/15
    2017-08-18 11:50:13.901818: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
    2017-08-18 11:50:13.901850: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
    2017-08-18 11:50:13.901855: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
    2017-08-18 11:50:13.901859: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
    2017-08-18 11:50:13.901862: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
    2017-08-18 11:50:14.108379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
    name: GeForce GTX 1080
    major: 6 minor: 1 memoryClockRate (GHz) 1.835
    pciBusID 0000:02:00.0
    Total memory: 7.92GiB
    Free memory: 7.30GiB
    2017-08-18 11:50:14.108407: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
    2017-08-18 11:50:14.108413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
    2017-08-18 11:50:14.108419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:02:00.0)
    3212/3212 [==============================] - 33s - loss: 1.7759 - acc: 0.2008 - val_loss: 1.6096 - val_acc: 0.2067
    Epoch 2/15
    3212/3212 [==============================] - 31s - loss: 1.6020 - acc: 0.2481 - val_loss: 1.6106 - val_acc: 0.2080
    Epoch 3/15
    3212/3212 [==============================] - 31s - loss: 1.5640 - acc: 0.2905 - val_loss: 1.6233 - val_acc: 0.2105
    Epoch 4/15
    3212/3212 [==============================] - 31s - loss: 1.4404 - acc: 0.4019 - val_loss: 1.6827 - val_acc: 0.1768
    Epoch 5/15
    3212/3212 [==============================] - 31s - loss: 1.2607 - acc: 0.4978 - val_loss: 1.7530 - val_acc: 0.1843
    Epoch 6/15
    3212/3212 [==============================] - 31s - loss: 1.0425 - acc: 0.6077 - val_loss: 1.9313 - val_acc: 0.1768
    Epoch 7/15
    3212/3212 [==============================] - 31s - loss: 0.8596 - acc: 0.6868 - val_loss: 2.0786 - val_acc: 0.1781
    Epoch 8/15
    3212/3212 [==============================] - 31s - loss: 0.7027 - acc: 0.7335 - val_loss: 2.2749 - val_acc: 0.1781
    Epoch 9/15
    3212/3212 [==============================] - 31s - loss: 0.6062 - acc: 0.7746 - val_loss: 2.6289 - val_acc: 0.1893
    Epoch 10/15
    3212/3212 [==============================] - 32s - loss: 0.5311 - acc: 0.7948 - val_loss: 2.7380 - val_acc: 0.1868
    Epoch 11/15
    3212/3212 [==============================] - 31s - loss: 0.4637 - acc: 0.8207 - val_loss: 2.8698 - val_acc: 0.1880
    Epoch 12/15
    3212/3212 [==============================] - 32s - loss: 0.4362 - acc: 0.8216 - val_loss: 3.0280 - val_acc: 0.1880
    Epoch 13/15
    3212/3212 [==============================] - 32s - loss: 0.3905 - acc: 0.8450 - val_loss: 3.2075 - val_acc: 0.1893
    Epoch 14/15
    3212/3212 [==============================] - 31s - loss: 0.3513 - acc: 0.8621 - val_loss: 3.3562 - val_acc: 0.1930
    Epoch 15/15
    3212/3212 [==============================] - 31s - loss: 0.3271 - acc: 0.8702 - val_loss: 3.4325 - val_acc: 0.1893
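Worth noting: with 5 classes of 803 images each, random guessing gives an accuracy of 1/5 = 0.2, and val_acc in the log above never moves far from that while acc keeps rising. A minimal sketch that pulls both metrics out of the log and compares val_acc with chance, assuming the Keras 2.0.x progress-line format shown above (LOG, METRIC_RE and parse_metrics are illustrative names, not part of trackgesture.py):

```python
import re

# Three epoch lines copied verbatim from the TensorFlow-backend log (epochs 1, 6 and 15).
LOG = """\
3212/3212 [==============================] - 33s - loss: 1.7759 - acc: 0.2008 - val_loss: 1.6096 - val_acc: 0.2067
3212/3212 [==============================] - 31s - loss: 1.0425 - acc: 0.6077 - val_loss: 1.9313 - val_acc: 0.1768
3212/3212 [==============================] - 31s - loss: 0.3271 - acc: 0.8702 - val_loss: 3.4325 - val_acc: 0.1893
"""

# Matches the Keras 2.0 progress-line format above (metric names "acc" and "val_acc").
# "- acc: " cannot match inside "val_acc" because that is preceded by "val_", not "- ".
METRIC_RE = re.compile(r"- acc: ([\d.]+) - val_loss: [\d.]+ - val_acc: ([\d.]+)")


def parse_metrics(log_text):
    """Return one (train_acc, val_acc) pair per epoch line in log_text."""
    return [(float(a), float(v)) for a, v in METRIC_RE.findall(log_text)]


NB_CLASSES = 5               # dense_2 has 5 outputs, one per gesture
CHANCE = 1.0 / NB_CLASSES    # accuracy of random guessing on 5 balanced classes

for train_acc, val_acc in parse_metrics(LOG):
    print("acc={:.4f}  val_acc={:.4f}  val_acc - chance={:+.4f}".format(
        train_acc, val_acc, val_acc - CHANCE))
```

Across the full log, acc climbs from 0.2008 to 0.8702 while val_acc stays between 0.1768 and 0.2105. That is the pattern you would expect if the validation labels carry no information the model can learn from the images, for example if images and labels are misaligned.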

asingh33 commented 7 years ago

Hey Rex,

I noticed that you are using TensorFlow as the backend; is that intentional? I actually used Theano. I may be wrong, but I think running my script under TensorFlow may require a few additional changes, and that might be causing the accuracy difference you are observing. If the backend library doesn't matter to you, would you mind trying Theano once?

YuehWu1994 commented 7 years ago

Hello @asingh33, after switching to Theano as the backend (KERAS_BACKEND=theano python trackgesture.py), I still encounter the problem, and I would like to ask you a few questions:

Do you train with or without a GPU?
What is your GPU specification?
Did you modify any parameters in .theanorc?

(My environment: Keras 2.0.6, Theano 0.9.0, Python 2.7.12)

    rex@Zeus:~/Desktop/python_hr/retrained_5CNN$ KERAS_BACKEND=theano python trackgesture.py
    Using Theano backend.

    What would you like to do ?
    1- Use pretrained model for gesture recognition & layer visualization
    2- Train the model (you will require image samples for training under .\imgfolder)
    3- Visualize feature maps of different layers of trained model
    2

    Layer (type)                 Output Shape              Param #
    conv2d_1 (Conv2D)            (None, 32, 198, 198)      320
    activation_1 (Activation)    (None, 32, 198, 198)      0
    conv2d_2 (Conv2D)            (None, 32, 196, 196)      9248
    activation_2 (Activation)    (None, 32, 196, 196)      0
    max_pooling2d_1 (MaxPooling2 (None, 32, 98, 98)        0
    dropout_1 (Dropout)          (None, 32, 98, 98)        0
    flatten_1 (Flatten)          (None, 307328)            0
    dense_1 (Dense)              (None, 128)               39338112
    activation_3 (Activation)    (None, 128)               0
    dropout_2 (Dropout)          (None, 128)               0
    dense_2 (Dense)              (None, 5)                 645
    activation_4 (Activation)    (None, 5)                 0
    Total params: 39,348,325
    Trainable params: 39,348,325
    Non-trainable params: 0

    (4015, 40000)
    Press any key
    samples_per_class - 803
    total_image - 4015
    Train on 3212 samples, validate on 803 samples
    Epoch 1/15
    3212/3212 [==============================] - 767s - loss: 1.8494 - acc: 0.1930 - val_loss: 1.6107 - val_acc: 0.2055
    Epoch 2/15
    3212/3212 [==============================] - 705s - loss: 1.6078 - acc: 0.2170 - val_loss: 1.6110 - val_acc: 0.2067
    Epoch 3/15
    3212/3212 [==============================] - 707s - loss: 1.5753 - acc: 0.2765 - val_loss: 1.6208 - val_acc: 0.2092
    Epoch 4/15
    3212/3212 [==============================] - 703s - loss: 1.4798 - acc: 0.3624 - val_loss: 1.6822 - val_acc: 0.2130
    Epoch 5/15
    3212/3212 [==============================] - 699s - loss: 1.3243 - acc: 0.4598 - val_loss: 1.7286 - val_acc: 0.1930
    Epoch 6/15
    3212/3212 [==============================] - 700s - loss: 1.1305 - acc: 0.5657 - val_loss: 1.8506 - val_acc: 0.1918
    Epoch 7/15
    3212/3212 [==============================] - 702s - loss: 0.9290 - acc: 0.6516 - val_loss: 2.0530 - val_acc: 0.1930
    Epoch 8/15
    3212/3212 [==============================] - 708s - loss: 0.8105 - acc: 0.6996 - val_loss: 2.1557 - val_acc: 0.2042
    Epoch 9/15
    3212/3212 [==============================] - 707s - loss: 0.6796 - acc: 0.7469 - val_loss: 2.3770 - val_acc: 0.1831
    Epoch 10/15
    3212/3212 [==============================] - 707s - loss: 0.6049 - acc: 0.7699 - val_loss: 2.4995 - val_acc: 0.1868
    Epoch 11/15
    3212/3212 [==============================] - 707s - loss: 0.5156 - acc: 0.8045 - val_loss: 2.7483 - val_acc: 0.1893
    Epoch 12/15
    3212/3212 [==============================] - 708s - loss: 0.4697 - acc: 0.8191 - val_loss: 2.7471 - val_acc: 0.1806
    Epoch 13/15
    3212/3212 [==============================] - 710s - loss: 0.4334 - acc: 0.8294 - val_loss: 2.9891 - val_acc: 0.1868
    Epoch 14/15
    3212/3212 [==============================] - 719s - loss: 0.3965 - acc: 0.8487 - val_loss: 3.0107 - val_acc: 0.1930
    Epoch 15/15
    3212/3212 [==============================] - 726s - loss: 0.3796 - acc: 0.8490 - val_loss: 3.1486 - val_acc: 0.1818

asingh33 commented 7 years ago

Hey Rex,

Do you train with or without GPU?
asingh33: When I wrote this project I used only the CPU. I didn't have suitable GPU support.

What is your GPU specification?
asingh33: Not applicable.

Do you modify parameters in .theanorc?
asingh33: None that I can recall, but let me confirm once I am back home.

asingh33 commented 7 years ago

Hey Rex, I checked my .theanorc file and can confirm that I haven't changed anything in it. By the way, I've somewhat lost track of your original question. What is your actual concern? Is it why the accuracy doesn't go above 84.9% or so?

YuehWu1994 commented 7 years ago

Hello @asingh33, the issue is that I can't reproduce your results on Ubuntu 16.04. However, I found that I do get similar results (above 85% accuracy) on Windows and OS X. The reason is still unknown.

sanjanasri commented 7 years ago

Hi,

I am facing the same issue: validation accuracy does not increase when I retrain on Ubuntu 16.04. I also get another error, "HIGHGUI ERROR: V4L/V4L2: VIDIOC_S_CROP". Could you tell me what I am doing wrong?

I would really appreciate a reply. Thank you.

asingh33 commented 7 years ago

Guys, I have only tested this project on OS X and Windows; I never got the chance to try it on any Linux distribution. Your observation that accuracy is affected on Ubuntu is really interesting, but I'm afraid I don't have an answer at the moment. Have you checked whether the captured images look right? I may have to research this myself, but due to time constraints I can't spend any time on it right now. I would really appreciate it if any of you could find the root cause and let me know.

@sanjanasri, regarding your error message: I did a few Google searches, and I may be wrong, but it seems to be specific to Linux distributions. As I said, I haven't had the chance to try this on Linux, so I won't be able to help here, buddy.

xzyaoi commented 6 years ago

@asingh33 @YuehWu1994 @sanjanasri I am sure this problem occurs because, on Ubuntu, the files are NOT listed in filename order. Therefore, when you call

    immatrix = np.array([np.array(Image.open(path2+ '/' + images).convert('L')).flatten()
                         for images in imlist], dtype = 'f')

the rows of immatrix are not in the order you expect; they come out in some arbitrary order, so they no longer line up with the labels.
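To see why the file order matters, here is a minimal sketch. It assumes, as the "samples_per_class - 803" line in the logs suggests, that labels are assigned purely by position in contiguous blocks of samples_per_class. The class names and the CLASS_INDEX.png filename pattern are hypothetical, and a seeded shuffle stands in for an unsorted os.listdir() result. For context, os.listdir() is documented to return entries in arbitrary order; NTFS and HFS+ happen to return them sorted by name, while ext4 usually returns them in hash order, which would be consistent with the problem appearing only on Ubuntu.

```python
import random

# Hypothetical setup: 5 gesture classes with 803 images each, saved as
# "CLASS_INDEX.png". Class names are listed alphabetically so that a sorted
# listing puts the class blocks in class-index order.
CLASSES = ["nothing", "ok", "peace", "punch", "stop"]
SAMPLES_PER_CLASS = 803


def class_from_name(filename):
    """The class a file really belongs to, read from its filename prefix."""
    return CLASSES.index(filename.split("_")[0])


def positional_labels(n_files, samples_per_class):
    """Label by position only: first block -> 0, second block -> 1, and so on."""
    return [i // samples_per_class for i in range(n_files)]


def label_agreement(filenames):
    """Fraction of files whose positional label equals their real class."""
    labels = positional_labels(len(filenames), SAMPLES_PER_CLASS)
    hits = sum(class_from_name(f) == lab for f, lab in zip(filenames, labels))
    return hits / float(len(filenames))


files = ["{}_{:04d}.png".format(c, i) for c in CLASSES for i in range(SAMPLES_PER_CLASS)]
listing = files[:]
random.Random(0).shuffle(listing)  # stand-in for an unsorted directory listing

print("unsorted: {:.3f}".format(label_agreement(listing)))          # roughly 0.2, i.e. chance
print("sorted:   {:.3f}".format(label_agreement(sorted(listing))))  # 1.000
```

With an unsorted listing only about one label in five is correct, which matches the val_acc of about 0.2 seen in both logs.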

For a temporary fix, try using this:

    immatrix = np.array([np.array(Image.open(path2+ '/' + images).convert('L')).flatten()
                         for images in sorted(imlist)], dtype = 'f')
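One caveat, assuming the same position-based labelling: sorted() only lines the blocks up with the right labels if the class prefixes sort alphabetically in the same order as the class indices, and every class has exactly samples_per_class files. A quick check might look like the sketch below; check_blocks, prefix_of and the CLASS_INDEX.png naming are hypothetical, not taken from the repository.

```python
def check_blocks(filenames, class_names, samples_per_class, class_of):
    """List (position, filename, expected_class) for every file whose class
    differs from the one position-based labelling would assign to it."""
    expected_total = len(class_names) * samples_per_class
    if len(filenames) != expected_total:
        raise ValueError("expected {} files, got {}".format(expected_total, len(filenames)))
    mismatches = []
    for pos, name in enumerate(filenames):
        expected = class_names[pos // samples_per_class]
        if class_of(name) != expected:
            mismatches.append((pos, name, expected))
    return mismatches


def prefix_of(filename):
    """Hypothetical naming scheme: 'CLASS_INDEX.png' -> 'CLASS'."""
    return filename.split("_")[0]


files = ["ok_1.png", "nothing_2.png", "ok_2.png", "nothing_1.png"]  # unsorted listing
print(check_blocks(sorted(files), ["nothing", "ok"], 2, prefix_of))       # []
print(len(check_blocks(sorted(files), ["ok", "nothing"], 2, prefix_of)))  # 4
```

The second call shows that even a sorted listing gives every file the wrong label when the class-index order is not alphabetical.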

If possible, using annotation files would solve the problem more robustly.
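A minimal sketch of the annotation-file idea, assuming a hypothetical CSV with "filename" and "label" columns (the file layout, function names and label numbers are illustrative, not from the repository). Each label is looked up by filename, so the order in which the directory is listed no longer matters:

```python
import csv
import io


def load_annotations(csv_file):
    """Read a hypothetical 'filename,label' CSV into a {filename: int label} dict."""
    return {row["filename"]: int(row["label"]) for row in csv.DictReader(csv_file)}


def labels_for(filenames, annotations):
    """Look up each file's label by name; fail loudly on unannotated files."""
    missing = [f for f in filenames if f not in annotations]
    if missing:
        raise KeyError("no annotation for: {}".format(", ".join(missing)))
    return [annotations[f] for f in filenames]


# In-memory CSV for illustration; in practice this would be an opened file.
ANNOTATIONS_CSV = io.StringIO("filename,label\nok_0001.png,1\nnothing_0001.png,0\nstop_0001.png,4\n")
annotations = load_annotations(ANNOTATIONS_CSV)

listing = ["stop_0001.png", "nothing_0001.png", "ok_0001.png"]  # any order works
print(labels_for(listing, annotations))  # [4, 0, 1]
```

If immatrix is built by iterating over the same listing, row i and label i always refer to the same image, whatever order the filesystem returns.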

If this turns out not to be the cause, please let me know.

asingh33 commented 6 years ago

@xzyaoi Thanks for taking the time to fix it. I will review it on my end and pull the changes.

asingh33 / CNNGestureRecognizer

Validation accuracy doesn't increase while retraining the model #8