capstone496 / SpeechSentiments


[Progress Report] Implementation of final decision layer #14

Open rightnknow opened 5 years ago

rightnknow commented 5 years ago

This documents the implementation process of constructing the final decision layer using both the Spectrogram+CNN and audio+LSTM models. The two possible flows are:

  1. Train the CNN and LSTM models on the same input data (in different forms) at the same time, and tune the dense layer on top of both outputs
  2. Fully train both models, freeze them, then train the dense layer on top of both outputs

The current implementation will be option 1
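Option 1 corresponds to a single Keras model with two inputs, trained jointly end-to-end. Here is a minimal sketch, where the input sizes (`IMAGE_SIZE`, `TIMESTEPS`, `FEATURES`) and the small convolutional stand-in for the real Inception V3 branch are assumptions for illustration only:

```python
import numpy as np
from tensorflow.keras import layers, Model

IMAGE_SIZE = 96                 # hypothetical spectrogram size
TIMESTEPS, FEATURES = 100, 40   # hypothetical audio feature shape
NUM_CLASSES = 6

# CNN branch over the spectrogram (tiny stand-in for Inception V3)
img_in = layers.Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 1))
x = layers.Conv2D(8, 3, activation='relu')(img_in)
x = layers.GlobalAveragePooling2D()(x)

# LSTM branch over the audio features
seq_in = layers.Input(shape=(TIMESTEPS, FEATURES))
y = layers.LSTM(16)(seq_in)

# Final decision layer: concatenate both branch outputs and classify
merged = layers.concatenate([x, y])
out = layers.Dense(NUM_CLASSES, activation='softmax')(merged)

model = Model(inputs=[img_in, seq_in], outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])
```

Fitting this with `model.fit([X_image, X_audio], Y, ...)` updates both branches and the decision layer at once; for option 2 one would instead set `trainable = False` on the two branches before compiling.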

rightnknow commented 5 years ago

From https://github.com/keras-team/keras/issues/7581 — the post mentions that only the weights of a model can be transferred between TensorFlow and Keras, so it is still required to build both the CNN and LSTM models in either Keras or TensorFlow. For now I'll continue with Keras. The new CNN model in Keras will have the same structure as before, which is Inception V3 + 2 dense layers.

PS: TensorFlow saves a model as a .pb file and can restore variables and weights. Keras saves a model as JSON or HDF5 (h5py): JSON saves only the model structure, HDF5 saves the weights.
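The Keras half of that note can be sketched with a tiny stand-in model: the architecture round-trips through JSON, the weights through HDF5, and both are needed to restore a trained model:

```python
import os
import tempfile
import numpy as np
from tensorflow.keras import layers, Model
from tensorflow.keras.models import model_from_json

# Tiny stand-in model; the same pattern applies to the Inception V3 + dense model
inp = layers.Input(shape=(4,))
out = layers.Dense(2, activation='softmax')(inp)
model = Model(inputs=inp, outputs=out)

# JSON stores only the architecture, HDF5 stores only the weights
arch_json = model.to_json()
weights_path = os.path.join(tempfile.mkdtemp(), 'weights.h5')
model.save_weights(weights_path)

# Restoring: rebuild the structure from JSON, then load the weights into it
restored = model_from_json(arch_json)
restored.load_weights(weights_path)
```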

rightnknow commented 5 years ago

Keras provides the Inception V3 weights and model in its library, along with other network models such as VGG and ResNet.

There is a known issue in the Keras Inception V3 model: http://blog.datumbox.com/the-batch-normalization-layer-of-keras-is-broken/ https://github.com/keras-team/keras/issues/9214

The current structure is the following: [screenshot: model structure] The structure is identical to Katherine's model in TensorFlow. However, due to the implementation, the Keras Inception V3 takes in images with 3 channels, so we modify the input:

    img_input = Input(shape=(image_size, image_size, 1))
    img_conc = Concatenate()([img_input, img_input, img_input])
    base_model = InceptionV3(weights='imagenet', include_top=True, input_tensor=img_conc)

This casts the grayscale image to three RGB channels before feeding it into the network. (I also tried reading the grayscale image as RGB; it doesn't change the result much.)

However, I face a strange bug where the network doesn't seem to learn anything from the Inception V3 features.

    from keras.layers import Input, Concatenate, Dropout, Dense
    from keras.models import Model
    from keras.applications.inception_v3 import InceptionV3
    from keras.utils import plot_model

    img_input = Input(shape=(image_size, image_size, 1))
    # Duplicate the grayscale channel three times to match Inception's RGB input
    img_conc = Concatenate()([img_input, img_input, img_input])
    base_model = InceptionV3(weights='imagenet', include_top=True, input_tensor=img_conc)
    # Optionally freeze the pretrained layers:
    # for layer in base_model.layers:
    #     layer.trainable = False
    firstLayer = base_model.output
    secondLayer = Dropout(0.5)(firstLayer)
    outputLayer = Dense(6, activation='softmax')(secondLayer)
    model = Model(inputs=base_model.input, outputs=outputLayer)

    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
    plot_model(model, to_file=r'C:\Users\zhanglichuan\Desktop\ECE496\lstm\model.png', show_shapes=True)
    history = model.fit(X_train_image, Y_train, epochs=30, validation_split=0.25)

[screenshots: training/validation loss and accuracy curves]

From the graphs we see that although the training loss is decreasing, the validation loss is not. We also get a validation accuracy around 20%, which is very close to random guessing.

I've tried various methods like changing the input size and changing the number of layers; the results are similar or even worse.

Here's the result after adding another dense layer of size 200: [screenshots: loss and accuracy curves] The result is worse compared to the previous model.

Inception V3 is not the only model that can classify images; I also tried the VGG16 model and got similar results.

Possible sources of error:

  1. Resizing of the input spectrogram
  2. Data augmentation
  3. Bugs in the Keras applications themselves
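One more candidate worth checking when an ImageNet-pretrained backbone refuses to learn is input scaling: Inception V3 was trained on inputs in [-1, 1], which is what `keras.applications.inception_v3.preprocess_input` produces. A minimal numpy sketch of that scaling (the spectrogram shape here is an assumption):

```python
import numpy as np

def inception_scale(x):
    # Equivalent to keras.applications.inception_v3.preprocess_input:
    # map 0-255 pixel values into the [-1, 1] range the network was trained on
    return x / 127.5 - 1.0

spectrogram = np.random.randint(0, 256, size=(96, 96, 1)).astype('float32')
scaled = inception_scale(spectrogram)
```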

For the potential bug in the Batch Normalization layer of the Inception V3 model, I applied a custom patch from https://github.com/datumbox/keras/tree/fork/keras2.2.4 A detailed explanation can be found here: http://blog.datumbox.com/the-batch-normalization-layer-of-keras-is-broken/

Here's the result: [screenshots: loss and accuracy curves]

231/231 [==============================] - 0s 2ms/step test accuracy is 0.26839826852728277

I added another dense layer of size 1024 with dropout of 0.5. The result shows over-fitting.

[screenshots: loss and accuracy curves]

231/231 [==============================] - 0s 2ms/step test accuracy is 0.2727272729207943

rightnknow commented 5 years ago

After some debugging, the current graphs look better now. [screenshots: loss and accuracy curves]

231/231 [==============================] - 13s 55ms/step test accuracy is 0.4363203467073895

After further modification of the program, the graphs became: [screenshots: loss and accuracy curves]

On test set 231/231 [==============================] - 1s 4ms/step test accuracy is 0.48051948103553804

Here's the current CNN model structure: [screenshot: CNN model structure]

I'll proceed with final decision layer.

rightnknow commented 5 years ago

Summary of the overall implementation

The implementation of the network is complete. Here's the graph of the overall structure: [screenshot: combined model structure]

Observations during implementation: the combined model has relatively high accuracy compared to the individual CNN or LSTM models. When the LSTM has a test accuracy around 50% and the CNN around 45%, the combined model usually achieves about 55-60% accuracy.

"Key Observations"

  1. The combined model does do better, as expected
  2. Lack of data
  3. Parameters not tuned well

Problems during the implementation: the result from the combined model is unstable, especially when I split my training set into training and validation sets, which causes a decrease in accuracy. Random initialization of the model affects learning and accuracy (I get a different test accuracy each time I train the model).
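The run-to-run variance from random initialization can be reduced by fixing the relevant seeds before building the model. A sketch (for TF 1.x Keras one would also call `tf.set_random_seed`, for TF 2.x `tf.random.set_seed`; both are omitted here so the snippet stays framework-free):

```python
import os
import random
import numpy as np

def seed_everything(seed=42):
    # Fix the seeds that drive weight initialization and data shuffling
    # so repeated training runs start from the same state
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    np.random.seed(seed)

seed_everything(42)
first = np.random.rand(3)   # stand-in for "initialize model weights"
seed_everything(42)
second = np.random.rand(3)  # same seed, so identical draw
```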

Here are the loss and accuracy graphs: [screenshots: loss and accuracy curves]

Apparently there's something wrong with the validation set. I don't have an explanation for it currently.

On test set 231/231 [==============================] - 0s 2ms/step test accuracy is 0.5541125542415685

Here's a picture of the test accuracy reaching 60%: [photo of test accuracy]

Project constraint met! However, I don't suggest reading too much into it, since we don't have enough data; the test accuracy fluctuates between 55% and 60%.