awslabs / keras-apache-mxnet

[DEPRECATED] Amazon Deep Learning's Keras with Apache MXNet support
https://github.com/awslabs/keras-apache-mxnet/wiki

load_weights() is not loading weights if pre-trained model is used #114

Open rajendra2 opened 6 years ago

rajendra2 commented 6 years ago

I am using keras-mxnet 2.1.6.1.

I trained a model on cifar10 data using densenet121 (code below). There is no issue in training. However, if I load the weights to continue training or call predict, the weights don't seem to get loaded: training starts again from the same loss/accuracy as the first run, and predict results are all nan.

from __future__ import print_function
import keras
from keras.applications.densenet import DenseNet121
from keras.layers.pooling import GlobalAveragePooling2D
from keras.layers.core import Dense
from keras.layers import Input
from keras.models import Model

def get_model():
    Input_1 = Input(shape=(3, 221, 221), name='Input_1')
    DenseNet121_1_model = DenseNet121(include_top=False, input_tensor=Input_1)
    DenseNet121_1 = DenseNet121_1_model(Input_1)
    # Freeze all but the last 10% of the DenseNet layers.
    num_layers = len(DenseNet121_1_model.layers)
    for i, layer in enumerate(DenseNet121_1_model.layers):
        if ((i * 100) / (num_layers - 1)) <= (100 - 10):
            layer.trainable = False
    GlobalAveragePooling2D_1 = GlobalAveragePooling2D(name='GlobalAveragePooling2D_1')(DenseNet121_1)
    Dense_1 = Dense(name='Dense_1', units=10, activation='softmax')(GlobalAveragePooling2D_1)

    model = Model([Input_1], [Dense_1])
    return model

from keras.datasets import cifar10
from skimage.transform import resize
import numpy as np

batch_size = 16
num_classes = 10
epochs = 2

# The data, split between train and test sets:
(x_train1, y_train1), (x_test1, y_test1) = cifar10.load_data()

# Use one fifth of the data to keep the run short.
y_train = y_train1[:x_train1.shape[0]//5]
y_test = y_test1[:x_test1.shape[0]//5]

x_train = np.ndarray((x_train1.shape[0]//5, 3, 221, 221), dtype=np.float32)
x_test = np.ndarray((x_test1.shape[0]//5, 3, 221, 221), dtype=np.float32)

# Resize the 32x32x3 (HWC) images to 221x221 and move channels first.
for i in range(x_train.shape[0]):
    x_train[i] = resize(x_train1[i], (221, 221, 3), anti_aliasing=True).transpose(2, 0, 1)

for i in range(x_test.shape[0]):
    x_test[i] = resize(x_test1[i], (221, 221, 3), anti_aliasing=True).transpose(2, 0, 1)

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# initiate RMSprop optimizer
opt = keras.optimizers.rmsprop(lr=0.0001, decay=1e-6)

model=get_model()
#model = keras.models.load_model("cifar.h5")
# Let's train the model using RMSprop
model.compile(loss='categorical_crossentropy',
              optimizer=opt, context=["gpu(0)"],
              metrics=['accuracy'])

# Note: skimage.transform.resize already returns floats scaled to
# [0, 1], so no further /255 normalization is applied here.

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test, y_test),
          shuffle=True)
model.save("cifar.h5")
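For reference, a minimal round-trip check might make the failure easier to isolate. This is a sketch assuming the get_model() and opt defined above; "cifar_weights.h5" is just an illustrative filename:

# Sketch: verify that weights survive a save/load round trip.
model.save_weights("cifar_weights.h5")  # illustrative filename
model2 = get_model()
model2.compile(loss='categorical_crossentropy',
               optimizer=opt, context=["gpu(0)"],
               metrics=['accuracy'])
model2.load_weights("cifar_weights.h5")
for w1, w2 in zip(model.get_weights(), model2.get_weights()):
    # If load_weights worked, every array should match.
    assert np.allclose(w1, w2)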
sandeep-krishnamurthy commented 6 years ago

Hi @rajendra2, the call DenseNet121_1_model = DenseNet121(include_top=False, input_tensor=Input_1) loads 'imagenet' weights in TF dim ordering with TF kernels (densenet121_weights_tf_dim_ordering_tf_kernels.h5).

The current release of Keras-MXNet does not support loading models trained with other backends. This is a feature we will be working on soon.
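Until that lands, a possible workaround is to skip the pre-trained weights entirely, so nothing in TF format is pulled in. A sketch (not verified against every Keras-MXNet version):

from keras.applications.densenet import DenseNet121
from keras.layers import Input

# Sketch: weights=None skips the imagenet download, so no
# TF-dim-ordering kernels are loaded into the MXNet backend.
Input_1 = Input(shape=(3, 221, 221), name='Input_1')
base = DenseNet121(include_top=False, weights=None, input_tensor=Input_1)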

rajendra2 commented 6 years ago

What I am observing is that when I train without weights, the training accuracy is much lower, but when I train with the imagenet weights, accuracy is quite good. That suggests the TF weights are getting loaded and working fine during training.

Given that Keras-MXNet does not support loading models trained with other backends, is the above expected?

Secondly, since the model is being trained with the MXNet backend, shouldn't model.save_weights() save it in the right format, which can then be loaded later with the MXNet backend?

Maybe the issue is in save_weights() when the 'imagenet' weights are in TF dim ordering and kernel format. Is that the issue you meant?
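One way to narrow it down would be to inspect what was actually written to disk. A sketch using h5py against the cifar.h5 saved above, so the stored layer names and shapes can be compared with what get_weights() reports:

import h5py

# Sketch: print every group/dataset name stored in the saved file.
with h5py.File("cifar.h5", "r") as f:
    f.visit(print)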

MohammadSamragh commented 6 years ago

I am having the same issue. I tried to find where the problem is. It seems like the weights are not changing during training at all:

import numpy

# Compare the first layer's weights before and after training.
weights_before_training = model.layers[0].get_weights()[0]
model.fit(X_train, Y_train, ...)
weights_after_training = model.layers[0].get_weights()[0]
diff = numpy.sum(numpy.absolute(weights_after_training - weights_before_training))
print(diff)

In the above code, diff is zero. During training the accuracy goes up, but once training is done I cannot access the trained weights. It seems like the initial weights (from before training) are loaded again.
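To separate "weights are not updating" from "get_weights() returns a stale copy", one could compare predictions on a fixed batch instead. A sketch, assuming the X_train/Y_train above:

import numpy

# Sketch: if training really updates the backend parameters,
# predictions on a fixed batch should change after fit() even
# when get_weights() still reports the initial values.
preds_before = model.predict(X_train[:16])
model.fit(X_train, Y_train, batch_size=16, epochs=1)
preds_after = model.predict(X_train[:16])
print(numpy.sum(numpy.absolute(preds_after - preds_before)))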

rajendra2 commented 6 years ago

This was a problem in keras-mxnet 1.2.2.1 as well, and I was hoping it would be resolved in keras2-mxnet.

I just ran a test with the 1.2.2.1 version, using SqueezeNet and imagenet weights, and printed the weight difference of the first layer. It was 0. I then replaced SqueezeNet with plain conv/maxpool layers and the difference is non-zero.
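For comparison, the plain conv/maxpool test looked roughly like this (a sketch in the Keras 2 API; the exact layer sizes are illustrative):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Sketch: a small convnet with no pre-trained weights; here the
# first layer's weight diff after fit() comes out non-zero.
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(3, 32, 32)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation='softmax'),
])
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop', metrics=['accuracy'])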