awslabs / keras-apache-mxnet

[DEPRECATED] Amazon Deep Learning's Keras with Apache MXNet support
https://github.com/awslabs/keras-apache-mxnet/wiki

Composing Models does not work with the MXNet backend #223

Open ssbusc1 opened 5 years ago

ssbusc1 commented 5 years ago

I would like to stack together different models similar to what is described here: https://stackoverflow.com/questions/50092589/how-to-vertically-stack-trained-models-in-keras

This does not seem to work with the MXNet backend. Specifically, even the simpler case of wrapping one model inside another does not work. I've included sample code below that works with the Theano backend but not with the MXNet backend.

I'm on keras-mxnet 2.2.4.1 installed via pip.

Example code below.

import numpy as np
np.random.seed(12345)

import keras
from keras.models import Model
from keras.layers import Input, Dense, Dropout

# Set up some dummy data. Basically, this just represents an identity function for odd/even numbers.

x = np.zeros((1000, 2))
y = np.zeros((1000, 2))

for i in range(x.shape[0]):
    if i % 2 == 0:
        x[i, 0] = 1
        y[i, 0] = 1
    else:
        x[i, 1] = 1
        y[i, 1] = 1

# Create a simple model
num_hidden = 10
dropout = 0.5
sigma = 0.01
weight_initializer = keras.initializers.RandomNormal(mean=0.0, stddev=sigma)
bias_initializer = keras.initializers.Zeros()
batch_size = 20
num_epoch = 200

inp = Input(shape=(2,))
hidden1 = Dense(num_hidden, kernel_initializer=weight_initializer, bias_initializer=bias_initializer, activation='relu')(inp)
dropout1 = Dropout(dropout)(hidden1)
output = Dense(2, kernel_initializer=weight_initializer, bias_initializer=bias_initializer, activation='softmax')(dropout1)

model = Model(inputs=[inp], outputs=[output])
learning_rate = 0.01
optimizer = keras.optimizers.SGD(lr=learning_rate)
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

model.fit(x=x, y=y, epochs=num_epoch, batch_size=batch_size, verbose=2)

# This model should clearly overfit to the data. Evaluation on a slice of the input:
model.predict(x[0:10])

array([[  9.99917269e-01,   8.27653857e-05],
       [  2.75039609e-04,   9.99724925e-01],
       [  9.99917269e-01,   8.27653857e-05],
       [  2.75039609e-04,   9.99724925e-01],
       [  9.99917269e-01,   8.27653857e-05],
       [  2.75039609e-04,   9.99724925e-01],
       [  9.99917269e-01,   8.27653857e-05],
       [  2.75039609e-04,   9.99724925e-01],
       [  9.99917269e-01,   8.27653857e-05],
       [  2.75039609e-04,   9.99724925e-01]], dtype=float32)

# Wrap the model into another model, and predict again.
wrapping_model = Model(inputs=model.inputs, outputs=model.outputs)
wrapping_model.predict(x[0:10])

array([[ 0.50003469,  0.49996528],
       [ 0.49993384,  0.50006616],
       [ 0.50003469,  0.49996528],
       [ 0.49993384,  0.50006616],
       [ 0.50003469,  0.49996528],
       [ 0.49993384,  0.50006616],
       [ 0.50003469,  0.49996528],
       [ 0.49993384,  0.50006616],
       [ 0.50003469,  0.49996528],
       [ 0.49993384,  0.50006616]], dtype=float32)

With MXNet, the predictions from the original model look fine, but the wrapped model's predictions are essentially those of an untrained network (close to uniform). With Theano, the wrapped model's results are identical to the predictions from the original model.

roywei commented 5 years ago

Hi @ssbusc1 , thanks for submitting this issue. In the MXNet backend, we have to override the Keras Model and use an MXNet Module under the hood, so the above code does not transfer the weights from model to wrapping_model. You have to copy the weights over explicitly.

You can do that in one of two ways: 1) save and load the weights, if the two models have the same structure

model.save_weights('weights.h5')
wrapping_model = Model(inputs=model.inputs, outputs=model.outputs)
wrapping_model.load_weights('weights.h5')

2) use layer.get_weights() and layer.set_weights() on the specific layers whose weights you want copied.

wrapping_model = Model(inputs=model.inputs, outputs=model.outputs)
for layer, wrapped_layer in zip(model.layers, wrapping_model.layers):
    print(layer.name)
    print(wrapped_layer.name)
    weights = layer.get_weights()
    wrapped_layer.set_weights(weights)
print(wrapping_model.predict(x[0:10]))
wrapping_model.summary()

This will produce the same predictions as the original model:

[[9.9909782e-01 9.0223132e-04]
 [1.6969813e-03 9.9830294e-01]
 [9.9909782e-01 9.0223132e-04]
 [1.6969813e-03 9.9830294e-01]
 [9.9909782e-01 9.0223132e-04]
 [1.6969813e-03 9.9830294e-01]
 [9.9909782e-01 9.0223132e-04]
 [1.6969813e-03 9.9830294e-01]
 [9.9909782e-01 9.0223132e-04]
 [1.6969813e-03 9.9830294e-01]]
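If the two models do not have identical structures, approach 1 can likely also use the standard Keras by_name option of load_weights, which copies weights only into layers whose names match those in the saved file. A minimal sketch, assuming the keras-mxnet fork keeps the stock Keras load_weights signature:

# Sketch, assuming load_weights keeps the stock Keras signature in keras-mxnet:
# by_name=True copies weights only into layers whose names match the file.
model.save_weights('weights.h5')
wrapping_model.load_weights('weights.h5', by_name=True)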
ssbusc1 commented 5 years ago

Thanks. The wrapping_model will eventually have a different structure, so I'll try approach #2 above. First-class support for this would definitely help (the other backends already support it), especially as the composition gets more involved.
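For the case described here, where wrapping_model only partially shares structure with model, a minimal sketch of approach 2 that matches layers by name and copies only where weights exist (a hypothetical snippet, not something the backend provides):

# Sketch of approach 2 for partially matching models: copy weights only for
# layers present in both models, matched by name; skip layers without weights.
trained = {layer.name: layer for layer in model.layers}
for target in wrapping_model.layers:
    source = trained.get(target.name)
    if source is not None and source.get_weights():
        target.set_weights(source.get_weights())

This assumes the shared layers keep the same names (and weight shapes) in both models; layers such as Input and Dropout have no weights and are skipped.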