awslabs / keras-apache-mxnet

[DEPRECATED] Amazon Deep Learning's Keras with Apache MXNet support
https://github.com/awslabs/keras-apache-mxnet/wiki

Masking Layer doesn't work after adding a NaiveRunGraph feature in MXNet #228

Open karan6181 opened 5 years ago

karan6181 commented 5 years ago

Thank you!

Below is a minimal reproducible example:

import numpy as np
from keras.layers import LSTM
from keras.layers import Embedding
from keras.models import Sequential

num_samples = 2
timesteps = 5
embedding_dim = 4
units = 3
embedding_num = 12

model = Sequential()
# mask_zero=True makes the Embedding layer emit a mask for zero-valued inputs
model.add(Embedding(embedding_num, embedding_dim,
                    mask_zero=True,
                    input_length=timesteps))

# The failure also reproduces with SimpleRNN (from keras.layers import SimpleRNN):
# layer = SimpleRNN(units)
layer = LSTM(units)  # default unroll=False
model.add(layer)
model.compile(optimizer='sgd', loss='mse')

# left-pad with zeros so the leading timesteps are masked
left_padded_input = np.ones((num_samples, timesteps))
left_padded_input[0, :1] = 0
left_padded_input[1, :2] = 0
out6 = model.predict(left_padded_input)  # fails on the MXNet backend
roywei commented 5 years ago

I think it triggers the naive run graph only when masking is enabled and the sym.foreach operator is used, which means an RNN layer with unroll=False does not work with a masking layer.
Current workaround to enable masking: use unroll=True in the RNN layer, as sketched below.
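
To make the difference concrete, the workaround is a single argument change (a sketch reusing the names from the reproduction above; the comments reflect my reading of the trigger condition):

from keras.layers import LSTM

units = 3
layer = LSTM(units)               # default unroll=False: MXNet uses sym.foreach, masking breaks
layer = LSTM(units, unroll=True)  # statically unrolled graph; masking works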

karan6181 commented 5 years ago

Yes, absolutely correct. If we add unroll=True to the RNN/LSTM/GRU layer, it uses the static forward pass and works without any issue.

Below is the working code with unroll=True added:

import numpy as np
from keras.layers import LSTM
from keras.layers import Embedding
from keras.models import Sequential

num_samples = 2
timesteps = 5
embedding_dim = 4
units = 3
embedding_num = 12

model = Sequential()
# mask_zero=True makes the Embedding layer emit a mask for zero-valued inputs
model.add(Embedding(embedding_num, embedding_dim,
                    mask_zero=True,
                    input_length=timesteps))

# layer = SimpleRNN(units, unroll=True) also works (from keras.layers import SimpleRNN)
layer = LSTM(units, unroll=True)  # static unrolled forward pass
model.add(layer)
model.compile(optimizer='sgd', loss='mse')

# left-pad with zeros so the leading timesteps are masked
left_padded_input = np.ones((num_samples, timesteps))
left_padded_input[0, :1] = 0
left_padded_input[1, :2] = 0
out6 = model.predict(left_padded_input)  # runs without error
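
For completeness, the same workaround should carry over to the other recurrent layers mentioned above, since GRU and SimpleRNN accept the same unroll argument. A minimal sketch with GRU swapped in (untested here; the assumption is that it behaves like the LSTM case):

import numpy as np
from keras.layers import GRU, Embedding
from keras.models import Sequential

model = Sequential()
model.add(Embedding(12, 4, mask_zero=True, input_length=5))
model.add(GRU(3, unroll=True))  # unroll=True avoids the sym.foreach path
model.compile(optimizer='sgd', loss='mse')

left_padded_input = np.ones((2, 5))
left_padded_input[0, :1] = 0
left_padded_input[1, :2] = 0
out = model.predict(left_padded_input)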