awslabs / keras-apache-mxnet

[DEPRECATED] Amazon Deep Learning's Keras with Apache MXNet support
https://github.com/awslabs/keras-apache-mxnet/wiki

Add Sparse embedding support #164

Closed: kalyc closed this pull request 5 years ago

kalyc commented 6 years ago

Summary

Add a minimal test for sparse embedding operator support

Related Issues

Missing Sparse operator support

PR Overview

kalyc commented 6 years ago

As per this issue, we will need to wait for the MXNet v1.3 release to be able to use the new API signature of mx.sym.embedding for sparse gradients. I will update this PR when the new pip package for MXNet is available.
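
For context, a minimal sketch of how the updated operator is expected to be used once v1.3 is out (assuming MXNet >= 1.3 exposes a sparse_grad argument on mx.sym.Embedding; the input_dim/output_dim values below are illustrative, not taken from this PR):

import mxnet as mx

# Symbolic input of word indices
data = mx.sym.Variable('data')

# sparse_grad=True requests a row-sparse gradient for the embedding weight,
# so only the rows of the lookup table that were actually used get updated.
embed = mx.sym.Embedding(data=data, input_dim=20000, output_dim=128,
                         sparse_grad=True, name='embedding')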

sandeep-krishnamurthy commented 5 years ago

@kalyc - Can we move ahead with this, as we discussed, by using the MXNet preview pip package?
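
For reference, installing a preview build would look something like the following (a sketch; the exact package name and version naming may differ):

# Install the latest pre-release (preview) build of MXNet
pip install --upgrade --pre mxnet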

kalyc commented 5 years ago

Updated the PR and tested with the end-to-end imdb_lstm model with sparse_grad set to True. Removed the embedding unit test, as there is no data bound to the embedding symbol to test with.

Model -

from __future__ import print_function

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb
from keras import backend as K

max_features = 20000
maxlen = 80  # cut texts after this number of words (among top max_features most common words)
batch_size = 32

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')
model = Sequential()

print(K.backend())
# MXNet backend does not support dropout in LSTM and cannot automatically infer shape
if K.backend() == 'mxnet':
    # specify input_length, enable sparse gradients, and drop the unsupported dropout params
    model.add(Embedding(max_features, 128, input_length=maxlen, sparse_grad=True))
    model.add(LSTM(128, unroll=True))
else:
    model.add(Embedding(max_features, 128))
    model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=1,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

Result -

Using MXNet backend
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
x_train shape: (25000, 80)
x_test shape: (25000, 80)
Build model...
mxnet
Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/1
/anaconda2/envs/mxnet/lib/python3.4/site-packages/mxnet/module/bucketing_module.py:408: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (1.0 vs. 0.03125). Is this intended?
  force_init=force_init)
[14:57:19] src/operator/nn/../../common/utils.h:450: Optimizer with lazy_update = True detected. Be aware that lazy update with row_sparse gradient is different from standard update, and may lead to different empirical results. See https://mxnet.incubator.apache.org/api/python/optimization/optimization.html for more details.
25000/25000 [==============================] - 242s 10ms/step - loss: 0.4519 - acc: 0.7784 - val_loss: 0.3670 - val_acc: 0.8384
25000/25000 [==============================] - 60s 2ms/step
Test score: 0.36697145671844483
Test accuracy: 0.83836
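
The lazy_update warning in the log above is expected with row-sparse gradients. If standard dense updates were wanted for a like-for-like comparison, something along these lines should work when constructing the optimizer directly in MXNet (a sketch, assuming MXNet >= 1.3 where the optimizer exposes lazy_update; this is not part of the change in this PR):

import mxnet as mx

# Disable lazy updates so the optimizer applies the standard update rule
# to every row of the weight, even when the gradient is row-sparse.
optimizer = mx.optimizer.Adam(learning_rate=0.001, lazy_update=False)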