Cpruce opened this issue 6 years ago
@Cpruce Thanks for the issue, I am looking into this, possibly caused by the use of foreach operator.
@roywei Thanks for looking into this
@Cpruce I was able to narrow down the memory leak to validation time after each epoch. For now, removing validation during model.fit() resolves this, and using model.evaluate(test_data, test_label) to do validation at the end works fine. We are using the bucketing module in keras-mxnet; switching buckets between train and validation may have caused the memory leak in the foreach operator. Need to take another look at that.
@roywei awesome thanks I'll try it out soon 👍
For now removing validation dataset resolves the memory leak issue using the following command for training:
history = model1.fit(x_train, y_train,
                     epochs=epochs,
                     batch_size=batch_size,
                     callbacks=[reduce_lr],
                     verbose=2)
Need to investigate how to re-enable the validation stage.
I can confirm that the memory leak happens with mxnet-mkl 1.3.1 under Linux when running imdb_bidirectional_lstm.py from the examples folder (which includes a validation set).
There is no memory leak when mxnet-cu90mkl==1.2.1 is used. However, mxnet-cu90mkl==1.3.1 throws an error when validation data is used.
Please make sure that the boxes below are checked before you submit your issue. If your issue is an implementation question, please ask your question on StackOverflow or on the Keras Slack channel instead of opening a GitHub issue.
Thank you!
[x] Check that you are up-to-date with the master branch of Keras. You can update with:
pip install git+git://github.com/keras-team/keras.git --upgrade --no-deps
[x] Check that your version of TensorFlow is up-to-date. The installation instructions can be found here.
[x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).
Please see:
https://discuss.mxnet.io/t/possible-memory-leak/1973