Possible Memory Leak - Githubissues

awslabs / keras-apache-mxnet

[DEPRECATED] Amazon Deep Learning's Keras with Apache MXNet support

https://github.com/awslabs/keras-apache-mxnet/wiki

Possible Memory Leak #195

Open Cpruce opened 6 years ago

Cpruce commented 6 years ago

Please make sure that the boxes below are checked before you submit your issue. If your issue is an implementation question, please ask your question on StackOverflow or on the Keras Slack channel instead of opening a GitHub issue.

Thank you!

[x] Check that you are up-to-date with the master branch of Keras. You can update with: pip install git+git://github.com/keras-team/keras.git --upgrade --no-deps
[x] Check that your version of TensorFlow is up-to-date. The installation instructions can be found here.
[x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).

Please see:

https://discuss.mxnet.io/t/possible-memory-leak/1973

roywei commented 6 years ago

@Cpruce Thanks for the issue. I am looking into this; it is possibly caused by the use of the foreach operator.

Cpruce commented 6 years ago

@roywei Thanks for looking into this

roywei commented 6 years ago

@Cpruce I was able to narrow the memory leak down to the validation step that runs after each epoch. For now, removing validation from model.fit() resolves it, and calling model.evaluate(test_data, test_label) to do validation at the end works fine. We are using the bucketing module in keras-mxnet, so switching buckets between training and validation may be what triggers the memory leak in the foreach operator. I need to take another look at that.

Cpruce commented 6 years ago

@roywei Awesome, thanks. I'll try it out soon 👍

roywei commented 6 years ago

For now, removing the validation dataset resolves the memory leak. Use the following call for training:

history = model1.fit(x_train, y_train,
                    epochs=epochs,
                    batch_size=batch_size,
                    callbacks=[reduce_lr],
                    verbose=2)

We still need to investigate how to re-enable the validation stage.
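
The workaround in this thread has two parts: train without per-epoch validation, then run a single evaluation once training has finished. A minimal sketch of that pattern follows. The helper name fit_without_validation and its argument names are hypothetical and not part of keras-mxnet; it only assumes that the model object exposes the standard Keras fit() and evaluate() methods.

```python
def fit_without_validation(model, x_train, y_train, x_test, y_test, **fit_kwargs):
    """Train with no per-epoch validation, then evaluate once at the end.

    Hypothetical helper for illustration; `model` is assumed to be a
    compiled Keras model (anything with fit() and evaluate() works).
    """
    # Drop the Keras fit() arguments that would trigger a validation
    # pass after every epoch, which is where the leak was observed.
    fit_kwargs.pop("validation_data", None)
    fit_kwargs.pop("validation_split", None)

    history = model.fit(x_train, y_train, **fit_kwargs)

    # Single validation pass after training has finished.
    scores = model.evaluate(x_test, y_test, verbose=0)
    return history, scores
```

With the call shown above, this would be used roughly as fit_without_validation(model1, x_train, y_train, x_test, y_test, epochs=epochs, batch_size=batch_size, callbacks=[reduce_lr], verbose=2). Note that a callback such as reduce_lr may be configured to monitor a validation metric, in which case it would need to monitor a training metric instead.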

julioasotodv commented 6 years ago

I can confirm that the memory leak happens with mxnet-mkl 1.13.1 under Linux when running imdb_bidirectional_lstm.py from the examples folder (which includes a validation set).
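
One way to check a report like this is to record the process's peak memory at the end of every epoch and see whether it keeps climbing. The sketch below uses only the Python standard library (the resource module, available on Unix-like systems). The class name PeakMemoryTracker is hypothetical, and this is not the measurement method used by anyone in this thread.

```python
import resource
import sys


class PeakMemoryTracker:
    """Collects the process's peak resident set size once per epoch."""

    def __init__(self):
        self.samples = []  # list of (epoch, peak_rss_in_mib)

    def record(self, epoch, logs=None):
        # `logs` is accepted and ignored so the method matches the
        # (epoch, logs) signature Keras uses for on_epoch_end hooks.
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is reported in kilobytes on Linux and bytes on macOS.
        divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
        self.samples.append((epoch, peak / divisor))

    def growth(self):
        """Peak-RSS increase in MiB between the first and last sample."""
        if len(self.samples) < 2:
            return 0.0
        return self.samples[-1][1] - self.samples[0][1]
```

In Keras, the tracker could be attached with keras.callbacks.LambdaCallback(on_epoch_end=tracker.record) and passed in the callbacks list. Peak RSS never decreases, so a plateau after the first epochs is expected; a value that keeps rising epoch after epoch is only suggestive of a leak, not proof of one.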

MandarGogate commented 6 years ago

There is no memory leak when mxnet-cu90mkl==1.2.1 is used. However, mxnet-cu90mkl==1.3.1 throws an error when validation data is used.