golsun / SpaceFusion

NAACL'19: "Jointly Optimizing Diversity and Relevance in Neural Response Generation"
https://arxiv.org/abs/1902.11205

Memory Alloc #3

Open Octopirate1 opened 4 years ago

Octopirate1 commented 4 years ago

I used the reddit scripts to generate vali.num, train.num, and test.num from 2011-05. However, when running with this data, I get a tcmalloc warning for a 32731955200-byte (about 32 GB) allocation. The machine I am running this on (Google Colab) has only about 12 GB of RAM.
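
For reference, the byte count from the warning can be converted to decimal and binary units with a few lines of Python. This is only an arithmetic sketch: the helper name `describe_alloc` is made up for illustration, and the 12 GB figure is the approximate Colab RAM quoted above, not a measured value.

```python
def describe_alloc(num_bytes, ram_gb):
    """Convert a byte count to GB and GiB and compare it with available RAM.

    ram_gb is the approximate RAM in decimal gigabytes (an assumption
    taken from the report above).
    """
    gb = num_bytes / 1e9        # decimal gigabytes (10**9 bytes)
    gib = num_bytes / 2**30     # binary gibibytes (2**30 bytes)
    ratio = gb / ram_gb         # how many times larger than the RAM
    return gb, gib, ratio


gb, gib, ratio = describe_alloc(32731955200, 12)
print(f"{gb:.2f} GB, {gib:.2f} GiB, about {ratio:.1f}x the available RAM")
```

So the single allocation is already more than twice the memory the machine has, before anything else is loaded.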

The toy dataset works fine.

Full log below:

@@@@@@@@@@@@@@@@@@@@
hostname:  28137a6dc590
data_path: data
out_path:  out
@@@@@@@@@@@@@@@@@@@@
Using TensorFlow backend.
loss: --------------------
  10.00 <function _sqrt_mse at 0x7f186799a730>
  -10.00 <function _batch_spread at 0x7f1867a0bbf8>
  -10.00 <function _batch_spread at 0x7f1867a0bbf8>
  0.33 categorical_crossentropy
  0.33 categorical_crossentropy
  0.33 categorical_crossentropy
--------------------
out/reddit_width(128, 128, 0.0)_depth(2, 2)/mtask_interp_std0.10_ST10.00_SS10.00_TT10.00
already exists, do you want to delete the folder? (y/n)
y
fld deleted
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:541: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:66: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4432: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3239: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4409: The name tf.random_normal is deprecated. Please use tf.random.normal instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:793: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

WARNING:tensorflow:From /content/SpaceFusion/src/model.py:480: The name tf.squared_difference is deprecated. Please use tf.math.squared_difference instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3576: The name tf.log is deprecated. Please use tf.math.log instead.

out/reddit_width(128, 128, 0.0)_depth(2, 2)/mtask_interp_std0.10_ST10.00_SS10.00_TT10.00

***** Epoch 1/20, trained 0.00M *****
loading data, check_src = False...
tcmalloc: large alloc 32731955200 bytes == 0x7c5c000 @  0x7f18bb2bf001 0x7f18b6970765 0x7f18b69d4dc0 0x7f18b69d6c5f 0x7f18b6a6d238 0x50ac25 0x50c5b9 0x508245 0x50a080 0x50aa7d 0x50c5b9 0x509d48 0x50aa7d 0x50c5b9 0x508245 0x50a080 0x50aa7d 0x50d390 0x509d48 0x50aa7d 0x50c5b9 0x508245 0x50b403 0x635222 0x6352d7 0x638a8f 0x639631 0x4b0f40 0x7f18baebab97 0x5b2fda
^C

How can I reduce this memory alloc?

Thank you.

golsun commented 4 years ago

Sorry for the late reply. Could you please try a smaller batch size? For example: python src/main.py mtask train --data_name=toy --batch_size=32 (you may need to git pull first)
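
As a rough illustration of why a smaller batch size can help, the sketch below estimates the size of a dense array whose first dimension is the batch size. Everything here is an assumption for illustration: the thread does not say which array triggers the warning, the shape (batch, sequence length, vocabulary size) is a guess at a typical dense one-hot target, and the numbers 32 and 10000 are hypothetical, not values from this repo.

```python
def dense_array_bytes(batch_size, seq_len, vocab_size, bytes_per_value=4):
    """Bytes needed for a dense (batch_size, seq_len, vocab_size) array.

    bytes_per_value=4 corresponds to float32. The shape itself is a
    hypothetical stand-in, not taken from the SpaceFusion code.
    """
    return batch_size * seq_len * vocab_size * bytes_per_value


# Hypothetical sequence length and vocabulary size, for illustration only.
SEQ_LEN = 32
VOCAB_SIZE = 10000

for batch_size in (256, 128, 32):
    size_mb = dense_array_bytes(batch_size, SEQ_LEN, VOCAB_SIZE) / 1e6
    print(f"batch_size={batch_size:4d}: {size_mb:.1f} MB")
```

Under these assumptions the footprint scales linearly with the batch size, so an 8x smaller batch needs 8x less memory for such an array. If the large allocation comes from something that does not depend on the batch size, this change alone would not be expected to remove the warning.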