Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

GPU Memory Issue in Recursor #338

Open jxchen01 opened 7 years ago

jxchen01 commented 7 years ago

Hello,

I encountered a strange memory issue when using Recursor to decorate non-AbstractRecurrent modules. My test code can be found here.

I want to process a sequence of images with a recurrent network. To test the memory issue, I use only a few spatial convolution layers in the model. Because these are non-AbstractRecurrent modules, I decorate the model with Recursor(). (Note: this model is only for testing, so it contains only simple non-AbstractRecurrent layers to keep the problem simple.)

In each training iteration, I feed the model a sequence of images: I call forward once per image and then call backward for each image in the reverse order (a simplified sketch of the loop is shown below). Most of the memory (about 10 GB out of 12 GB) is consumed in the first iteration, and it is really hard for me to understand what this big portion of memory is used for. The GPU memory usage I observe in three consecutive iterations follows the sketch.
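For reference, the per-iteration loop is roughly the following. This is only a sketch, not my actual test script: the layer sizes, sequence length, and batch size are placeholders, and logFree is just a small helper around cutorch.getMemoryUsage that prints the free memory in the same format as the numbers below.

```lua
require 'rnn'
require 'cunn'   -- also loads cutorch

-- helper: print free GPU memory in the same format as the log below
local function logFree(msg)
   local freeBytes = cutorch.getMemoryUsage(cutorch.getDevice())
   print(string.format('available %d -- %s', freeBytes, msg))
end

local seqLen, imageSize = 5, 180

-- placeholder stand-in for the test cnn: a few SpatialConvolution layers
local cnn = nn.Sequential()
   :add(nn.SpatialConvolution(1, 64, 3, 3, 1, 1, 1, 1))
   :add(nn.ReLU(true))
   :add(nn.SpatialConvolution(64, 64, 3, 3, 1, 1, 1, 1))
   :add(nn.ReLU(true))

-- decorate the non-AbstractRecurrent module so it can be fed one step at a time
local model = nn.Recursor(cnn, seqLen):cuda()

for iter = 1, 3 do
   logFree('before forward')
   local inputs, outputs = {}, {}
   for t = 1, seqLen do                      -- one forward call per image
      inputs[t] = torch.CudaTensor(1, 1, imageSize, imageSize):uniform()
      outputs[t] = model:forward(inputs[t])
   end
   local step = 0
   for t = seqLen, 1, -1 do                  -- backward calls in reverse order
      step = step + 1
      local gradOutput = outputs[t]:clone():fill(1)   -- dummy gradient
      model:backward(inputs[t], gradOutput)
      logFree(string.format('after the backward for step %d', step))
   end
   model:forget()                            -- reset between iterations
end
```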

Total Memory: 12884705280

Iteration 1:

available 12614729728 -- before forward
available 3704307712 -- after the 1st backward
available 3528785920 -- after the 2nd backward
available 3353272320 -- after the 3rd backward
available 3177758720 -- after the 4th backward
available 3002236928 -- after the 5th backward

Iteration 2:

available 2985984000 -- before forward
available 1063698432 -- after the 1st backward
available 1062649856 -- after the 2nd backward
available 1061601280 -- after the 3rd backward
available 1060552704 -- after the 4th backward
available 1059504128 -- after the 5th backward

Iteration 3:

available 1066844160 -- before forward
available 1063698432 -- after the 1st backward
available 1062649856 -- after the 2nd backward
available 1061601280 -- after the 3rd backward
available 1060552704 -- after the 4th backward
available 1059504128 -- after the 5th backward

Thanks! Jianxu

nicholas-leonard commented 7 years ago

@jxchen01 I haven't been able to fully reproduce your code, as I am missing cudnn and access to a 12 GB GPU :) . However, looking at your model, it makes sense that it consumes so much memory. The cnn has 1024 channels at some point, and the input image size is 180 x 180. With a sequence length of 20, you need 20 copies of the intermediate output/gradInput buffers for the entire cnn. These buffers are only allocated when calling forward/backward, which is why the big drop shows up in the first iteration. I think this explains a lot of your memory consumption.
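To put rough numbers on it (assuming float storage, batch size 1, and that the 1024-channel feature map is kept at the full 180 x 180 resolution, which is likely an overestimate):

```lua
-- back-of-envelope cost of the per-step buffers for a single 1024-channel layer
local bytesPerMap = 1024 * 180 * 180 * 4       -- one float feature map: ~133 MB
local perStep     = bytesPerMap * 2            -- output + gradInput buffers
local total       = perStep * 20               -- one clone per unrolled time step
print(string.format('%.1f GB', total / 2^30))  -- prints 4.9 GB, for one layer only
```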

A workaround is to break your sequence into smaller sub-sequences: wrap your Recursor in a Sequencer and call remember() on it, so that state is preserved between consecutive calls. Only call forget() at the end of the entire sequence.
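Something along these lines (just a sketch: cnn, sequence, targets, and criterion are placeholders standing in for your own script, and criterion would typically be a SequencerCriterion since the Sequencer takes and returns tables):

```lua
require 'rnn'

-- wrap the step module in a Sequencer and keep state across sub-sequences
local model = nn.Sequencer(nn.Recursor(cnn))
model:remember('both')            -- don't reset state between consecutive calls

local subLen = 3                  -- process the long sequence in small chunks
for i = 1, #sequence, subLen do
   local j = math.min(i + subLen - 1, #sequence)
   local subInputs  = {unpack(sequence, i, j)}
   local subTargets = {unpack(targets, i, j)}

   local outputs = model:forward(subInputs)
   criterion:forward(outputs, subTargets)
   local gradOutputs = criterion:backward(outputs, subTargets)
   model:backward(subInputs, gradOutputs)
end

model:forget()                    -- only reset at the end of the whole sequence
```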

jxchen01 commented 7 years ago

@nicholas-leonard Thanks for your suggestion.

Another version of this test, without cudnn, can be found here. It is easy to change the value of imageSize (line 29) to reduce the image size so that it fits into the available memory.

(1) I am already only processing a small sub-sequence, say of length 3, out of the whole sequence (30 images). Due to the memory issue, I cannot increase the sub-sequence length to 6 or 8 in the full model (which is more complicated than this test one). That is why I want to resolve this issue.

(2) Decorating with Sequencer() is a little more efficient than Recursor(). For example, the available memory at each stage is:

Using Recursor():
Iteration 1: 12 GB --> 3 GB
Iteration 2: 3 GB --> 1 GB
Iteration 3: 1 GB --> 1 GB

Using Sequencer():
Iteration 1: 12 GB --> 3.3 GB
Iteration 2: 3.3 GB --> 3.3 GB
Iteration 3: 3.3 GB --> 3.3 GB

Question:

I tried forget() and recycle() after finishing each whole sequence, but only a tiny amount of memory gets freed (less than 0.2 GB). Is there any way I can free these buffers?
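Roughly, what I do at the end of each whole sequence looks like this (simplified; the collectgarbage and memory query are only there to check how much is actually released):

```lua
model:forget()            -- reset the recurrent state / step counters
model:recycle()           -- also tried recycling the step clones
collectgarbage()          -- force Lua GC so unreferenced tensors can be released
local free = cutorch.getMemoryUsage(cutorch.getDevice())
print('available after forget/recycle: ' .. free)   -- barely changes (< 0.2 GB freed)
```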

Thanks! Jianxu