dfaker / df

Larger resolution face masked, weirdly warped, deepfake,
Mozilla Public License 2.0

Out of VRAM #37

Closed: fakinated closed this issue 4 years ago

fakinated commented 6 years ago

While running train.py I run out of GPU memory. I have already tried lowering the batch size to 4, without any improvement. Can you recommend any model parameters to adapt? My GPU is a GTX 970 with 4 GB (of which TensorFlow can only use 3.5 GB).

The error is: OP_REQUIRES failed at transpose_op.cc:199 : Resource exhausted: OOM when allocating tensor with shape[4,16,16,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

Allocation Stats:

Stats: 
Limit:                  3432906752
InUse:                  3428587264
MaxInUse:               3431733248
NumAllocs:                    1546
MaxAllocSize:            597213184
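
The tensor shape in the OOM message, [4,16,16,1024], hints at where the memory goes: a 16×16×1024 activation feeding the encoder's fully connected bottleneck. A rough back-of-the-envelope sketch (an assumption, not confirmed by the repo: that ENCODER_DIM is the width of a Dense layer applied to that flattened conv output, with a default of 1024):

```python
# Rough memory estimate, assuming the encoder flattens a 16x16x1024
# conv output (as the OOM tensor shape suggests) into Dense(ENCODER_DIM).
flat = 16 * 16 * 1024            # 262144 flattened features
ENCODER_DIM = 1024               # assumed default; the EDIT below drops it to 256
weights = flat * ENCODER_DIM     # ~268M parameters in this one layer
print(weights * 4 / 1024 ** 3)   # ~1.0 GiB of float32 weights
```

If the optimizer is Adam, its two moment buffers roughly triple that footprint, which alone would approach the ~3.2 GiB allocator limit shown above; that would explain why shrinking ENCODER_DIM, not the batch size, is what relieves the OOM.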

EDIT: I tried setting ENCODER_DIM to 256 and I no longer get memory errors, but now I'm presented with this error:

Traceback (most recent call last):
  File "train.py", line 151, in <module>
    figure = figure.reshape( (4,4) + figure.shape[1:] )
ValueError: cannot reshape array of size 589824 into shape (4,4,3,128,128,3)
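
The numbers in the traceback line up as follows (a hedged reconstruction, assuming each preview row keeps the trailing shape (3, 128, 128, 3)): 589824 = 4 × 3 × 128 × 128 × 3, so the array holds only 4 rows, while a (4, 4) tile grid needs 16 rows, i.e. 2359296 elements:

```python
import numpy as np

# Hedged reconstruction of the failing reshape in train.py:
# 4 preview rows cannot fill a (4, 4) grid that needs 16.
figure = np.zeros((4, 3, 128, 128, 3))         # 4 * 147456 = 589824 elements
try:
    figure.reshape((4, 4) + figure.shape[1:])  # target: (4, 4, 3, 128, 128, 3)
except ValueError as err:
    print(err)  # cannot reshape array of size 589824 into shape (4,4,3,128,128,3)
```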
Picslook commented 6 years ago

I have a 980 Ti and can only run a batch size of 8: any higher and I get memory errors, any lower and I get the same error you have. I don't think the model can fit the necessary information into such a small batch size.
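
One plausible reading of that hard floor at 8 (a guess, not confirmed by the source): the preview takes the first 8 samples from each side's batch to fill the 4×4 grid of 16 rows, so any batch below 8 leaves the slice short and the reshape above fails, while larger batches still work. A minimal sketch under that assumption:

```python
import numpy as np

# Hypothetical sketch: assume the preview slices 8 samples per side.
# NumPy slicing silently returns fewer rows when the batch is smaller,
# which would reproduce the reshape failure below batch size 8.
def preview_rows(batch_size, per_side=8):
    side = np.zeros((batch_size, 3, 128, 128, 3))[:per_side]
    return np.concatenate([side, side]).shape[0]

for bs in (4, 8, 10):
    rows = preview_rows(bs)
    print(bs, rows, "ok" if rows == 16 else "reshape fails")
```

This would also be consistent with batch size 10 training fine, as reported below.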

fakinated commented 6 years ago

Good to know. However, that's exactly why I was asking the question above.

hoaxherold commented 6 years ago

I have a GTX 1060 6 GB card, and the maximum batch size I could train with was 10.