Closed: fakinated closed this issue 4 years ago
I have a 980 Ti and can only run a batch size of 8: any higher and I get memory errors, any lower and I get the same error you have. I don't think the model can fit the necessary information into such a small batch size.
Good to know. However, that's exactly why I was asking the question above.
I have a GTX 1060 6GB card, and the maximum batch size I could train with was 10.
While running train.py I'm running out of GPU memory. I already tried to set the batch size down to 4 without any improvement. Can you recommend any model parameters to adapt? My GPU is a GTX 970 4GB (of which Tensorflow can only use 3.5GB).
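Since the workable batch size varies per card in this thread (8 on a 980 Ti, 10 on a 1060 6GB), one pragmatic approach is to probe for the largest batch size that fits by halving on each OOM. This is a hedged sketch, not code from train.py: `train_one_step` is a hypothetical stand-in for one training iteration, and the simulated `MemoryError` stands in for TensorFlow's resource-exhausted error.

```python
# Hedged sketch: find the largest batch size that fits in GPU memory
# by halving on each out-of-memory failure. `train_one_step` is a
# hypothetical stand-in for one iteration of train.py; a real run would
# raise TensorFlow's ResourceExhaustedError rather than MemoryError.

def find_max_batch_size(train_one_step, start=64, floor=1):
    bs = start
    while bs >= floor:
        try:
            train_one_step(bs)
            return bs          # first size that runs without OOM
        except MemoryError:
            bs //= 2           # halve and retry
    return None                # nothing fits; shrink the model instead

# Toy stand-in: pretend anything above 8 exceeds the memory budget.
def fake_step(bs):
    if bs > 8:
        raise MemoryError

print(find_max_batch_size(fake_step))  # 8
```

If even a batch size of 1 fails, the model itself is too large for the card, which is where shrinking a parameter like ENCODER_DIM (as tried later in this thread) comes in.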
The error is:
OP_REQUIRES failed at transpose_op.cc:199 : Resource exhausted: OOM when allocating tensor with shape[4,16,16,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Allocation Stats:
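It's worth noting that the tensor named in the OOM message is not itself the problem: shape [4, 16, 16, 1024] at float32 (4 bytes per element) is only about 4 MiB. The allocator fails there because the sum of all prior allocations (weights, activations, gradients) has already filled the ~3.5 GB budget. A quick back-of-the-envelope check:

```python
# Size of the single tensor the allocator failed on:
# shape [4, 16, 16, 1024], dtype float32 (4 bytes per element).
from functools import reduce
from operator import mul

def tensor_bytes(shape, bytes_per_elem=4):
    """Memory needed for one dense tensor of the given shape."""
    return reduce(mul, shape, 1) * bytes_per_elem

oom_shape = (4, 16, 16, 1024)  # batch of 4, from the error message
print(tensor_bytes(oom_shape))          # 4194304 bytes
print(tensor_bytes(oom_shape) / 2**20)  # 4.0 MiB
```

So the shape in the message only tells you where allocation finally tipped over, not which tensor to shrink; total memory pressure is what has to come down.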
EDIT: I tried setting ENCODER_DIM to 256 and no longer get memory errors, but now I'm presented with this error: