During the first epoch on a fairly small dataset (50 MB), training on two TITAN X 12 GB GPUs, TensorFlow aborts the training process after several iterations with the following error:

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[128,176,512]
[[Node: decode/Training_Decoder/decoder/while/BasicDecoderStep/BahndahauAttentionCall/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]
(decode/Training_Decoder/decoder/while/BasicDecoderStep/BahndahauAttentionCall/mul/Enter, decode/Training_Decoder/decoder/while/BasicDecoderStep/BahndahauAttentionCall/Tanh)]]
[[Node: decode_1/Inference_Decoder/decoder/while/Identity/_241 = _Recv[client_terminated=false,
recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1,
tensor_name="edge_1390_decode_1/Inference_Decoder/decoder/while/Identity",
tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"]
(^_cloopdecode_1/Inference_Decoder/decoder/while/TensorArrayWrite_1/TensorArrayWriteV3/_11)]]
What could be the reason for this? I've tried smaller batch sizes, but that only delayed the same error.
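For context, here is a rough back-of-the-envelope estimate of the failing allocation, taking the shape [128,176,512] from the error message and assuming float32 elements (the error says T=DT_FLOAT):

```python
# Estimate the memory footprint of the tensor that failed to allocate.
# Shape [128, 176, 512] is copied from the OOM message; float32 = 4 bytes.
batch, time_steps, hidden = 128, 176, 512
bytes_per_float32 = 4

num_elements = batch * time_steps * hidden
size_mb = num_elements * bytes_per_float32 / (1024 ** 2)

print(f"{num_elements} elements, {size_mb:.0f} MB")
# 11534336 elements, 44 MB
```

So a single such tensor is only about 44 MB, but the attention computation inside the decoder's while-loop produces one per step, and backprop keeps all of them alive, which is presumably why a 12 GB card still runs out of memory even though the dataset itself is small.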