Currie32 / Spell-Checker

A seq2seq model that can correct spelling mistakes.

Memory overuse #2

Open IKhaduri opened 6 years ago

IKhaduri commented 6 years ago

During the first epoch on a fairly small dataset (50 MB), on two TITAN X 12 GB GPUs, I run into the following problem: after several iterations TensorFlow aborts the training process and shows this error:

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[128,176,512]

[[Node: decode/Training_Decoder/decoder/while/BasicDecoderStep/BahndahauAttentionCall/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]
(decode/Training_Decoder/decoder/while/BasicDecoderStep/BahndahauAttentionCall/mul/Enter, decode/Training_Decoder/decoder/while/BasicDecoderStep/BahndahauAttentionCall/Tanh)]]

[[Node: decode_1/Inference_Decoder/decoder/while/Identity/_241 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_1390_decode_1/Inference_Decoder/decoder/while/Identity", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"]
(^_cloopdecode_1/Inference_Decoder/decoder/while/TensorArrayWrite_1/TensorArrayWriteV3/_11)]]

What could be the reason for this? I've tried smaller batch sizes, but that only delayed the same error.
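
For reference, this is the kind of workaround I am considering next. It is only a minimal sketch assuming the TF 1.x API this repo uses; the `max_length` threshold and the filtering helper are placeholder names, not code from the repo. The idea is that the failing tensor shape `[128,176,512]` looks like batch_size x max_time x rnn_size, so memory grows with the longest sentence in a batch, and long sentences could be dropped before batching. Letting TensorFlow allocate GPU memory on demand also makes the real allocation failure easier to pinpoint.

```python
import tensorflow as tf

# 1) Allocate GPU memory incrementally instead of reserving it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# 2) Drop unusually long sentences before batching. The threshold below is a
#    hypothetical example value, not taken from this repository.
MAX_SENTENCE_LENGTH = 100

def filter_long_sentences(sentences, max_length=MAX_SENTENCE_LENGTH):
    """Keep only sentences short enough to fit the attention memory budget."""
    return [s for s in sentences if len(s) <= max_length]

# The training loop itself is unchanged; only the session config differs.
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    # ... run the training ops here ...
```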