I am getting the following error when trying to run the model in the "train" phase -
2017-05-30 05:39:23,518 root INFO max_gradient_norm: 5.000000
2017-05-30 05:39:23,518 root INFO clip_gradients: True
2017-05-30 05:39:23,518 root INFO valid_target_length inf
2017-05-30 05:39:23,518 root INFO target_vocab_size: 39
2017-05-30 05:39:23,518 root INFO target_embedding_size: 10.000000
2017-05-30 05:39:23,518 root INFO attn_num_hidden: 128
2017-05-30 05:39:23,518 root INFO attn_num_layers: 2
2017-05-30 05:39:23,519 root INFO visualize: True
2017-05-30 05:39:23,519 root INFO buckets
2017-05-30 05:39:23,519 root INFO [(16, 11), (27, 17), (35, 19), (64, 22), (80, 32)]
2017-05-30 05:41:51,137 root INFO Created model with fresh parameters.
Train: : 0%| | 0/156 [00:00<?, ?it/s]2017-05-30 05:46:19,134 root INFO Generating first batch)
E tensorflow/stream_executor/cuda/cuda_blas.cc:472] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
input_tensor dim: (?, 1, 32, ?)
CNN outdim before squeeze: (?, 1, ?, 512)
CNN outdim: (?, ?, 512)
Traceback (most recent call last):
File "src/launcher.py", line 148, in <module>
main(sys.argv[1:], exp_config.ExpConfig)
File "src/launcher.py", line 145, in main
model.launch()
File "/home/sprabh6/Attention-OCR/src/model/model.py", line 300, in launch
summaries, step_loss, step_logits, _ = self.step(encoder_masks, img_data, zero_paddings, decoder_inputs, target_weights, bucket_id, self.forward_only)
File "/home/sprabh6/Attention-OCR/src/model/model.py", line 411, in step
outputs = self.sess.run(output_feed, input_feed)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(64, 522), b.shape=(522, 128), m=64, n=128, k=522
[[Node: model_with_buckets/embedding_attention_decoder_1/attention_decoder/attention_decoder/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](model_with_buckets/embedding_attention_decoder_1/attention_decoder/attention_decoder/concat, embedding_attention_decoder/attention_decoder/weights/read)]]
[[Node: conv_conv5/BatchNorm/AssignMovingAvg/_270 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_28061_conv_conv5/BatchNorm/AssignMovingAvg", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op u'model_with_buckets/embedding_attention_decoder_1/attention_decoder/attention_decoder/MatMul', defined at:
File "src/launcher.py", line 148, in <module>
main(sys.argv[1:], exp_config.ExpConfig)
File "src/launcher.py", line 144, in main
session = sess)
File "/home/sprabh6/Attention-OCR/src/model/model.py", line 151, in __init__
use_gru = use_gru)
File "/home/sprabh6/Attention-OCR/src/model/seq2seq_model.py", line 141, in __init__
softmax_loss_function=softmax_loss_function)
File "/home/sprabh6/Attention-OCR/src/model/seq2seq.py", line 993, in model_with_buckets
decoder_inputs[:int(bucket[1])], int(bucket[0]))
File "/home/sprabh6/Attention-OCR/src/model/seq2seq_model.py", line 140, in <lambda>
self.target_weights, buckets, lambda x, y, z: seq2seq_f(x, y, z, False),
File "/home/sprabh6/Attention-OCR/src/model/seq2seq_model.py", line 122, in seq2seq_f
attn_num_hidden = attn_num_hidden)
File "/home/sprabh6/Attention-OCR/src/model/seq2seq.py", line 675, in embedding_attention_decoder
initial_state_attention=initial_state_attention, attn_num_hidden=attn_num_hidden)
File "/home/sprabh6/Attention-OCR/src/model/seq2seq.py", line 575, in attention_decoder
x = linear([inp] + attns, input_size, True)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 751, in _linear
res = math_ops.matmul(array_ops.concat(args, 1), weights)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1765, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1454, in _mat_mul
transpose_b=transpose_b, name=name)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Blas SGEMM launch failed : a.shape=(64, 522), b.shape=(522, 128), m=64, n=128, k=522
[[Node: model_with_buckets/embedding_attention_decoder_1/attention_decoder/attention_decoder/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](model_with_buckets/embedding_attention_decoder_1/attention_decoder/attention_decoder/concat, embedding_attention_decoder/attention_decoder/weights/read)]]
[[Node: conv_conv5/BatchNorm/AssignMovingAvg/_270 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_28061_conv_conv5/BatchNorm/AssignMovingAvg", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
I figured my LD_LIBRARY_PATH wasn't set properly, so I added an entry pointing it at libcublas. That still didn't work. I then figured it could be a memory problem and set GPU options in launcher.py as follows -
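(The exact snippet I used isn't shown above; the following is a minimal sketch of the kind of change I mean, assuming the standard tf.ConfigProto / tf.GPUOptions settings available in TF 1.x and that the resulting session is the one passed into the model, as in the `session = sess` call in the traceback.)

```python
import tensorflow as tf

# Keep TF from grabbing the whole GPU up front so cuBLAS has headroom,
# and let allocations grow on demand instead.
gpu_options = tf.GPUOptions(
    per_process_gpu_memory_fraction=0.8,  # cap at ~80% of GPU memory
    allow_growth=True)                    # allocate incrementally

config = tf.ConfigProto(gpu_options=gpu_options,
                        allow_soft_placement=True)

sess = tf.Session(config=config)
# ... this session is then handed to the model, e.g. Model(..., session=sess)
```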
Still doesn't work. Can anyone please tell me if I'm missing anything? TensorFlow version: 1.1.0