githubharald / SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow.
https://towardsdatascience.com/2326a3487cd5
MIT License
1.99k stars, 893 forks

InvalidArgumentError: Labels length is zero in batch in ctc_loss function #32

Closed (Ammiit closed this issue 5 years ago)

Ammiit commented 5 years ago

I get the following error after a few batches while training the model for line-by-line handwritten text recognition.

2018-12-10 15:15:10.154857: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at ctc_loss_op.cc:166 : Invalid argument: Labels length is zero in batch 37
Traceback (most recent call last):
  File "main.py", line 143, in <module>
    main()
  File "main.py", line 130, in main
    train(model, loader)
  File "main.py", line 34, in train
    loss = model.trainBatch(batch)
  File "/home/dell/FAQ/SimpleHTR/src/Model.py", line 215, in trainBatch
    (_, lossVal) = self.sess.run([self.optimizer, self.loss], { self.inputImgs : batch.imgs, self.gtTexts : sparse , self.seqLen : [Model.maxTextLen] * Model.batchSize, self.learningRate : rate} )
  File "/home/dell/FAQ/virenv_scrap/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/dell/FAQ/virenv_scrap/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/dell/FAQ/virenv_scrap/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/dell/FAQ/virenv_scrap/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Labels length is zero in batch 37
  [[Node: CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](transpose, _arg_Placeholder_1_0_1, _arg_Placeholder_2_0_2, _arg_Placeholder_4_0_4)]]

Caused by op u'CTCLoss', defined at:
  File "main.py", line 143, in <module>
    main()
  File "main.py", line 129, in main
    model = Model(loader.charList, decoderType)
  File "/home/dell/FAQ/SimpleHTR/src/Model.py", line 34, in __init__
    (self.loss, self.decoder) = self.setupCTC(rnnOut3d)
  File "/home/dell/FAQ/SimpleHTR/src/Model.py", line 104, in setupCTC
    loss = tf.nn.ctc_loss(labels=self.gtTexts, inputs=ctcIn3dTBC, ignore_longer_outputs_than_inputs=True,sequence_length=self.seqLen, ctc_merge_repeated=True)
  File "/home/dell/FAQ/virenv_scrap/local/lib/python2.7/site-packages/tensorflow/python/ops/ctc_ops.py", line 158, in ctc_loss
    ignore_longer_outputs_than_inputs=ignore_longer_outputs_than_inputs)
  File "/home/dell/FAQ/virenv_scrap/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_ctc_ops.py", line 285, in ctc_loss
    name=name)
  File "/home/dell/FAQ/virenv_scrap/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/dell/FAQ/virenv_scrap/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/dell/FAQ/virenv_scrap/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

githubharald commented 5 years ago

Are you using the IAM dataset or a different one? It looks like one of the ground-truth texts has length 0, which the TensorFlow ctc_loss implementation can't handle.
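
To illustrate why an empty ground-truth text leads to this error, here is a minimal sketch in plain Python of how a batch of label strings is typically turned into the (indices, values, shape) triple that a sparse label tensor is built from. It is not the repository's actual code: the function names `to_sparse` and `label_lengths` and the tiny character list are assumptions made for illustration.

```python
def to_sparse(texts, char_list):
    """Encode label strings as (indices, values, shape) for a sparse tensor."""
    indices, values = [], []
    max_len = 0
    for batch_idx, text in enumerate(texts):
        # Map every character to its index in the (hypothetical) alphabet.
        label = [char_list.index(c) for c in text]
        max_len = max(max_len, len(label))
        for pos, char_id in enumerate(label):
            indices.append((batch_idx, pos))
            values.append(char_id)
    return indices, values, (len(texts), max_len)


def label_lengths(indices, batch_size):
    """Count how many label entries each batch element contributes."""
    lengths = [0] * batch_size
    for batch_idx, _ in indices:
        lengths[batch_idx] += 1
    return lengths


char_list = list("abc ")   # hypothetical alphabet
texts = ["ab", "", "cab"]  # the second sample has an empty ground truth

indices, values, shape = to_sparse(texts, char_list)
lengths = label_lengths(indices, len(texts))
empty = [i for i, n in enumerate(lengths) if n == 0]
print("label lengths:", lengths)
print("batch elements without labels:", empty)
```

The empty string contributes no entries at all, so its batch element ends up with a label length of zero. The number in the error message ("in batch 37") appears to be the position of such an element within the batch.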

To find out if this is the case, add the following code after this code line:

if gtText == '':
    print('Found sample with empty text:', line)
    continue

This should print and ignore the samples with empty texts.

If this is not the case, then I need more information - please fill out the issue template.
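
The same check can also be run as a standalone pass over the dataset before training. The sketch below assumes the samples are already available as (file path, ground-truth text) pairs; the function name `split_empty_labels` and the example file names are hypothetical and not part of the repository.

```python
def split_empty_labels(samples):
    """Separate samples with an empty ground-truth text from usable ones.

    `samples` is assumed to be an iterable of (file_path, gt_text) pairs.
    """
    kept, dropped = [], []
    for file_path, gt_text in samples:
        if gt_text == '':
            # An empty label cannot be handled by the CTC loss, so skip it.
            dropped.append(file_path)
        else:
            kept.append((file_path, gt_text))
    return kept, dropped


# Hypothetical example data.
samples = [
    ("line_001.png", "hello world"),
    ("line_002.png", ""),
    ("line_003.png", "text"),
]

kept, dropped = split_empty_labels(samples)
for path in dropped:
    print('Found sample with empty text:', path)
print(len(kept), "samples kept")
```

Collecting the dropped paths rather than only printing them makes it easier to inspect the annotation files afterwards and decide whether the labels are really missing or were lost during parsing.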

Ammiit commented 5 years ago

Thanks, that solved the error. I was using the RIMES dataset.