Belval / CRNN

A TensorFlow implementation of https://github.com/bgshih/crnn
MIT License

merge_repeated should be set to False #25

Closed wangershi closed 6 years ago

wangershi commented 6 years ago

Hi, I recently studied the TensorFlow source code, mainly the CTC decoder, and I found that the argument merge_repeated should be set to False. According to https://www.tensorflow.org/api_docs/python/tf/nn/ctc_beam_search_decoder, I think the top path is the sequence we need, not the one after merging. The relevant source code is LabelSeq() in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/util/ctc/ctc_beam_entry.h:

if (!merge_repeated || c->label != prev_label) { labels.push_back(c->label); }

These labels should already be the sequence after the many-to-one map; otherwise the code would also have to check for the blank. I have tested this in my own project: there was a surprising increase in accuracy after I changed merge_repeated from True to False. I also inspected the sequences generated by beam search, and none of them (with merge_repeated=True or False) contain the blank.

Sorry, I haven't tested this in this project, because my environment is unique. Besides, I have found the same setting in https://github.com/synckey/tensorflow_lstm_ctc_ocr/blob/master/lstm_and_ctc_ocr_train.py.
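To make the argument concrete, here is a minimal pure-Python sketch of the behaviour described above. It is not TensorFlow code: the blank symbol "_", the sample path, and the helper names ctc_collapse and label_seq are all hypothetical, and label_seq only mimics the quoted condition under the assumption that the beam already holds the collapsed, blank-free sequence.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen only for this illustration


def ctc_collapse(path, blank=BLANK):
    """CTC many-to-one map: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(path)]
    return [label for label in merged if label != blank]


def label_seq(beam_labels, merge_repeated):
    """Simplified stand-in for the quoted LabelSeq() condition.

    Assumes beam_labels is already collapsed and contains no blank.
    """
    labels = []
    prev_label = None
    for label in beam_labels:
        if not merge_repeated or label != prev_label:
            labels.append(label)
        prev_label = label
    return labels


# A frame-wise path for "hello"; the blank separates the two l's.
path = list("hh_ell_lo")
beam = ctc_collapse(path)

print("".join(beam))                                    # hello
print("".join(label_seq(beam, merge_repeated=False)))   # hello
print("".join(label_seq(beam, merge_repeated=True)))    # helo
```

Under these assumptions, merging a second time removes a legitimate double letter, which would be consistent with the accuracy change reported above.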

Belval commented 6 years ago

This was requested multiple times, thank you.