Belval / CRNN

A TensorFlow implementation of https://github.com/bgshih/crnn
MIT License

Looking forward to your reply #16

Closed gds101054108 closed 6 years ago

gds101054108 commented 6 years ago

logits = tf.transpose(logits, (1, 0, 2)) — why? The original order is [batch, time, class]. Also, self.__seq_len: [self.max_char_count] * self.data_manager.batch_size — why? I think seq_len should be a variable number equal to each individual target sequence length.

Belval commented 6 years ago

That's the ordering that tf.nn.ctc_loss needs.

Looking at the documentation, you can use time_major=False and skip the transposition.

See: https://www.tensorflow.org/api_docs/python/tf/nn/ctc_loss

As for the second question I'm not quite sure anymore, I think it had to do with variable width between fonts. Sometimes, fonts are wider than the windows created by the LSTM.

Usually in RNNs, you indeed pass the length of your sequence.
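To illustrate the reordering, here is a minimal numpy sketch (the shapes are made-up example values, not the repo's actual ones): the transpose converts the RNN output from batch-major [batch, time, class] to the time-major [time, batch, class] layout that tf.nn.ctc_loss expects by default.

```python
import numpy as np

# RNN output in batch-major order: [batch, time, class]
batch, time_steps, num_classes = 4, 25, 37
logits = np.zeros((batch, time_steps, num_classes))

# tf.nn.ctc_loss defaults to time_major=True, i.e. [time, batch, class];
# transposing with axes (1, 0, 2) swaps the batch and time dimensions.
time_major_logits = np.transpose(logits, (1, 0, 2))
print(time_major_logits.shape)  # (25, 4, 37)
```

With time_major=False you can pass the batch-major logits as-is and drop the transpose entirely.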

gds101054108 commented 6 years ago

Thank you. I used https://github.com/gds101054108/keras/blob/master/examples/image_ocr.py to train my Chinese OCR model. I generated 16M variable-width images, 48 pixels high, covering all 21025 GBK characters. It converged to 99.0% and works quite well. But I have to use K.clear_session() to kill the session and reload the weights whenever the batch image width changes, so training took one month. I want to use a variable-length RNN to speed up training. I hope you can help me solve this problem.

gds101054108 commented 6 years ago

@Belval I did some experiments; I think seq_len is the real image width w//4 - 1 (after the CNN), before padding.

wangershi commented 6 years ago

@gds101054108 Bingo, I got the same result: seq_len is w//4 - 1. Besides, I found a way to make seq_len equal w//4. In the last conv, which uses valid padding, replacing the [2, 2] kernel with a [2, 1] kernel keeps the width from decreasing, matching the paper's author.

Belval commented 6 years ago

@wangershi do you mind elaborating? The last convolutional layer (conv7) in crnn.py does use a (2, 2) kernel.

Also, how did you test that your results were the same as the original paper?

wangershi commented 6 years ago

@Belval First question: the original paper says, in Section 3.2: "For example, an image containing 10 characters is typically of size 100*32, from which a feature sequence 25 frames can be generated." I just wanted to reproduce that, so from a coding point of view: the feature width should be the same before and after conv7. The stride along the horizontal direction is 1 and the padding of conv7 is valid, so if the kernel size along the horizontal direction is 2, the width decreases by 1, which doesn't match the original paper. So the kernel size along the horizontal direction must be 1. Actually, crnn is written in Lua, which I have difficulty reading, so I don't know how the author handles this (in the paper, the kernel size along the horizontal direction is 2).
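The width arithmetic above can be checked with a short pure-Python sketch (the 100-wide input and /4 downsampling come from the paper's example; the formula is the standard valid-padding output size, not code from this repo):

```python
def conv_out_width(w, kernel_w, stride_w=1):
    """Output width of a convolution with 'valid' (no) padding."""
    return (w - kernel_w) // stride_w + 1

w = 100          # input image width from the paper's example
feat_w = w // 4  # width after the CNN's downsampling by 4: 25 frames

# conv7 with a kernel of width 2 and valid padding loses one frame:
print(conv_out_width(feat_w, kernel_w=2))  # 24, i.e. w // 4 - 1

# with a kernel of width 1, the width is preserved:
print(conv_out_width(feat_w, kernel_w=1))  # 25, i.e. w // 4
```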

Second question: in crnn.py, if you replace self.__seq_len: [self.__max_char_count] * self.__data_manager.batch_size with self.__seq_len: [26] * self.__data_manager.batch_size, TensorFlow raises an exception: InvalidArgumentError (see above for traceback): sequence_length(0) <= 25. That's how I confirmed what seq_len is. It's a pity that CTC is confusing; I can't explain this in terms of the CTC algorithm.
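If one wanted per-image lengths instead of a single constant, the feed could be derived from each image's width using the w // 4 - 1 relation found above. A hypothetical sketch (the widths are made up, and wiring the list into the self.__seq_len placeholder is left out):

```python
# Hypothetical: derive one sequence length per image from its width,
# using the empirical feature width w // 4 - 1 discussed above.
batch_widths = [100, 120, 160]
seq_lens = [w // 4 - 1 for w in batch_widths]
print(seq_lens)  # [24, 29, 39]
```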

Belval commented 6 years ago

@wangershi Thank you for taking the time to explain. I'll make the modifications and retrain to see if it yields better results.