Closed caoyangcr7 closed 5 years ago
N is the number of STN matrices in the paper.
Hi,
`N` does not change during training. It is set to the maximum number of text regions you want your network to extract. Naturally, some samples will contain fewer words or characters than `N`. In this case you must make sure that these extra timesteps are labelled with the `blank` label, so that the network learns to predict the correct number of characters/words in the image.
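A minimal sketch of this padding step, assuming the labels are plain integer class ids. The helper name `pad_labels`, the blank index `0`, and the example ids are hypothetical and chosen only for illustration; the actual blank index depends on the character map used for training.

```python
from typing import List, Sequence

# Assumed index of the blank label; the real value depends on the char map.
BLANK_LABEL = 0


def pad_labels(labels: Sequence[int], num_time_steps: int,
               blank: int = BLANK_LABEL) -> List[int]:
    """Pad a label sequence with the blank label up to num_time_steps (N)."""
    if len(labels) > num_time_steps:
        raise ValueError(
            f"got {len(labels)} labels, but N is only {num_time_steps}"
        )
    # Every timestep without a ground-truth label is filled with blank.
    return list(labels) + [blank] * (num_time_steps - len(labels))


# Two ground-truth labels (made-up class ids 5 and 7) with N = 3:
padded = pad_labels([5, 7], num_time_steps=3)
print(padded)
```

With `N = 3` and two ground-truth labels, the third timestep carries the blank label, so the loss penalises any non-blank prediction there.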
Let's have a look at your example: Let's assume the `16` is in the top-left corner of the image and the `18` is in the bottom-right corner of the image. Since we always assume that we read from left to right and top to bottom, we would say that the first label is `16` and the second label is `18`. With this way of defining our labels, we tell the network to always put the first prediction close to the most top-left word, with all other predictions following the reading direction.
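The ordering convention can be sketched as a sort over the annotated words when the ground truth is prepared. This is only an illustration under stated assumptions: the `Word` tuple, its pixel coordinates, and the helper `labels_in_reading_order` are hypothetical, and the sort treats two words as being on the same line only if they share the same `top` value. Real annotations would likely need some tolerance when grouping words into lines.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    """Hypothetical annotation: top-left corner of a word box plus its text."""
    left: int
    top: int
    text: str


def labels_in_reading_order(words: List[Word]) -> List[str]:
    """Order labels top to bottom, then left to right within the same row."""
    ordered = sorted(words, key=lambda w: (w.top, w.left))
    return [w.text for w in ordered]


# "18" sits in the bottom-right corner, "16" in the top-left corner.
words = [Word(left=150, top=200, text="18"), Word(left=5, top=10, text="16")]
print(labels_in_reading_order(words))
```

The order of this list defines which label belongs to which timestep, so the first predicted region is trained towards the top-left word.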
@Bartzi, thanks for your reply. Maybe I didn't explain my first question clearly; the answer to the second one is clear to me. About N: it seems that you explain it in terms of CTC loss. My question is this: suppose N is set to 3, which means we get 3 predicted text regions. If there are only 2 ground-truth labels, like "16" and "18" above, do you mean that we must make sure the label of the extra predicted region is blank? Am I right?
Yes, that's it!
@Bartzi Thanks a lot! Hope everything goes well with you!
@Bartzi Sorry to bother you again, I have another 2 questions. The first is still about N: since different training images may contain a different number of words or characters, will N change during training? When I looked at the source code, I found that N is set by the num_time_steps param. If N stays the same during training, what should we do if N is larger than the number of words or characters? The second question is about the recognition network: when we get N text regions from the original image after the sampling network, how can we find the corresponding label for each text region during training? For example, if we get 2 text regions '16' and '18', and we have 2 labels '16' and '18', how can we choose label '16' for text region '16' instead of '18' during training? Looking forward to your reply, thanks.