Closed by billqxg 5 years ago
For the alignment, we make the following assumption: the ground truth for the textual content of the words in a given image is ordered from top to bottom and left to right. With this assumption we can train the network, as it implicitly learns to focus first on the top-left word and then on the remaining words in that order.
This is far from ideal, but it works for now.
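To make the ordering assumption concrete, here is a minimal sketch in plain Python; it is not the project's actual code. The names `sort_reading_order`, `assign_by_order`, and `PAD` are hypothetical. The sorting step only applies when preparing annotations from a source where word positions happen to be known; during training no locations are used, and the i-th extracted region is simply paired with the i-th ground-truth word.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical placeholder for region slots that have no word


def sort_reading_order(annotated: List[Tuple[str, float, float]]) -> List[str]:
    """Order (word, top, left) tuples top to bottom, then left to right.

    Assumes words on the same text line share the same `top` value.
    """
    return [word for word, _, _ in sorted(annotated, key=lambda t: (t[1], t[2]))]


def assign_by_order(words: List[str], num_regions: int) -> List[str]:
    """Pair the i-th region slot with the i-th word, padding or truncating."""
    kept = words[:num_regions]
    return kept + [PAD] * (num_regions - len(kept))


# Hypothetical annotation: (word, top, left) in arbitrary order.
annotated = [("world", 10.0, 80.0), ("hello", 10.0, 5.0), ("again", 40.0, 5.0)]
ordered = sort_reading_order(annotated)
targets = assign_by_order(ordered, num_regions=4)
print(targets)
```

Because the target for each region slot is fixed by position in the sequence, the network is only rewarded when its first region covers the first word, its second region the second word, and so on. That is how the alignment is learned without location labels.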
Thanks a lot for your reply!
The question is how to align the extracted text regions with the ground truth. Since the locations of the text are unavailable, how do you assign a ground-truth label to an extracted text region?