Open hengyu95 opened 3 years ago
Hey @hengyu95, quick question: is this bug still present in gcv2hocr2.py? If not, could you share a code outline or a gist of your edited script? I have updated my own copy with many improvements and I am interested in yours too. Please share it here so I can improve mine. :)
I had to manually specify page_width and page_height to match my PDF images to get the words to align. I have checked the coordinates of each word by hand and I am sure the words are perfectly aligned, but the ocr_line elements have coordinates that seem to follow those of the last word of the previous sentence, like so:
I haven't been able to figure out the significance of "baseline". Should I be tweaking those values to get correct lines?
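
One way to sanity-check the line coordinates is to recompute each ocr_line box as the union of the word boxes it contains, rather than carrying a box over from a neighbouring word. The following is only a minimal sketch, not the logic gcv2hocr2.py actually uses: the helper names `union_bbox` and `hocr_line_title` are hypothetical, and it assumes each word box is an `(x0, y0, x1, y1)` tuple in pixels. As far as I understand the hOCR spec, `baseline` holds polynomial coefficients (slope, then offset for the linear case) measured from the bottom-left corner of the line's bbox, so `baseline 0 0` would describe a flat baseline along the bottom edge. It affects how text is placed within the line, not the line's box.

```python
from typing import Iterable, Tuple

# Assumed word box layout: (x0, y0, x1, y1) in pixels, origin at top-left.
BBox = Tuple[int, int, int, int]


def union_bbox(word_boxes: Iterable[BBox]) -> BBox:
    """Return the smallest box that encloses every word box."""
    boxes = list(word_boxes)
    if not boxes:
        raise ValueError("a line needs at least one word box")
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[2] for b in boxes)
    y1 = max(b[3] for b in boxes)
    return (x0, y0, x1, y1)


def hocr_line_title(word_boxes: Iterable[BBox]) -> str:
    """Build a title attribute value for an ocr_line element.

    'baseline 0 0' is an assumption: a horizontal baseline sitting on the
    bottom edge of the line box (slope 0, offset 0).
    """
    x0, y0, x1, y1 = union_bbox(word_boxes)
    return f"bbox {x0} {y0} {x1} {y1}; baseline 0 0"


if __name__ == "__main__":
    words = [(10, 20, 50, 40), (60, 18, 120, 42)]
    print(hocr_line_title(words))
```

If the recomputed line boxes differ from what the script emits, the problem is probably in how words are grouped into lines, not in the baseline values.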