Open heroturtle opened 6 years ago
Thank you for using gcv2hocr. Please upload your JSON output file and I will check it.
Thanks for the quick reply. I used this Response from https://cloud.google.com/vision/docs/ocr: test2.jpg.json.zip
Thank you for uploading your file. I could convert it to an hOCR file using the C version of gcv2hocr, and I confirmed that the conversion fails with the Python version. I will fix the Python version. Sorry for the inconvenience.
I have modified gcv2hocr.py. I hope this fixes the issue.
Thanks for the prompt fix. It works now. May I ask: 1) I assume DOCUMENT_TEXT_DETECTION doesn't work yet? 2) I assume that for line_detection the image needs to be deskewed? It worked in the test sample but not in the sample I provided. In addition, the output of the C and Python versions is slightly different. Thanks for your work.
I think DOCUMENT_TEXT_DETECTION supports some languages (English, etc.) but not all.
The image needs to be deskewed to get a good recognition result. But I think that may be done by another application or command; it is not part of gcv2hocr.
The output of the C and Python versions is different. Historically, the Python version was not committed by me. The Python output is better than the C output in terms of the hOCR format (text structure). But the Python output fails to place characters in Japanese vertical text (I made gcv2hocr for this purpose), because ReportLab (which generates the PDF output) does not support Japanese vertical text. So in the C output, a CR/LF is added after every single word (character) to preserve its position in Japanese vertical text...
I tried to convert the json output on Google's page using gcv2hocr.py: https://cloud.google.com/vision/docs/ocr
Traceback (most recent call last):
  File "gcv2hocr2.py", line 146, in <module>
    page = fromResponse(resp, **args.__dict__)
  File "gcv2hocr.py", line 99, in fromResponse
    word.htmlid="word_%d_%d" % (len(page.content) - 1, len(curline.content))
AttributeError: 'NoneType' object has no attribute 'content'
Thanks
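For anyone reading later: the AttributeError above suggests that curline was still None when the first word was processed, i.e. no line had been opened yet. Below is a minimal sketch of that failure mode and one possible guard. It is not the actual gcv2hocr.py code or the fix that was committed; Box, group_words and starts_line are hypothetical names used only for illustration.

```python
class Box:
    """Hypothetical stand-in for a page, line, or word container."""

    def __init__(self, text=""):
        self.text = text
        self.content = []
        self.htmlid = None


def group_words(words, starts_line):
    """Group words into lines.

    starts_line(word) is a hypothetical test that returns True when
    the word should begin a new line.
    """
    page = Box()
    curline = None
    for text in words:
        # Guard: always open a line before the first word, even if the
        # line-break test did not fire. Without the "curline is None"
        # check, curline.content below would raise AttributeError.
        if curline is None or starts_line(text):
            curline = Box()
            page.content.append(curline)
            curline.htmlid = "line_%d" % (len(page.content) - 1)
        word = Box(text)
        word.htmlid = "word_%d_%d" % (len(page.content) - 1, len(curline.content))
        curline.content.append(word)
    return page
```

Calling group_words(["a", "b", "c"], lambda w: w == "c") under these assumptions yields two lines, with the first word placed in a line even though the break test never fired for it.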