Open guptaaman2011 opened 6 years ago
Thank you for using gcv2hocr.
Please upload your Capture.jpg.json.
How to use makepdf.sh:
- Go to the directory that contains makepdf.sh.
- Execute "sh ./makepdf.sh".
You have to edit makepdf.sh before executing it. The first line of makepdf.sh, "while [ $a -le 32 ]", means that you have page001.jpg to page032.jpg. If you want to convert a different number of JPEGs, change that number. For example, if you have only one JPEG, edit the first line of makepdf.sh to "while [ $a -le 1 ]".
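Rather than counting the JPEGs by hand, the loop bound could be worked out with a short script. This is only a sketch, assuming the files follow the page001.jpg, page002.jpg, ... naming described above; the function name `last_page_number` is hypothetical and not part of gcv2hocr.

```python
import re
from pathlib import Path

# Matches the pageNNN.jpg naming that makepdf.sh expects (three digits).
PAGE_RE = re.compile(r"^page(\d{3})\.jpg$")


def last_page_number(directory="."):
    """Return the highest NNN among pageNNN.jpg files, or 0 if none exist."""
    numbers = []
    for entry in Path(directory).iterdir():
        match = PAGE_RE.match(entry.name)
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers, default=0)


if __name__ == "__main__":
    # Print the loop header to paste into makepdf.sh.
    print(f"while [ $a -le {last_page_number()} ]")
```

Run it in the directory that holds the JPEGs and copy the printed line into makepdf.sh. It reports the highest page number, so it does not detect gaps in the numbering.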
Thanks for the quick update. I am new to OCR technology and am just checking its scope. I found it very interesting.
Hi dinosauria123, I want to convert hOCR to other formats (XLS, XML, PDF, DOCX). Is there any tool or script for that?
Is this what you want? https://www.zotero.org/support/dev/translators
Or this one?
I don't get it. It doesn't have the hOCR format in it.
Do you want to convert images to hOCR?
You could use Tesseract OCR: https://github.com/tesseract-ocr/tesseract
No, I already have the hOCR output. I see that I can convert it to PDF, but the challenge now is that I want to convert this hOCR to other formats such as XML, TXT, DOCX, and XLS.
I think you have to use multiple tools. For example, hOCR to PDF is possible with hocr-tools: https://github.com/tmbdev/hocr-tools#hocr-pdf
Once you have a PDF, there are many tools that can convert it to other formats.
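For plain text or spreadsheet-style output, going through PDF may not be necessary, because hOCR is HTML and can be read directly. The following is a minimal sketch using only the Python standard library. It assumes that words are marked up as spans with class `ocrx_word` and carry a `bbox x0 y0 x1 y1` entry in their `title` attribute, as the hOCR specification describes; the class and function names are hypothetical and not part of gcv2hocr or hocr-tools.

```python
import csv
import io
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (text, x0, y0, x1, y1) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word span currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            # Fall back to zeros when a word has no bbox (an assumption).
            self._bbox = tuple(map(int, match.groups())) if match else (0, 0, 0, 0)
            self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), *self._bbox))
            self._bbox = None


def hocr_to_words(hocr_text):
    """Parse hOCR markup and return a list of (text, x0, y0, x1, y1)."""
    parser = HocrWords()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


def words_to_csv(words):
    """Render the word list as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["text", "x0", "y0", "x1", "y1"])
    writer.writerows(words)
    return out.getvalue()
```

The CSV can be opened in a spreadsheet program, and joining the first column with spaces gives plain text. This sketch does not reconstruct table rows or columns; that would need extra logic based on the bounding boxes.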
Yes, I was trying that, but when I try to convert the recognized PDF to Excel format with an online tool, it says it can't detect the file and does not convert it to XLS, so I am stuck here.
Do you know ALTO? https://en.wikipedia.org/wiki/ALTO_(XML)
If you want to work with an OCR format, ALTO is better than hOCR.
I got this, FYI:
Dear User, Your file "scanned.pdf" contains scanned or image textual data. Converting this PDF requires OCR to successfully complete the conversion and retrieve the text. This feature is exclusively available to our Cometdocs Premium Users. Learn more about how to become a premium user here: http://www.cometdocs.com/user/subscriptions Best Regards, Cometdocs Team.
On Fri, Mar 9, 2018 at 5:53 AM, dinosauria123 notifications@github.com wrote:
An easier way: Google Drive can convert PDF files to Excel files.
https://techtites.com/convert-pdf-google-drive/
I have never used this, but I think it is what you want: https://github.com/tabulapdf/tabula-extractor
I think this topic is not related to gcv2hocr. May I close this issue?
python gcv2hocr.py Capture.jpg.json > capture.hocr
Traceback (most recent call last):
  File "gcv2hocr.py", line 146, in <module>
    page = fromResponse(resp, **args.__dict__)
  File "gcv2hocr.py", line 99, in fromResponse
    word.htmlid="word_%d_%d" % (len(page.content) - 1, len(curline.content))
AttributeError: 'NoneType' object has no attribute 'content'
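The error message says that either `page` or `curline` was still None when the word was processed, for example because a word was reached before any line had been opened. The sketch below shows one defensive pattern for the `curline` case. It is an illustration under stated assumptions, not gcv2hocr's actual code: the `Box` class, the `add_word` helper, and the underscore-separated id format are all hypothetical.

```python
class Box:
    """Hypothetical stand-in for a page, line, or word container."""

    def __init__(self):
        self.content = []
        self.htmlid = None


def add_word(page, curline, word):
    """Attach word to curline, opening a new line first if none exists.

    Returns the line the word was added to, so the caller can keep using it.
    """
    if curline is None:
        # No line has been started yet: create one instead of failing
        # with AttributeError on curline.content.
        curline = Box()
        page.content.append(curline)
    # Id built from the line index and the word's position in that line.
    word.htmlid = "word_%d_%d" % (len(page.content) - 1, len(curline.content))
    curline.content.append(word)
    return curline
```

Whether opening a line implicitly is the right behavior depends on the input; the uploaded Capture.jpg.json would show what the response actually contains.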