eihli / image-table-ocr

Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.
MIT License
503 stars, 109 forks

Traineddata path issue on Windows 10. #14

Open. gety9 opened this issue 2 years ago

gety9 commented 2 years ago

When I run

python -m table_ocr.demo https://raw.githubusercontent.com/eihli/image-table-ocr/master/resources/test_data/simple.png

I get

pytesseract.pytesseract.TesseractError: (1, 'Error opening data file C:UsersGetyAppDataLocalProgramsPythonPython38libsite-packagestable_ocrtessdata/table-ocr.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'table-ocr\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

(Note that the directory path in the error message is missing its '\' separators.)

The file does exist.

I tried setting the TESSDATA_PREFIX environment variable, but got the same error.

I also tried specifying the path on the command line, with the same result:

python -m table_ocr.demo https://raw.githubusercontent.com/eihli/image-table-ocr/master/resources/test_data/simple.png --tessdata-dir C:\Users\Btycoon\AppData\Local\Programs\Python\Python38\Lib\site-packages\table_ocr\tessdata

I am on Windows 10.
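
One plausible explanation for the missing separators, offered as an assumption rather than a confirmed diagnosis: if the Tesseract arguments are joined into a single string and later tokenized with POSIX shell rules (as Python's shlex.split does by default), every unquoted backslash is consumed as an escape character. The sketch below uses a hypothetical path to show the effect.

```python
import shlex

# Hypothetical Windows path, shaped like the one in the report above.
tessdata_dir = r"C:\Users\me\table_ocr\tessdata"
config = f"--tessdata-dir {tessdata_dir}"

# POSIX-style tokenizing treats each unquoted backslash as an escape
# character and drops it, leaving a path with no separators.
tokens = shlex.split(config)
print(tokens)  # ['--tessdata-dir', 'C:Usersmetable_ocrtessdata']
```

The mangled result has the same shape as the path in the error message, which is why a forward-slash path is worth trying.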

eihli commented 2 years ago

Sorry to say that I have very little knowledge of Windows 10. I'll leave this comment open for a while in case anyone else has a suggestion.

MikuAuahDark commented 2 years ago

Here is my workaround.

https://github.com/eihli/image-table-ocr/blob/49205462a3fb68240fd6a3d441ae7cf979b43daa/table_ocr/ocr_image/__init__.py#L30

On that line, replace all backslashes with forward slashes: tessdata_dir.replace("\\", "/").

After making the necessary changes, the program works. Windows 10, Python 3.10.4.
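
The workaround above can be sketched as a small helper. The function name to_forward_slashes and the sample path are illustrative and not part of table_ocr; the shlex round trip only stands in for whatever tokenizing happens downstream.

```python
import shlex


def to_forward_slashes(path: str) -> str:
    """Return `path` with every backslash replaced by a forward slash."""
    return path.replace("\\", "/")


# Hypothetical install location, used only for illustration.
tessdata_dir = to_forward_slashes(r"C:\Python38\Lib\site-packages\table_ocr\tessdata")
print(tessdata_dir)

# The normalized path survives POSIX-style tokenizing intact.
tokens = shlex.split(f"--tessdata-dir {tessdata_dir}")
```

Windows file APIs accept forward slashes as separators, so the normalized path still points at the same directory.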

MikuAuahDark commented 2 years ago

Also, I found that the built-in Tesseract data that comes with my installation gives better results than the data shipped with this project, so I removed the related line entirely.

GeniusBroccoli commented 2 years ago

Replace all backslashes with forward slashes: tessdata_dir.replace("\\", "/").

Thank you, I had been trying to find the problem all day.

eihli commented 1 year ago

If anyone wants to submit a patch to make this more portable across Linux/Windows, please do!
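
A portable patch could build the path with pathlib and emit it in forward-slash form on every platform. This is only a sketch: build_tess_args is a hypothetical helper, and the argument list is inferred from the error message (language table-ocr) and the --tessdata-dir flag mentioned earlier in this thread, not taken from the project's code.

```python
from pathlib import PurePath, PureWindowsPath
from typing import List


def build_tess_args(package_dir: PurePath) -> List[str]:
    """Build Tesseract arguments with a forward-slash tessdata path.

    Hypothetical helper: `package_dir` is assumed to be the directory
    that contains the bundled `tessdata` folder.
    """
    tessdata_dir = package_dir / "tessdata"
    # as_posix() renders the path with forward slashes on every platform.
    return ["-l", "table-ocr", "--tessdata-dir", tessdata_dir.as_posix()]


# PureWindowsPath lets the Windows case be exercised on any OS.
args = build_tess_args(PureWindowsPath(r"C:\pkg\table_ocr"))
print(args)
```

In real use the caller would presumably pass something like Path(table_ocr.__file__).parent, which is a PurePath subclass, so the same function would serve Linux and Windows without a platform check.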

ajay27bhat commented 1 year ago

I am also getting the same error. Did you solve this problem?

rucxiaowen commented 1 year ago

Hello! Your email has been received, thank you!

ajay27bhat commented 1 year ago

How do I run this project after making the changes from the workaround above? I am new to this, so could you please guide me on how to run it? Thanks.