eihli / image-table-ocr

Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.
MIT License

Running issue with simple.png example under Win 10 #5

Closed eddydev03 closed 3 years ago

eddydev03 commented 3 years ago

Dear Eihli, your program will help me in the future for personal purposes. I am running it on Win 10. I followed all the steps to simply extract data from images, but I can't figure out why it does not run.

Here is the message after I run py -m table_ocr.demo https://raw.githubusercontent.com/eihli/image-table-ocr/master/resources/test_data/simple.png

Running extract_tables.main([C:\Users\MAGICB~1\AppData\Local\Temp\demo_cp3ejb98\simple.png]).
Extracted the following tables from the image:
[('C:\Users\\AppData\Local\Temp\demo_cp3ejb98\simple.png', ['C:\Users\\AppData\Local\Temp\demo_cp3ejb98\simple\table-000.png'])]
Processing tables for C:\Users*\AppData\Local\Temp\demo_cp3ejb98\simple.png.
Processing table C:\Users*\AppData\Local\Temp\demo_cp3ejb98\simple\table-000.png.
Traceback (most recent call last):
  File "C:\Users***\AppData\Local\Programs\Python\Python39\lib\site-packages\pytesseract\pytesseract.py", line 255, in run_tesseract
    proc = subprocess.Popen(cmd_args, **subprocess_args())
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 947, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 1416, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\table_ocr\demo__main__.py", line 51, in <module>
    csv_output = main(sys.argv[1])
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\table_ocr\demo__main__.py", line 32, in main
    ocr = [
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\table_ocr\demo__main__.py", line 33, in <listcomp>
    table_ocr.ocr_image.main(cell, None)
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\table_ocr\ocr_image__init.py", line 31, in main
    txt = ocr_image(cropped, " ".join(tess_args))
  File "C:\Users*****\AppData\Local\Programs\Python\Python39\lib\site-packages\table_ocr\ocr_image\init__.py", line 83, in ocr_image
    return pytesseract.image_to_string(
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\pytesseract\pytesseract.py", line 409, in image_to_string
    return {
  File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\pytesseract\pytesseract.py", line 412, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\Users***\AppData\Local\Programs\Python\Python39\lib\site-packages\pytesseract\pytesseract.py", line 287, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Users*****\AppData\Local\Programs\Python\Python39\lib\site-packages\pytesseract\pytesseract.py", line 259, in run_tesseract
    raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.

I have tesseract installed, so I don't get it:

PS C:\Users*\AppData\Local\Programs\Python\Python39> py -m pip install tesseract
Requirement already satisfied: tesseract in c:\users*\appdata\local\programs\python\python39\lib\site-packages (0.1.3)

Thanks for your help.

Eddy

eihli commented 3 years ago

The last line of that exception points to line 259 in the file pytesseract/pytesseract.py.

Let's look at that line. https://github.com/madmaze/pytesseract/blob/a98ea7530711ac1319f6504857aa9318d63a2774/pytesseract/pytesseract.py#L256

    try:
        proc = subprocess.Popen(cmd_args, **subprocess_args())
    except OSError as e:
        if e.errno != ENOENT:
            raise e
        raise TesseractNotFoundError()

It's catching an OSError and, when the errno is ENOENT ("no such file"), throwing a TesseractNotFoundError in its place; any other OSError is re-raised as-is. The TesseractNotFoundError message never actually tells us the details of the OSError. It is making an assumption that an ENOENT at this point can only mean that it can't find Tesseract. Since you say you have Tesseract installed, it's worth looking at the underlying OSError.

You could edit that file C:\Users**\AppData\Local\Programs\Python\Python39\lib\site-packages\pytesseract\pytesseract.py on line 258.5 and add a print(e) to see details about the OSError.
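If you'd rather not edit an installed package, the same pattern can be reproduced with only the standard library. This is a rough sketch, not pytesseract's code: `ToolNotFoundError`, `launch`, and `underlying_error` are hypothetical names made up for illustration. It relies on Python's exception chaining, which keeps the original OSError reachable from the replacement exception.

```python
import errno
import subprocess


class ToolNotFoundError(EnvironmentError):
    """Hypothetical stand-in for pytesseract's TesseractNotFoundError."""


def launch(cmd_args):
    """Start a process, translating 'file not found' the way run_tesseract does."""
    try:
        return subprocess.Popen(
            cmd_args, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
    except OSError as e:
        # Show the details that the replacement exception would otherwise hide.
        print("underlying OSError:", repr(e), "errno:", e.errno)
        if e.errno != errno.ENOENT:
            raise  # any other OSError is passed through unchanged
        raise ToolNotFoundError(cmd_args[0]) from e


def underlying_error(cmd_args):
    """Return the original OSError behind a ToolNotFoundError, or None on success."""
    try:
        proc = launch(cmd_args)
    except ToolNotFoundError as err:
        return err.__cause__
    proc.communicate()  # let the process finish and close its pipes
    return None
```

Calling `underlying_error(["tesseract"])` on a machine where the executable can't be found should hand back the FileNotFoundError that also appears in the first half of your traceback.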

I'll jump ahead to what I expect you'll see if you do that.

This is the code of the entire function where you're getting the error.

def run_tesseract(
    input_filename,
    output_filename_base,
    extension,
    lang,
    config='',
    nice=0,
    timeout=0,
):
    cmd_args = []

    if not sys.platform.startswith('win32') and nice != 0:
        cmd_args += ('nice', '-n', str(nice))

    cmd_args += (tesseract_cmd, input_filename, output_filename_base)

    if lang is not None:
        cmd_args += ('-l', lang)

    if config:
        cmd_args += shlex.split(config)

    if extension and extension not in {'box', 'osd', 'tsv', 'xml'}:
        cmd_args.append(extension)

    try:
        proc = subprocess.Popen(cmd_args, **subprocess_args())
    except OSError as e:
        if e.errno != ENOENT:
            raise e
        raise TesseractNotFoundError()

    with timeout_manager(proc, timeout) as error_string:
        if proc.returncode:
            raise TesseractError(proc.returncode, get_errors(error_string))

You'll see it's running proc = subprocess.Popen(cmd_args, **subprocess_args()). That line is trying to run a command "cmd_args".

What is cmd_args? cmd_args += (tesseract_cmd, input_filename, output_filename_base).

What is tesseract_cmd? tesseract_cmd = 'tesseract'

Try running the command tesseract from your terminal and you'll probably get an error. That will probably be the same error that your code is throwing, namely, that you don't have tesseract installed.
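The same check can be scripted. A minimal sketch, assuming only that the program accepts a `--version` flag (Tesseract does); `version_output` is a hypothetical helper name, not something from pytesseract.

```python
import subprocess


def version_output(command):
    """Run `<command> --version`; return its output, or None if the command is not found."""
    try:
        result = subprocess.run(
            [command, "--version"],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # Same failure that subprocess.Popen hits inside run_tesseract.
        return None
    # Some programs print their version to stderr instead of stdout.
    return (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    out = version_output("tesseract")
    print(out if out is not None else "tesseract was not found on PATH")
```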

So why does py -m pip install tesseract show the requirement already satisfied?

Because you have the tesseract Python package installed, which is totally different from the Tesseract software. This is the Python package: https://pypi.org/project/tesseract/. This is the software: https://tesseract-ocr.github.io/tessdoc/Downloads.html.
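To see the difference on your own machine, something like the following could help. It is only a sketch: `where_installed` is a hypothetical helper that asks two separate questions, whether pip knows a package by that name and whether an executable by that name is on PATH. Neither answer implies the other.

```python
import shutil
from importlib import metadata


def where_installed(name):
    """Look up `name` both as a pip-installed package and as an executable on PATH."""
    try:
        pip_version = metadata.version(name)
    except metadata.PackageNotFoundError:
        pip_version = None
    return {
        "pip_package_version": pip_version,     # None if pip has no such package
        "executable_path": shutil.which(name),  # None if nothing on PATH matches
    }


if __name__ == "__main__":
    print(where_installed("tesseract"))
```

In the situation described in this issue, I'd expect the first value to be set (pip reported 0.1.3) and the second to be None.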

eddydev03 commented 3 years ago

Good evening Eihli, thank you for your quick answer. I appreciate it. I tried adding print(e) on line 258.5 of C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\pytesseract\pytesseract.py, but when I run the command it does not show me anything more than the same error. When I test it, the print runs correctly outside of the try: block but not inside it. So I still can't see the underlying error, I suppose.

Concerning tesseract, you are completely right. I had the Python package installed, but not the software. My question is: which path should I install it to when I run the .exe? pytesseract.py still says that it is not in the right path. I downloaded tesseract 3.02 (this is the last official version for Windows). Do I need to go for the unofficial version 5.0.0?

Sincerely,

eihli commented 3 years ago

I'm not familiar enough with Windows to be of much help with that part.

Doing searches for phrases like "windows pytesseract can't find tesseract exe path" should take you down the correct path.

For example, I found this issue that seems to touch on the issues you're having. https://github.com/maxenxe/HQ-Trivia-Bot/issues/51

It has the following comments:

My path doesn't look like yours @maxenxe I'm on windows 10. I'm getting the same error and I can't find a clear answer. That's the only thing missing for me. It keeps saying tesseract not recognized as internal or external. Can someone tell me how to add it on PATH on windows 10.

Control Panel > System and Security > System > Advanced system settings > Advanced > Environment variables > PATH > New

eddydev03 commented 3 years ago

Hello Eihli, I feel kind of stupid. I had to restart the computer for the PATH to be created. Everything fine now. Thank you for your help. Have a good day.
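For anyone who lands here with the same problem: a process keeps the environment it was started with, so a terminal that was already open will not see a PATH entry added afterwards. A small sketch for checking what the current Python process actually sees; `dir_on_path` is a hypothetical helper, and the install directory in the usage note is an assumption that may differ on your machine.

```python
import os


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    if path_value is None:
        # The PATH this process inherited when it was started.
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normcase(os.path.normpath(directory))
    entries = [p for p in path_value.split(os.pathsep) if p]
    return any(os.path.normcase(os.path.normpath(p)) == wanted for p in entries)
```

For example, `dir_on_path(r"C:\Program Files\Tesseract-OCR")` returning False in a session that predates the change would be consistent with what happened here. The pytesseract README also documents setting `pytesseract.pytesseract.tesseract_cmd` to the full path of the executable, which avoids depending on PATH at all.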