axa-group / Parsr

Transforms PDF, Documents and Images into Enriched Structured Data
Apache License 2.0
5.72k stars, 304 forks

TableDetection2 Fails - Can't Find Java #677

Open. Sohex opened this issue 2 months ago.

Sohex commented 2 months ago

In the latest Docker image, the TableDetection2 script fails because it cannot locate the java executable:

executing command: python3 /opt/app-root/src/dist/assets/TableDetection2Script.py /tmp/ee8eb2dde2a5d2305f620fd762c4a1.pdf all

executing command error: Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 85, in _run
    check=True,
  File "/usr/lib/python3.7/subprocess.py", line 472, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 212, in <module>
    main()
  File "/opt/app-root/src/dist/assets/TableDetection2Script.py", line 188, in main
    tables2 = tabula.read_pdf(pdf_file, stream=True, pages='all', output_format="json")
  File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 322, in read_pdf
    output = _run(java_options, kwargs, path, encoding)
  File "/usr/local/lib/python3.7/dist-packages/tabula/io.py", line 91, in _run
    raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)
tabula.errors.JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`
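The traceback shows that tabula-py (in the version installed in this image) starts tabula-java by spawning a `java` subprocess, so the call fails whenever no `java` binary is on the PATH of the Python process. A minimal sketch of a preflight check that surfaces this earlier, assuming a standard Python 3.7+ environment; `java_available` and `require_java` are hypothetical helper names and are not part of Parsr or tabula-py:

```python
import shutil
import subprocess


def java_available() -> bool:
    """Return True if a `java` executable is on PATH and actually runs.

    Mirrors the lookup that failed in the traceback: tabula-py (subprocess
    mode) needs `java` to be resolvable from the Python process's PATH.
    """
    java = shutil.which("java")  # None when no `java` is found on PATH
    if java is None:
        return False
    try:
        # `java -version` writes to stderr and exits 0 on a working JVM.
        subprocess.run([java, "-version"], capture_output=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return False
    return True


def require_java() -> None:
    """Hypothetical guard to call before tabula.read_pdf."""
    if not java_available():
        raise RuntimeError(
            "java not found on PATH; a Java runtime must be installed in the "
            "image for tabula-py (and thus TableDetection2) to work"
        )
```

Calling `require_java()` at the top of a script like TableDetection2Script.py would turn the nested `FileNotFoundError`/`JavaNotFoundError` pair into a single explicit message; the actual fix is still to install a Java runtime in the Docker image.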

luqiudi commented 1 month ago

Me too.