chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

RuntimeError: Unable to start Tika server. #353

Closed mhrihab closed 1 year ago

mhrihab commented 3 years ago

I wrote a function that parses a PDF file with Tika inside a service. After I dockerized the service, the call fails with this error:

    parse_pdf(tmp_path)
  File "/app/process.py", line 90, in parse_pdf
    data = parser.from_file('document-page' + str(i) + '.pdf', headers=headers)
  File "/usr/local/lib/python3.8/site-packages/tika/parser.py", line 40, in from_file
    output = parse1(service, filename, serverEndpoint, headers=headers, config_path=config_path, requestOptions=requestOptions)
  File "/usr/local/lib/python3.8/site-packages/tika/tika.py", line 336, in parse1
    status, response = callServer('put', serverEndpoint, service, f,
  File "/usr/local/lib/python3.8/site-packages/tika/tika.py", line 531, in callServer
    serverEndpoint = checkTikaServer(scheme, serverHost, port, tikaServerJar, classpath, config_path)
  File "/usr/local/lib/python3.8/site-packages/tika/tika.py", line 601, in checkTikaServer
    raise RuntimeError("Unable to start Tika server.")
RuntimeError: Unable to start Tika server.

I couldn't fix this error. I am using tika==1.24, and my Dockerfile starts with FROM tiangolo/uvicorn-gunicorn-fastapi:python3.9

Horasachy commented 3 years ago

"To use this library, you need to have Java 7+ installed on your system as tika-python starts up the Tika REST server in the background." You need to install java in a container: RUN apt-get install -y default-jdk

chrismattmann commented 1 year ago

Correct, @Horasachy.
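
Since the root cause in this thread is a missing Java runtime, a service could check for Java at startup and fail with a clearer message than "Unable to start Tika server." The following is only a minimal sketch using the Python standard library. The helper names `java_available` and `require_java` are hypothetical and are not part of tika-python, and the sketch assumes that a `java` executable on `PATH` is what the Tika server needs.

```python
import shutil
import subprocess


def java_available(executable="java"):
    """Return True if `executable` is on PATH and `<executable> -version` exits with 0.

    Hypothetical helper for illustration; not part of tika-python.
    """
    path = shutil.which(executable)
    if path is None:
        # Nothing with that name on PATH, e.g. a base image without a JDK/JRE.
        return False
    try:
        result = subprocess.run([path, "-version"], capture_output=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        # The binary exists but could not be run, or it timed out.
        return False
    return result.returncode == 0


def require_java(executable="java"):
    """Raise a descriptive RuntimeError when no usable Java runtime is found."""
    if not java_available(executable):
        raise RuntimeError(
            "Java runtime not found on PATH; tika-python needs Java to start "
            "the Tika server. Install a JDK/JRE in the container image."
        )
```

Calling `require_java()` once at service startup, before the first `parser.from_file(...)` call, would surface the missing dependency immediately instead of at the first parse request.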