chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

Tika returns HTTP Error 504: Gateway Time-out #230

Closed: alihammadbaig closed this issue 5 years ago

alihammadbaig commented 5 years ago

Tika was working for me until yesterday, but today I get a 504 error while the Tika server JAR is being downloaded. The error is raised when the PDF is parsed: `parsed = parser.from_file('mypdf.pdf')`

Here is the error trace:

`2019-06-22 12:57:47,905 [MainThread  ] [INFO ]  Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar to /tmp/tika-server.jar.
2019-06-22 12:58:44,045 [MainThread  ] [INFO ]  Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar.md5 to /tmp/tika-server.jar.md5.
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tika/tika.py in getRemoteJar(urlOrPath, destPath)
    715         try:
--> 716             urlretrieve(urlOrPath, destPath)
    717         except IOError:

20 frames
HTTPError: HTTP Error 504: Gateway Time-out

During handling of the above exception, another exception occurred:

HTTPError                                 Traceback (most recent call last)
/usr/lib/python3.6/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    648 class HTTPDefaultErrorHandler(BaseHandler):
    649     def http_error_default(self, req, fp, code, msg, hdrs):
--> 650         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    651 
    652 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 504: Gateway Time-out`

Any suggestions?

valeriow commented 5 years ago

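Change the download URL by setting the `TIKA_SERVER_JAR` environment variable before importing tika:

import os
# Point tika-python at Maven Central directly instead of search.maven.org.
# Set this before importing tika, since the library reads the variable when the module loads.
os.environ['TIKA_SERVER_JAR'] = 'https://repo1.maven.org/maven2/org/apache/tika/tika-server/1.19/tika-server-1.19.jar'
import tika
from tika import parser
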
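
For anyone who wants to avoid hard-coding the full URL, the same workaround can be wrapped in a small helper. This is only a sketch: `tika_server_jar_url` and `point_tika_at_maven_central` are hypothetical names, not part of tika-python, and the path layout is assumed from the 1.19 URL above, so other versions may live under a different path.

```python
import os

# Assumption: the repository base taken from the URL in the answer above.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def tika_server_jar_url(version: str) -> str:
    """Build the tika-server JAR URL for a version (hypothetical helper)."""
    return (
        f"{MAVEN_CENTRAL}/org/apache/tika/tika-server/"
        f"{version}/tika-server-{version}.jar"
    )


def point_tika_at_maven_central(version: str = "1.19") -> str:
    """Set TIKA_SERVER_JAR and return the URL; call this before `import tika`."""
    url = tika_server_jar_url(version)
    os.environ["TIKA_SERVER_JAR"] = url
    return url


url = point_tika_at_maven_central("1.19")
print(url)

# Only after the variable is set should the library be imported, e.g.:
#   from tika import parser
#   parsed = parser.from_file('mypdf.pdf')
```

The helper does not touch the network itself; it only sets the variable that tika-python consults when it decides where to fetch the server JAR from.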
alihammadbaig commented 5 years ago

That worked for me. Thank you

talha3111997 commented 5 years ago

Thanks a lot. Also worked for me.