chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

Issue downloading tika-server jar #221

Closed kylefoley76 closed 5 years ago

kylefoley76 commented 5 years ago

I'm using Tika to read PDF documents, and it's not working. The message I'm getting is:

2019-02-18 00:09:22,646 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar to /var/folders/vd/5ccxv4957f1_prjqt1l_ppsw0000gq/T/tika-server.jar.

And then it never goes to the next line of code.

One thing that concerns me is that the log says tika-server 1.19, but I can't install 1.19. This is what happened when I tried:

Admins-MacBook-Pro-4:proofs kylefoley$ pip install tika==1.19
Collecting tika==1.19
Requirement already satisfied: requests in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from tika==1.19) (2.19.1)
Requirement already satisfied: setuptools in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from tika==1.19) (28.8.0)
Requirement already satisfied: certifi>=2017.4.17 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from requests->tika==1.19) (2018.11.29)
Requirement already satisfied: idna<2.8,>=2.5 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from requests->tika==1.19) (2.7)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from requests->tika==1.19) (3.0.4)
Requirement already satisfied: urllib3<1.24,>=1.21.1 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (from requests->tika==1.19) (1.23)
Installing collected packages: tika
Found existing installation: tika 1.18
Uninstalling tika-1.18:
Successfully uninstalled tika-1.18
Successfully installed tika-1.18

Maybe this is because I'm using Python 3.6. If so, I cannot upgrade to 3.7, because I do not understand Python well enough to handle any problems that might arise from that move.

The actual code I'm using for Tika is:

from tika import unpack
result = unpack.from_file(file)

vgalisson commented 5 years ago

Same here. There seems to be an issue downloading tika-server from Maven. I tried opening the link directly (http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar): it starts downloading but stalls right at the start and never finishes (as shown in the attached screenshot).
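To tell a stalled download apart from a slow one, it may help to fetch the jar by hand with an explicit timeout. The following is only a rough sketch using the Python standard library; the helper name fetch and the 30-second default are my own choices and are not part of tika-python.

```python
import shutil
import urllib.request


def fetch(url, dest, timeout=30):
    """Copy `url` to the local path `dest`.

    `timeout` applies to each blocking socket operation (connect, read),
    not to the download as a whole, so a server that stops sending data
    raises an exception instead of hanging forever.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


# Example call (hypothetical destination path), using the URL from the log above:
# fetch("http://search.maven.org/remotecontent?filepath=org/apache/tika/"
#       "tika-server/1.19/tika-server-1.19.jar", "/tmp/tika-server.jar")
```

If this raises a timeout error, the problem is on the network or server side rather than in the Python code that calls Tika.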

hiSingh248 commented 5 years ago

Use brew install curl to install curl on your system. This resolves the download issue from Maven, and the parsed content is returned.
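If installing curl does not help, another possible route is to skip the Maven download and point tika-python at a jar that is already on disk. The sketch below assumes the TIKA_SERVER_JAR environment variable described in the project's README, which accepts a file:// URL; the README also mentions that a matching tika-server.jar.md5 file should sit next to the jar. Whether this works may depend on the installed version, and the helper names here are hypothetical.

```python
import os
from pathlib import Path


def use_local_tika_jar(jar_path):
    """Tell tika-python to use a local jar instead of downloading one.

    Assumption: the library reads TIKA_SERVER_JAR from the environment
    when it is imported, so call this before `import tika`.
    """
    jar_uri = Path(jar_path).resolve().as_uri()  # e.g. file:///.../tika-server.jar
    os.environ["TIKA_SERVER_JAR"] = jar_uri
    return jar_uri


def unpack_file(path):
    # Imported lazily so that the environment variable is already set.
    from tika import unpack
    return unpack.from_file(path)
```

Call use_local_tika_jar with the path to the downloaded jar first, then unpack_file on the document.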

chrismattmann commented 5 years ago

resolved per @hiSingh248