chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

md5 file can't be found on airgap setup #308

Closed mike-altonji closed 4 years ago

mike-altonji commented 4 years ago

I received the following error message the first time I ran Tika (OS: Windows 10, set up as "airgap" with no reliance on the internet, using tika-server version 1.24.1):

URLError: <urlopen error [WinError 2] The system cannot find the file specified: 'C:\Users\username\Downloads\tika-server-1.24.1.jar.md5'>

I found in the Apache archives that .jar.md5 files were published up through the 1.17 release. Since then, however, I only see .jar.asc and .jar.sha512 files.

Installation Steps

Then I ran the following code (hello_tika.png is an image file containing the word "hello"):

import tika
tika.initVM()
from tika import parser
parsed = parser.from_file('hello_tika.png')

As an output, I received the following:

2020-05-18 14:44:28,505 [MainThread  ] [INFO ]  Retrieving file:////C://Users/username/Downloads/tika-server-1.24.1.jar to C:\Users\username\AppData\Local\Temp\tika-server.jar.
2020-05-18 14:44:29,116 [MainThread  ] [INFO ]  Retrieving file:////C://Users/username/Downloads/tika-server-1.24.1.jar.md5 to C:\Users\username\AppData\Local\Temp\tika-server.jar.md5.

Traceback (most recent call last):

  File "<ipython-input-3-f29a551db210>", line 1, in <module>
    runfile('C:/Users/username/Downloads/tika test.py', wdir='C:/Users/username/Downloads')

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/username/Downloads/tika test.py", line 13, in <module>
    parsed = parser.from_file('hello_tika.png')

  File "C:\ProgramData\Anaconda3\lib\site-packages\tika\parser.py", line 40, in from_file
    output = parse1(service, filename, serverEndpoint, headers=headers, config_path=config_path, requestOptions=requestOptions)

  File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 338, in parse1
    rawResponse=rawResponse, requestOptions=requestOptions)

  File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 531, in callServer
    serverEndpoint = checkTikaServer(scheme, serverHost, port, tikaServerJar, classpath, config_path)

  File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 594, in checkTikaServer
    if not checkJarSig(tikaServerJar, jarPath):

  File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 612, in checkJarSig
    getRemoteJar(tikaServerJar + ".md5", jarPath + ".md5")

  File "C:\ProgramData\Anaconda3\lib\site-packages\tika\tika.py", line 808, in getRemoteJar
    urlretrieve(urlOrPath, destPath)

  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:

  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)

  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)

  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 543, in _open
    '_open', req)

  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)

  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 1451, in file_open
    return self.open_local_file(req)

  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 1490, in open_local_file
    raise URLError(exp)

URLError: <urlopen error [WinError 2] The system cannot find the file specified: 'C:\\Users\\username\\Downloads\\tika-server-1.24.1.jar.md5'>
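
The traceback shows checkJarSig asking getRemoteJar for tikaServerJar + ".md5", so on an airgapped machine both the jar and its .md5 companion apparently need to be present at the local path. A minimal pre-flight sketch, assuming that behavior holds for your installed version; the helper name missing_tika_files is hypothetical and not part of tika-python:

```python
from pathlib import Path


def missing_tika_files(jar_path):
    """Return the paths (jar and its .md5 companion) that do not exist.

    Hypothetical helper, not part of tika-python. It only mirrors what the
    traceback suggests: the jar path plus the same path with ".md5" appended.
    """
    jar = Path(jar_path)
    candidates = [jar, jar.with_name(jar.name + ".md5")]
    return [str(p) for p in candidates if not p.is_file()]
```

Calling it on the local jar path before parser.from_file and printing the result should make a missing .md5 file obvious without having to read the urllib traceback.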
chrismattmann commented 4 years ago

@mike-altonji what happens if you download the MD5 file yourself? You can find it here. Also check out https://github.com/chrismattmann/tika-python/issues/238 and https://github.com/chrismattmann/tika-python/issues/231

mike-altonji commented 4 years ago

It works when I download the MD5 file directly. Thanks for the link! I was using this (referenced in the package documentation as the source), which didn't have the MD5 file. When I checked the archives, I found MD5 files for all Tika versions up to 1.17, so I just used 1.17. Would you want to update the documentation to refer to the Maven repo? That seems to be the only issue.
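
For anyone who cannot fetch the published MD5 file at all, one possible workaround is to generate the companion file locally. This is only a sketch: it assumes tika-python compares the jar's MD5 hex digest against the plain-text contents of the .md5 file, which I have not confirmed for every version, and the function name write_md5_sidecar is made up for illustration. A locally generated checksum also proves nothing about where the jar came from, so verify the jar separately (for example against the published .sha512) before trusting it.

```python
import hashlib
from pathlib import Path


def write_md5_sidecar(jar_path):
    """Write <jar_path>.md5 containing the jar's MD5 hex digest.

    Hypothetical helper, not part of tika-python. Assumes the .md5 file
    is expected to hold only the hex digest as plain text.
    """
    jar = Path(jar_path)
    digest = hashlib.md5()
    with jar.open("rb") as f:
        # Read in 1 MiB chunks so large jars are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    sidecar = jar.with_name(jar.name + ".md5")
    sidecar.write_text(digest.hexdigest())
    return sidecar
```

Running it once on the downloaded jar places the .md5 file next to it, which is where the log above shows tika-python looking.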

chrismattmann commented 4 years ago

@mike-altonji can you submit a PR for the README with how you would update it? I'll review and merge. Thanks! 👍

chrismattmann commented 4 years ago

Fixed in #311.