chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

Error when running Windows #44

Closed: dongnizh closed this issue 9 years ago

dongnizh commented 9 years ago

@chrismattmann When I run tika-python to parse a file on Windows (actually a Windows virtual machine), it shows an error like this (screenshot): a6781832-2fb4-495a-a926-c2dea2467c1d However, running a command like "python tika.py config mime-types" works. This is the one link I have found so far on this problem: https://github.com/kennethreitz/requests/issues/2364

Please have a look.

chrismattmann commented 9 years ago

thanks for filing this @dongnizh ! how are you doing in Seattle?

dongnizh commented 9 years ago

Hi @chrismattmann, hope you are doing well. Today is my first day at work. It is kind of different from what I did at school, and I still need much more time to adapt to the new environment. ^_^

chrismattmann commented 9 years ago

hang in there and keep me posted @dongnizh

dongnizh commented 9 years ago

@chrismattmann Of course!!

chrismattmann commented 9 years ago

@dongnizh are we still seeing this error?

chrismattmann commented 9 years ago

More info on this: someone reported it against Python's httplib at https://bugs.python.org/issue23054. That report is still open as of December 2014.

chrismattmann commented 9 years ago

Overall, this issue has to do with a PUT request being made on Windows. Since Tika Server uses PUT requests nearly everywhere, this causes the problem, but only on Windows.

chrismattmann commented 9 years ago

See my comment on: http://bugs.python.org/issue23054

chrismattmann commented 9 years ago

See: http://bugs.python.org/issue8450

dongnizh commented 9 years ago

Hi @chrismattmann, I will look into this.

chrismattmann commented 9 years ago

Thanks @dongnizh !

chrismattmann commented 9 years ago

Hey @dongnizh I think this error is caused by a bogus tika-server.jar in your temp folder. For whatever reason, Windows doesn't seem to clear the C:\Users\appdata\Local\Temp directory when you restart, so if tika-server was downloaded incorrectly, the bad jar stays there. If you delete that jar file, it will be redownloaded. However, see the two related problems in #54 that are affecting Windows use.
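
The cleanup described above can be sketched in a few lines of Python. This is only an illustration: it assumes the jar is saved as tika-server.jar directly inside the directory returned by `tempfile.gettempdir()`, and `remove_cached_jar` is a hypothetical helper name, not a function in tika-python.

```python
import os
import tempfile


def remove_cached_jar(jar_name="tika-server.jar", temp_dir=None):
    """Delete a possibly corrupt cached jar so it can be downloaded again.

    Assumption: the jar sits directly in the system temp directory under
    the name given by jar_name. Returns True if a file was removed.
    """
    temp_dir = temp_dir or tempfile.gettempdir()
    jar_path = os.path.join(temp_dir, jar_name)
    if os.path.isfile(jar_path):
        os.remove(jar_path)
        return True
    return False
```

Calling `remove_cached_jar()` once before the next parse should be enough to force a fresh download, provided the assumptions about the file name and location hold on your machine.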

chrismattmann commented 9 years ago

See my fix for #54 and #56 @dongnizh and let me know if that fixes this. We could probably add some more robust code here in #44 to verify the downloaded tika-server jar against a published checksum. For example, we could check in getRemoteFile whether there is a corresponding .md5 file for the URL and, if so, use it to test whether the download was successful. https://repo1.maven.org/maven2/org/apache/tika/tika-server/1.9/tika-server-1.9.jar.md5 exists, for instance, so we could test against it.
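
A minimal sketch of the checksum comparison proposed above, assuming the text of the .md5 file has already been fetched and begins with the hex digest. The names `file_md5` and `matches_md5` are hypothetical and are not the code that ended up in tika-python.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, md5_text):
    """Compare a local file against the contents of a .md5 file.

    Assumption: md5_text starts with the hex digest, optionally followed
    by whitespace and a file name.
    """
    parts = md5_text.split()
    if not parts:
        return False
    return file_md5(path) == parts[0].lower()
```

If `matches_md5` returns False, the download could be deleted and retried instead of being handed to Java as a broken jar.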

dongnizh commented 9 years ago

@chrismattmann Thanks for your update. Will try the new version on Windows and update the result later.

chrismattmann commented 9 years ago

Btw I implemented the md5 check