chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

No server start error checking #113

Closed Purg closed 6 years ago

Purg commented 8 years ago

If used on a system without Java, or if the server fails to start for any other reason, no error is raised until an action is attempted (checkTikaServer does not fail when the server fails to start).

Also, startServer only sleeps after launching the server. It should make actual calls to the server to check that it's running correctly, i.e. spin-loop a request until it succeeds, or until it times out or hits a max-retry limit.
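
A minimal sketch of the spin-loop idea, assuming the server answers plain HTTP GET requests on some URL. The names `http_probe` and `wait_for_server` are hypothetical and are not part of tika-python; the retry count and delay are arbitrary.

```python
import http.client
import time
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, malformed reply, ...
        return False


def wait_for_server(url, max_tries=10, delay=0.5, probe=http_probe):
    """Poll `url` until the probe succeeds or `max_tries` attempts are used up."""
    for _ in range(max_tries):
        if probe(url):
            return True
        time.sleep(delay)
    return False
```

A caller would start the server process first, then raise an error if `wait_for_server(...)` returns False instead of assuming that a fixed sleep was long enough.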

Purg commented 8 years ago

Also, either the default value for checkTikaServer port (Port) needs to be an integer (https://github.com/chrismattmann/tika-python/blob/master/tika/tika.py#L101), or the command template (https://github.com/chrismattmann/tika-python/blob/master/tika/tika.py#L351) needs to be able to take a string (%i -> %s).
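
To illustrate the mismatch: `%i` rejects a string argument with a TypeError, so either the port must be coerced to an integer or the template must use `%s`. This is only a sketch; `COMMAND_TEMPLATE` is a made-up stand-in, not the actual template from tika.py.

```python
# Hypothetical stand-in for the command template; the real one lives in tika.py.
COMMAND_TEMPLATE = "java -jar tika-server.jar --port %i"


def fails_with_string_port():
    """Show that %i cannot format a string port such as "9998"."""
    try:
        COMMAND_TEMPLATE % "9998"
    except TypeError:
        return True
    return False


def build_command(port):
    """Coerce the port so both 9998 and "9998" are accepted."""
    return COMMAND_TEMPLATE % int(port)
```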

Purg commented 8 years ago

Primary reason for this issue: the system does not have Java installed, e.g. a new Docker instance or a fresh Ubuntu/CentOS/etc. install.

chrismattmann commented 8 years ago

Thanks @Purg will check it

chrismattmann commented 8 years ago

hmm I like the idea of spin loop and check that a call succeeds, but it will introduce some minor but acceptable overhead. I'll implement that @Purg thanks for the report.

mjbommar commented 6 years ago

What about a post-install hook to check Java using setuptools.command.install in setup.py? This issue has also affected some of our clients and a pip failure might be appropriate here given the complete dependency on java.

https://setuptools.readthedocs.io/en/latest/setuptools.html?highlight=setuptools.command.install#adding-commands

Purg commented 6 years ago

A combination of both may be the most robust. Since java is completely detached from python and this module, java can disappear while this module sticks around in a python install tree.
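
A rough sketch of a check that could run both at install time and again on each server start, since Java may disappear after installation. It assumes the Java command name comes from a `TIKA_JAVA` environment variable with `java` as the fallback; `find_java` and `require_java` are hypothetical helpers, not tika-python functions.

```python
import os
import shutil


def find_java(command=None):
    """Return the full path of the Java executable, or None if it is not on PATH."""
    if command is None:
        # Assumption: TIKA_JAVA names the Java command, falling back to "java".
        command = os.environ.get("TIKA_JAVA", "java")
    return shutil.which(command)


def require_java(command=None):
    """Raise early, with a clear message, when Java cannot be found."""
    path = find_java(command)
    if path is None:
        raise RuntimeError("Unable to run java; is it installed?")
    return path
```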

mjbommar commented 6 years ago

happy saturday, @Purg and @chrismattmann!

i took a stab at this, adding:

you can find it in my feature branch here: https://github.com/mjbommar/tika-python/tree/feature-check-java-exists

commits here: https://github.com/mjbommar/tika-python/commit/3ca6c2b144a54fa4531b9e048fcf8041ab2f4fb8

apologies for pycharm's aggressive reformatting, but the real changes should be apparent in the constants, startServer, and checkTikaServer

if one of you would like to review and test, i can fix the cosmits and PR with just the relevant lines.

mjbommar@DESKTOP C:\Users\mjbommar\PycharmProjects\tika-python
$ set TIKA_JAVA=java11

mjbommar@DESKTOP C:\Users\mjbommar\PycharmProjects\tika-python
$ ipython
Python 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 6.3.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import tika.language

In [2]: tika.language.from_buffer("This is definitely English")
2018-06-30 09:08:16,077 [MainThread  ] [ERROR]  Unable to run java; is it installed?
2018-06-30 09:08:16,079 [MainThread  ] [ERROR]  Failed to receive startup confirmation from startServer.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-dce7274abac5> in <module>()
----> 1 tika.language.from_buffer("This is definitely English")

~\PycharmProjects\tika-python\tika\language.py in from_buffer(string)
     35     '''
     36     status, response = callServer('put', ServerEndpoint, '/language/string', string,
---> 37                                   {'Accept': 'text/plain'}, False)
     38     return response

~\PycharmProjects\tika-python\tika\tika.py in callServer(verb, serverEndpoint, service, data, headers, verbose, tikaServerJar, httpVerbs, classpath, rawResponse)
    533     global TikaClientOnly
    534     if not TikaClientOnly:
--> 535         serverEndpoint = checkTikaServer(scheme, serverHost, port, tikaServerJar, classpath)
    536
    537     serviceUrl = serverEndpoint + service

~\PycharmProjects\tika-python\tika\tika.py in checkTikaServer(scheme, serverHost, port, tikaServerJar, classpath)
    591             if not status:
    592                 log.error("Failed to receive startup confirmation from startServer.")
--> 593                 raise RuntimeError("Unable to start Tika server.")
    594     return serverEndpoint
    595

RuntimeError: Unable to start Tika server.
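
The fail-fast behaviour seen in the traceback above can be sketched roughly as follows. This is not the code from the feature branch; `start_server` and `check_server` are hypothetical stand-ins that only reproduce the two log messages and the RuntimeError.

```python
import logging
import subprocess
import sys

log = logging.getLogger("tika_sketch")


def start_server(command):
    """Launch `command`; return the process, or None if it cannot be executed."""
    try:
        return subprocess.Popen(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        # Raised when the executable is missing or not runnable.
        log.error("Unable to run %s; is it installed?", command[0])
        return None


def check_server(command):
    """Raise instead of continuing silently when the launch fails."""
    process = start_server(command)
    if process is None:
        log.error("Failed to receive startup confirmation from start_server.")
        raise RuntimeError("Unable to start Tika server.")
    return process
```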

chrismattmann commented 6 years ago

@mjbommar I’d be happy to review and yes please clean up and submit your PR with only the relevant lines. We should also include a README.md update in your PR with the new env vars

mjbommar commented 6 years ago

Just PR'd


chrismattmann commented 6 years ago

fixed in https://github.com/chrismattmann/tika-python/commit/da7bbbdfdd97a9c0376666d215bf6b69b7ea34e7

poojakhatri commented 4 years ago

Insert this call, tika.initVM(), in the middle:

import tika
tika.initVM()
from tika import parser