chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

Blocks port of web service when using tika (detector) #74

Closed. Purg closed this issue 8 years ago

Purg commented 8 years ago

After running an app (Flask) that communicates over some port, if I shut that app down and relaunch it, I consistently get a "port in use" error when starting Flask from the second time onwards. I spent some time in htop killing potential java processes (lsof -i reported that only firefox, mongod, and some java process were using port sockets). Killing specifically the tika-server.jar process allowed me to run Flask again on the original port. Any ideas?

chrismattmann commented 8 years ago

Hey @Purg, tika-python starts the REST server in the background on the default port (8889). Having it run in the background makes it load faster. Are you saying that you don't want it to stay resident in the background when your Flask app (which I presume uses tika-python) stops?

Purg commented 8 years ago

I don't think that the REST server running in the background is specifically the issue, because it has always done this and I haven't had this issue until recently. However, though I don't have a specific use case in mind, the ability to shut down the server on process end could be useful in the future for processes that don't want to leave side effects or open resources behind (i.e. the server could be launched as a subprocess, held globally, upon the first tika function call and register a shutdown hook with the atexit module).
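
A minimal sketch of that idea, not tika-python's actual implementation: the server command is started once as a subprocess held in a module-level variable, and a shutdown hook is registered with `atexit`. The names `start_background_server` and `stop_background_server` are hypothetical, and the demo uses a sleeping Python process as a stand-in for the real server.

```python
import atexit
import subprocess
import sys

_server = None           # module-level handle to the background process
_hook_registered = False


def start_background_server(cmd):
    """Start `cmd` if it is not already running; stop it at interpreter exit."""
    global _server, _hook_registered
    if _server is None or _server.poll() is not None:
        _server = subprocess.Popen(cmd)
    if not _hook_registered:
        # atexit hooks run on normal interpreter exit, not after SIGKILL.
        atexit.register(stop_background_server)
        _hook_registered = True
    return _server


def stop_background_server(timeout=5.0):
    """Terminate the background process, escalating to kill on timeout."""
    global _server
    if _server is None:
        return
    if _server.poll() is None:
        _server.terminate()
        try:
            _server.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            _server.kill()
            _server.wait()
    _server = None


if __name__ == "__main__":
    # Stand-in command; a real caller might pass something like
    # ["java", "-jar", "/path/to/tika-server.jar"] (path is an assumption).
    proc = start_background_server(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    stop_background_server()
```

Because the hook only runs on a normal interpreter exit, a process that is killed hard would still leave the server behind.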

What I'm seeing, specifically, is:

chrismattmann commented 8 years ago

Hey @Purg, the exit python and flask server part is getting me. Perhaps this is an issue with a child/orphaned process not giving up its resources, since tika-server is still running? It is started in the background, I believe, so I'm wondering about this, but that seems like the culprit. Have you run lsof and tried to see if it's holding on?
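
As a complement to lsof, here is a small sketch for checking from Python whether a TCP port can currently be bound. The helper name `port_is_free` and the default host are assumptions for illustration; the check only says whether the port is taken, not which process holds it.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
        return True
```

Running it before and after killing the tika-server.jar process would show whether that process is what keeps the port busy. Since the probe does not set `SO_REUSEADDR`, it may also report a port as busy for a short time after connections on it were closed.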

Purg commented 8 years ago

I have done lsof and I believe it said that a java process was holding it (tika was the only java thing running). I'll want to double check that again, but I'm in the middle of other stuff at the moment. I'll get back to it when I iterate on SMQTK IQR components again (what I was doing when I discovered this).

chrismattmann commented 8 years ago

@Purg if you can give me a test case to reproduce I'll try and investigate. Otherwise I'm going to close this for now as I can't seem to reproduce it myself.