chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

Tika server never ends, consuming RAM after usage #235

Closed by igormp 4 years ago

igormp commented 5 years ago

After processing all the PDFs I needed, I noticed that the Tika server was still running in the background and using a lot of RAM (usually between 1 and 1.8 GB after processing ~4000 PDFs).

ps -o pid,rss,command 4017   
  PID   RSS COMMAND
 4017 1252128 java -cp /tmp/tika-server.jar org.apache.tika.server.TikaServerCli --port 9998 --host localhost

Is there a way to stop the server through the library itself once everything is done? If not, a tika.stopVM() would be a useful addition.

chrismattmann commented 4 years ago

There isn't a stopVM() right now, but I agree it would be great. If you can send a PR, @igormp, that would be awesome.
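
Until something like stopVM() exists, one possible stopgap is to find the leftover server process yourself and terminate it. The sketch below assumes a POSIX system with `ps` available, and that the server's command line contains `TikaServerCli`, as in the `ps` output above. The helper names `find_tika_server_pids` and `stop_tika_servers` are made up for illustration and are not part of tika-python.

```python
import os
import signal
import subprocess


def find_tika_server_pids(ps_output):
    """Return the PIDs of processes whose command line mentions TikaServerCli.

    Expects text in the shape printed by `ps -o pid,command`:
    one header row, then one "PID COMMAND..." row per process.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 1)  # split into PID and the rest of the line
        if len(parts) == 2 and "TikaServerCli" in parts[1]:
            pids.append(int(parts[0]))
    return pids


def stop_tika_servers():
    """Send SIGTERM to every matching process and return the PIDs signalled.

    Assumption: any process matching TikaServerCli is safe to stop. This will
    also hit Tika servers that were not started by tika-python.
    """
    result = subprocess.run(
        ["ps", "-A", "-ww", "-o", "pid,command"],
        capture_output=True,
        text=True,
        check=True,
    )
    pids = find_tika_server_pids(result.stdout)
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    return pids
```

Calling `stop_tika_servers()` at the end of a batch job would ask the JVM to shut down. Matching on the command line is a blunt instrument, so a library-level stop function that tracks the process it spawned would still be the better fix.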

suresh-vy commented 4 years ago

Hi @chrismattmann,

I'm also facing this issue. It would be great if you could resolve it ASAP.

Thanks

igormp commented 4 years ago

I created a draft PR that tries to solve that. I'll try to give it a better look and finish it as soon as I can.