chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

Apache Tika : [MainThread ] [WARNI] Tika server returned status: 500 #151

Closed Balachandar-R closed 7 years ago

Balachandar-R commented 7 years ago

Hi team,

When I try to extract the content of files (95 PPT files) from a repository, the Apache Tika server suddenly logs a warning and the extraction fails.

May I know the maximum number of files that Tika can process in a single run?

Thanks, Balachandar

chrismattmann commented 7 years ago

It can easily process that many in a single run. My guess is that something happened to your Tika server (does ps aux | grep java | grep tika yield anything?). If you need to restart Tika, kill the server and then restart your Python app; it should work fine after that!
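For anyone who would rather run that check from Python than from the shell, here is a minimal sketch of the same filter. The function names tika_process_lines and running_tika_servers are made up for illustration and are not part of tika-python; the sketch also assumes a Unix-like system where ps aux is available.

```python
import subprocess


def tika_process_lines(ps_output):
    """Return the lines of `ps aux` output that mention both java and tika."""
    return [
        line
        for line in ps_output.splitlines()
        if "java" in line and "tika" in line
    ]


def running_tika_servers():
    """Run `ps aux` and keep only lines that look like a Tika server process."""
    # Assumption: `ps` exists on this machine (Linux/macOS); not portable to Windows.
    ps_output = subprocess.check_output(["ps", "aux"]).decode("utf-8", "replace")
    return tika_process_lines(ps_output)
```

Calling running_tika_servers() before and after the failing run shows whether the server process is still alive at the point where the 500 appears. An empty list suggests the server has died, not that it is merely rejecting one document.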

Balachandar-R commented 7 years ago

@chrismattmann Thanks for your reply. I killed the running processes and reran the Tika server to extract the contents of 110 files (PPTs) in a single run, but I got the same errors again. When I run it with 40 documents it works fine, and it even works fine for a large PPT when I run it on that single file.

2017-09-05 04:44:18,675 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.14/tika-server-1.14.jar to /tmp/tika-server.jar.
2017-09-05 04:44:26,229 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.14/tika-server-1.14.jar.md5 to /tmp/tika-server.jar.md5.
2017-09-05 04:49:56,206 [MainThread ] [WARNI] Tika server returned status: 500
Traceback (most recent call last):
  File "ustpedia.py", line 196, in <module>
    content_extraction()
  File "ustpedia.py", line 155, in content_extraction
    parsed = parser.from_file(os.path.join(dirpath, f))
  File "/usr/local/lib/python2.7/dist-packages/tika/parser.py", line 28, in from_file
    return _parse(jsonOutput)
  File "/usr/local/lib/python2.7/dist-packages/tika/parser.py", line 47, in _parse
    realJson = json.loads(jsonOutput[1])
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

Please help.

Thanks, Balachandar
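The traceback above shows the whole run aborting on the first ValueError raised inside parser.from_file. One way to narrow this down, sketched below, is to catch that error per file so the batch keeps going and the failing documents are listed at the end. This does not explain why the server returns 500; it only helps find which file triggers it. The helper name extract_all, its parse_file argument, and the extension list are assumptions made for this example.

```python
import os


def extract_all(root, parse_file, extensions=(".ppt", ".pptx")):
    """Walk `root`, parse each matching file, and collect failures instead of aborting."""
    results = {}   # path -> whatever parse_file returned
    failures = []  # (path, error message) for files that could not be parsed
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                results[path] = parse_file(path)
            except ValueError as exc:
                # In the traceback above, a 500 from the server surfaces as
                # ValueError("No JSON object could be decoded").
                failures.append((path, str(exc)))
    return results, failures
```

With tika-python installed, this could be called as extract_all("/path/to/repository", parser.from_file) after from tika import parser. Running the files listed in failures one at a time should then show whether a specific document is responsible or whether the server itself has stopped responding partway through the batch.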