chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

503 error on Windows Server 2016 when parsing files under high concurrency #258

Closed. sunweiconfidence closed this issue 4 years ago

sunweiconfidence commented 4 years ago

@chrismattmann I ran a performance test that calls the parser.from_buffer() method many times. Some of the GridFS files in MongoDB are larger than 50 MB, and the log prints the following:

New var:....... response = b'https://github.com/debezium/debezium\r\nhttps://juejin.im/entry/5c73ed5c51882562d02a03b8'
16:10:28.639346 line 61 parsed = parser.from_buffer(response)
New var:....... parsed = {'status': 503}

At that point the service is down. After I restart the service it works again. Can this method support highly concurrent calls, or is there a setting that prevents the 503 error? Thanks.
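
For readers hitting the same symptom, one client-side mitigation is to cap how many requests reach the Tika server at once and to retry when the result has status 503. The following is only a minimal sketch and does not address the server-side cause. The names make_throttled_parser and parse_all and their parameters are hypothetical helpers, not part of tika-python. The sketch assumes the parse function returns a dict with a 'status' key, as in the log above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def make_throttled_parser(parse_fn, max_in_flight=2, retries=3, backoff_s=1.0):
    """Wrap parse_fn (hypothetical helper, not a tika-python API).

    At most max_in_flight calls run at the same time, and a result whose
    'status' is 503 is retried with exponential backoff.
    """
    gate = threading.BoundedSemaphore(max_in_flight)

    def parse(buffer):
        result = {"status": 503}
        for attempt in range(retries + 1):
            with gate:  # cap the number of concurrent requests to the server
                result = parse_fn(buffer)
            if result.get("status") != 503:
                return result
            if attempt < retries:
                # wait outside the gate so other workers can proceed
                time.sleep(backoff_s * (2 ** attempt))
        return result  # still 503 after all retries; the caller decides

    return parse


def parse_all(parse_fn, buffers, workers=8, **throttle_options):
    """Parse every buffer on a thread pool, returning results in input order."""
    parse = make_throttled_parser(parse_fn, **throttle_options)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse, buffers))
```

With tika-python installed, this could be called as parse_all(parser.from_buffer, buffers) after from tika import parser. The values for max_in_flight, retries, and backoff_s are placeholders to tune against the actual server.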

sunweiconfidence commented 4 years ago

@chrismattmann I fixed this issue by upgrading tika-server.jar to version 1.22 and using the tika-python 1.19 code. Previously I was using tika-server.jar version 1.19.
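
One way to select the server jar version from Python is through environment variables set before tika is imported. This is a hedged sketch: TIKA_VERSION and TIKA_SERVER_JAR are the variable names as documented in the tika-python README, so verify them against the installed release. The helper pin_tika_server is hypothetical, and the exact steps used in this thread are not stated.

```python
import os


def pin_tika_server(version, jar_url=None):
    """Hypothetical helper: choose the Tika server version via environment variables.

    Assumes tika-python reads TIKA_VERSION (and optionally TIKA_SERVER_JAR,
    a full URL to the server jar) when the tika module is first imported.
    """
    os.environ["TIKA_VERSION"] = version
    if jar_url is not None:
        os.environ["TIKA_SERVER_JAR"] = jar_url


pin_tika_server("1.22")
# Import tika only after the environment is set, for example:
# from tika import parser
```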

chrismattmann commented 4 years ago

Great, yeah, the latest tika I published uses 1.22. Once 1.23 is out I'll upgrade to that too. Thanks @sunweiconfidence