chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

LanguageDetectors #336

Closed by arky 3 years ago

arky commented 3 years ago

@chrismattmann I am trying to test Apache Tika's language detection using your wonderful library. I am not sure of the right way to list the language detectors.

https://tika.apache.org/1.25/api/org/apache/tika/language/detect/LanguageDetector.html#getLanguageDetectors--

I am also wondering whether it is possible to override the languages using a process similar to https://github.com/chrismattmann/tika-python/wiki/Using-Tika-Translate

chrismattmann commented 3 years ago

Thank you, @arky! I haven't exposed a way to change the language detectors in the Tika Python client yet. Right now it just uses the defaults as they are set in the upstream Java Tika library. I recommend first making sure the upstream library has the flexibility to achieve what you are looking for; then we can propagate it down to the Python client here.
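
For reference, here is a minimal sketch of what default language detection looks like from Python without any detector selection. It assumes a Tika server is already running locally on port 9998 and that it exposes the `/language/string` REST endpoint. The helper names `build_language_request` and `detect_language` are hypothetical, written only for this illustration. In the client itself, the equivalent calls should be `tika.language.from_buffer` and `tika.language.from_file`.

```python
import urllib.request

# Assumption: a Tika server is running locally on its default port.
TIKA_SERVER = "http://localhost:9998"


def build_language_request(text, server=TIKA_SERVER):
    """Build (but do not send) a PUT request for the language endpoint.

    Hypothetical helper: no detector is named in the request, so the
    server is expected to fall back to its default language detector.
    """
    return urllib.request.Request(
        url=server.rstrip("/") + "/language/string",
        data=text.encode("utf-8"),
        headers={"Accept": "text/plain"},
        method="PUT",
    )


def detect_language(text, server=TIKA_SERVER, timeout=30):
    """Send the text to the server and return its plain-text response."""
    request = build_language_request(text, server)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

With a server running, `detect_language("Ceci est un texte en français.")` should return a short language code. Nothing in the request chooses a detector, which matches the point above: any override would have to be supported upstream first.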