chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

How many languages does tika-python support when parsing attachments? #291

Closed sunweiconfidence closed 4 years ago

sunweiconfidence commented 4 years ago

Hi @chrismattmann,

I have a question: how many languages does tika-python support when parsing attachments? I ask because I haven't yet tested parsing files whose content is written in many languages. Thanks.

chrismattmann commented 4 years ago

It supports full Unicode/UTF-8 and can extract text in multiple languages. See this tutorial; although it's specific to Java, the functionality is exposed by Tika Server, and Tika Python just inherits it.
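A minimal sketch of what this looks like from Python, assuming `tika` is installed (`pip install tika`) and Java is available so the Tika Server can start. `parser.from_file` is the tika-python call; `extract_text` and `scripts_in` are hypothetical helpers added only for illustration, and `scripts_in` is a rough heuristic based on Unicode character names, not something Tika provides.

```python
import unicodedata


def extract_text(path):
    """Parse a file through Tika Server and return its text as a Python str."""
    # Imported lazily: needs the `tika` package and a Java runtime.
    from tika import parser

    parsed = parser.from_file(path)
    # "content" can be None when Tika finds no text in the file.
    return parsed.get("content") or ""


def scripts_in(text):
    """Return a rough set of scripts seen in `text` (illustrative heuristic).

    Uses the first word of each letter's Unicode name, e.g. "LATIN",
    "CYRILLIC", "CJK". This is an assumption-laden shortcut, not a real
    language detector.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts
```

Because the extracted content is an ordinary Unicode string, text in several languages within one attachment comes back together; something like `scripts_in(extract_text("attachment.pdf"))` (the file name is a placeholder) gives a quick check of which scripts were extracted.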

sunweiconfidence commented 4 years ago

Thanks, @chrismattmann.