chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

Use the new RTG Translator to provide tika-translate functionality and set default translation engine to it #339

Closed chrismattmann closed 1 year ago

chrismattmann commented 3 years ago

With the advent of TIKA-3329, we can now have a full translation engine in Tika-Python that translates more than 300 languages to English. We should standardize on it. It requires Tika 2.0, though, so in the meantime we will have to wait for that release, or we could backport the class to other versions of Tika.
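
A minimal sketch of what defaulting to the RTG translator could look like on the Python side. It assumes tika-python picks up the `TIKA_TRANSLATOR` environment variable (the one exported in the session further down this thread) when the `tika` module is first imported, and that a Tika server with the RTG translator is available. The helper names `use_rtg_by_default` and `to_english` are hypothetical and are not part of tika-python.

```python
import os

# Fully qualified class name, as exported in the session in this thread.
RTG_TRANSLATOR = "org.apache.tika.language.translate.RTGTranslator"


def use_rtg_by_default():
    """Select the RTG translator unless the caller already chose one."""
    # setdefault leaves an explicit TIKA_TRANSLATOR setting untouched
    # and returns whichever value ends up in the environment.
    return os.environ.setdefault("TIKA_TRANSLATOR", RTG_TRANSLATOR)


def to_english(text, src_lang):
    """Hypothetical helper: translate text to English via tika-python."""
    use_rtg_by_default()
    # Imported lazily so the variable is set before tika is loaded
    # (assumption: tika reads TIKA_TRANSLATOR at import time).
    from tika import translate

    return translate.from_buffer(text, src_lang, "en")
```

With a suitable server running, `to_english('Danke!', 'de')` would go through the same `translate.from_buffer` call used in the sessions in this thread. Using `setdefault` rather than a plain assignment keeps any translator the user has already configured.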

chrismattmann commented 3 years ago

See output from it:

(base) mattmann@proscuitto:~/git/tika-python$ export TIKA_TRANSLATOR=org.apache.tika.language.translate.RTGTranslator
(base) mattmann@proscuitto:~/git/tika-python$ python
Python 3.7.4 (default, Aug 13 2019, 20:35:49) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tika import translate
>>> translate.from_buffer('Bonjour, mon ami je habitais Los Angeles', 'fr', 'en')
'Hello, my friend I used to live in Los Angeles'
>>> translate.from_buffer('Danke!', 'de', 'en')
'Thank you. Thank you.'
>>> 

chrismattmann commented 3 years ago

The above was tested with the Tika 2.0 SNAPSHOT server. I also backported the RTG translator to the 1.x branch, and it works fine there as well!

(base) mattmann@proscuitto:~/git/tika-python$ python
Python 3.7.4 (default, Aug 13 2019, 20:35:49) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tika import translate
>>> translate.from_buffer('Hola senor, mi yamo es Chris Mattmann', 'es', 'en')
'Hello sir, my name is Chris Mattmann.'
>>> translate.from_buffer('Danke!', 'de', 'en')
'Thank you. Thank you.'
>>> 

thammegowda commented 3 years ago

This is awesome!

bpw1621 commented 2 years ago

Hi @chrismattmann,

Thanks for the great work: much appreciated.

The example you sketched at the top of this thread didn't work for me with the environment variable in place and the Docker image https://hub.docker.com/r/tgowda/rtg-model running.

Any help appreciated.

chrismattmann commented 1 year ago

Strange. How did it fail? I'm closing this as fixed for now, since it worked for me and @thammegowda. If you can provide more detail, either in this thread or in a new PR, I will take a look. Thanks.