chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0
1.49k stars, 234 forks

Use google translation #132

Closed: sebgoa closed this issue 7 years ago

sebgoa commented 7 years ago

Hi, I am a total newbie with Tika, but I got the basics running properly.

I am curious to know how I could use this client to do a translation that uses the Google Translate API.

thanks,

-seb

sebgoa commented 7 years ago

And FWIW, I cannot use the default translation with Lingo24.

chrismattmann commented 7 years ago

Sorry, I'll respond shortly. You have to add your config files and start a custom Tika server that has the translation properties files on its classpath.

chrismattmann commented 7 years ago

Hey @sebgoa, this works fine.

Start the Tika server with the classpath pointing to your language keys.

Use Tika-Python.

LMC-053601:tika1.15 mattmann$ java -cp ./language-keys:tika-server/target/tika-server-1.15-SNAPSHOT.jar org.apache.tika.server.TikaServerCli
Mar 21, 2017 9:17:19 PM org.apache.tika.parser.image.ImageParser <clinit>
WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
Mar 21, 2017 9:17:19 PM org.apache.tika.server.TikaServerCli main
INFO: Starting Apache Tika 1.15-SNAPSHOT server
Mar 21, 2017 9:17:20 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://localhost:9998/
Mar 21, 2017 9:17:20 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: jetty-8.y.z-SNAPSHOT
Mar 21, 2017 9:17:20 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Started SelectChannelConnector@localhost:9998
Mar 21, 2017 9:17:20 PM org.apache.tika.server.TikaServerCli main
INFO: Started
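Starting the server and waiting for the "Started" line can also be scripted from Python. The sketch below is an illustration only. It assumes the same jar path and `language-keys` directory as the command above, and it treats any HTTP response from the publish address `http://localhost:9998/` as proof that the server is accepting requests. `build_server_command` and `wait_for_server` are hypothetical helper names and are not part of tika-python.

```python
import os
import time
import urllib.error
import urllib.request


def build_server_command(keys_dir, server_jar):
    """Build the java command line that puts the key directory on the classpath.

    os.pathsep is ':' on Unix-like systems and ';' on Windows, so the
    classpath separator matches the platform (the command above uses ':').
    """
    classpath = os.pathsep.join([keys_dir, server_jar])
    return ["java", "-cp", classpath, "org.apache.tika.server.TikaServerCli"]


def wait_for_server(url="http://localhost:9998/", timeout=60.0, interval=1.0):
    """Poll url until something answers or timeout seconds pass.

    Any HTTP response, even an error status, counts as "up" because it
    proves a server is listening. Connection errors mean "not yet".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server answered, just not with a 2xx status
        except (urllib.error.URLError, OSError):
            time.sleep(interval)  # nothing listening yet; try again
    return False


# Hypothetical usage, with the paths copied from the command above:
#   import subprocess
#   cmd = build_server_command(
#       "./language-keys", "tika-server/target/tika-server-1.15-SNAPSHOT.jar")
#   server = subprocess.Popen(cmd)
#   if not wait_for_server():
#       server.terminate()
```

Polling the base URL keeps the check independent of any particular endpoint, at the cost of not confirming that the translators actually loaded their keys.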

Here is the structure of language-keys:

$ tree language-keys
./language-keys
└── org
    └── apache
        └── tika
            └── language
                └── translate
                    ├── translator.google.properties
                    ├── translator.lingo24.properties
                    └── translator.microsoft.properties

5 directories, 3 files 
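The layout above mirrors the Java package `org.apache.tika.language.translate`. Presumably this works because the translators look up their properties files as classpath resources relative to that package, and the server command puts `language-keys` on the classpath. The minimal sketch below recreates the layout. `write_language_keys` is a hypothetical helper. The property name in the commented usage (`translator.client-secret`) is an assumption, so check it against the default `translator.google.properties` shipped with your Tika version.

```python
from pathlib import Path

# Package path taken from the tree above (assumed to be where Tika's
# translators resolve their properties files as classpath resources).
PACKAGE = "org.apache.tika.language.translate"


def write_language_keys(root, files):
    """Create <root>/org/apache/tika/language/translate/ and write each file.

    files maps a file name such as "translator.google.properties" to its
    text content, which is written verbatim as UTF-8. Returns the paths written.
    """
    package_dir = Path(root).joinpath(*PACKAGE.split("."))
    package_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, text in files.items():
        path = package_dir / name
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written


# Hypothetical usage; the key name is an assumption, verify it first:
# write_language_keys("language-keys", {
#     "translator.google.properties": "translator.client-secret=YOUR_GOOGLE_API_KEY\n",
# })
```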

Each of the properties files holds your associated API keys. With the server running, translation then works from Tika-Python:

$ python2.7
Python 2.7.11 (default, Apr 14 2016, 22:11:07) 
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from tika import translate
>>> translate.from_buffer('bonjour mon ami!', 'fr', 'en')
u'Hello, my friend!'
>>> 
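Tika-Python is a client for Tika's REST services, so a call like `translate.from_buffer` ends up as an HTTP request to the server. The sketch below makes an equivalent request using only the standard library. It rests on two assumptions. First, the server translates text sent with `PUT /translate/all/{translator}/{src}/{dest}`. Second, `org.apache.tika.language.translate.GoogleTranslator` is the class name that selects Google. Passing the translator class explicitly in this way would be how you pick Google over the Lingo24 default mentioned earlier in this thread. `translate_url` and `translate_text` are hypothetical helper names, not tika-python functions.

```python
import urllib.parse
import urllib.request

# Assumed translator class name for Google Translate.
GOOGLE = "org.apache.tika.language.translate.GoogleTranslator"


def translate_url(src, dest, translator=GOOGLE, server="http://localhost:9998"):
    """Build the assumed /translate/all/{translator}/{src}/{dest} URL."""
    parts = [urllib.parse.quote(p, safe="") for p in (translator, src, dest)]
    return server.rstrip("/") + "/translate/all/" + "/".join(parts)


def translate_text(text, src, dest, translator=GOOGLE, server="http://localhost:9998"):
    """PUT the raw text to the server and return the response body as text."""
    req = urllib.request.Request(
        translate_url(src, dest, translator, server),
        data=text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; charset=UTF-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Hypothetical usage against a server started with Google keys on its classpath:
# translate_text("bonjour mon ami!", "fr", "en")
```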
sebgoa commented 7 years ago

Thanks, Chris, for the detailed answer. FWIW, we are packaging Tika as a Kubernetes chart here: https://github.com/bitnami/charts/tree/tika-server/incubator/tika-server

chrismattmann commented 7 years ago

Anytime, @sebgoa! Thank you for packaging Tika. Really appreciate it.

chrismattmann commented 7 years ago

@sebgoa, I added a link here: http://wiki.apache.org/tika/API%20Bindings%20for%20Tika