SpamScope / spamscope

Fast Advanced Spam Analysis Tool
https://pypi.python.org/pypi/SpamScope
Apache License 2.0

Consider swapping out tika-app with tika-python #9

Closed: chrismattmann closed this issue 7 years ago

chrismattmann commented 7 years ago

The Tika Python library uses the Tika REST server, which is faster than command-line calls to the Tika app JAR because the REST server doesn't need to reload the Tika config and start a new JVM on every call. In addition, you don't need to worry about the location of the Tika JAR file or install it separately; the library manages all of that for you.

Looks like you would just update requirements.txt to include tika (the package installed by pip install tika), and then make whatever other updates are necessary. If you want, I can send a PR.
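For reference, a minimal sketch of what the swap could look like on the Python side. It relies on `tika.parser.from_file`, which returns a dict with `content` and `metadata` keys; the helper name `extract_with_tika` and the injectable `parse` argument are hypothetical and only there so the function can be exercised without a running server. This is not SpamScope's actual code.

```python
def extract_with_tika(path, parse=None):
    """Return (content, metadata) for a file parsed by Tika.

    `parse` is a hypothetical hook for tests; by default the function
    uses tika-python, which talks to a Tika REST server for you.
    """
    if parse is None:
        # Imported lazily: requires `pip install tika` and a Java runtime.
        from tika import parser
        parse = parser.from_file

    parsed = parse(path)
    # Either key can be None (for example, for files with no text).
    content = (parsed.get("content") or "").strip()
    metadata = parsed.get("metadata") or {}
    return content, metadata
```

Calling `extract_with_tika("/path/to/attachment.pdf")` would go through tika-python, so no path to the Tika JAR has to be configured in the caller.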

fedelemantuano commented 7 years ago

Yes, you're right. My first version used Apache Tika over REST: https://github.com/SpamScope/spamscope/commit/e0f580859b8ebee9019b2adc84312cfe8116adce

I replaced it, but maybe that was the wrong idea. So I'm thinking of rolling back to the REST version and using this Docker image: https://hub.docker.com/r/fmantuano/apache-tika-server/

If you want to help me, I would be very happy.
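As a rough illustration of the REST approach discussed here, the sketch below sends a payload to a Tika server's `/tika` endpoint with a `PUT` request using only the standard library. The base URL `http://localhost:9998` is an assumption (9998 is Tika server's usual default port; the Docker image above may map it differently), and the function names are hypothetical.

```python
import urllib.request

# Assumption: a Tika server (for example, in a Docker container) listens here.
TIKA_URL = "http://localhost:9998"


def build_tika_request(payload: bytes, base_url: str = TIKA_URL) -> urllib.request.Request:
    """Build the PUT request that asks Tika for the plain text of `payload`."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tika",
        data=payload,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(payload: bytes, base_url: str = TIKA_URL, timeout: float = 30.0) -> str:
    """Send `payload` to the Tika server and return the extracted text."""
    request = build_tika_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Because the server stays up between calls, each attachment costs one HTTP round trip rather than a fresh JVM start.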