4teamwork / ftw.tika

This product integrates Apache Tika for full text indexing with Plone.

Improve performance #9

Closed: jone closed this issue 10 years ago

jone commented 10 years ago

The performance isn't that good at the moment.

I've reindexed the searchable text of 2718 files (of different sizes and types) in 53 minutes and 8 seconds, which gives an average of about 1.2 seconds per file. I'm not sure how fast the other implementations were, but it feels quite slow.
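A quick check of the per-file average, as a small Python sketch (the helper name `seconds_per_file` is hypothetical): 53 minutes and 8 seconds is 3188 seconds, and 3188 / 2718 is roughly 1.17 seconds per file.

```python
def seconds_per_file(n_files: int, minutes: int, seconds: int) -> float:
    """Average wall-clock time per file for a reindexing run."""
    total_seconds = minutes * 60 + seconds
    return total_seconds / n_files


avg = seconds_per_file(2718, 53, 8)
print(f"{avg:.2f} s per file")  # prints "1.17 s per file"
```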

We should try to optimize this. One option is running Tika in server mode (which could easily be set up as a supervisor program), although it seems to support only HTML.
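A minimal sketch of what a client for a long-running Tika process could look like. It assumes the standalone Apache Tika server (tika-server) listening on its default port 9998 and its `/tika` endpoint, which accepts a document via HTTP PUT and returns extracted text when the request sends `Accept: text/plain`. This is not how ftw.tika talks to Tika; the URL, timeout, and the helper names `build_extract_request` and `extract_text` are illustrative assumptions.

```python
import urllib.request

# Assumed endpoint of a locally running tika-server (default port 9998).
TIKA_URL = "http://localhost:9998/tika"


def build_extract_request(data: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request asking the Tika server for plain-text output."""
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            # Ask for plain text instead of the default output format.
            "Accept": "text/plain",
            # Avoid urllib's default form-encoded content type so the
            # server treats the body as an opaque document.
            "Content-Type": "application/octet-stream",
        },
    )


def extract_text(data: bytes, url: str = TIKA_URL, timeout: float = 30.0) -> str:
    """Send one document to the running server and return its text."""
    req = build_extract_request(data, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

The presumed gain of such a setup is that the JVM and Tika's parsers are started once and reused for every file, instead of being started again for each document.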

lukasgraf commented 10 years ago

Another option could be to run Tika with Jnius: http://www.hackzine.org/using-apache-tika-from-python-with-jnius.html

Jnius is built with Cython, though, which could be a deployment nightmare.

jone commented 10 years ago

Fixed with #12