Running Tika as a server is much faster because the JVM is no longer booted for every file / conversion.
Enhancements
This pull request adds support for the Tika daemon. The meta-directive is extended to also accept the host and port options. When these are configured, ftw.tika automatically switches to daemon mode and contacts the server at the configured host and port.
If the server is not running, ftw.tika automatically falls back to executing Tika directly, using the configured path to the jar file.
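The fallback behaviour described above can be sketched roughly as follows. This is only an illustration, not the code in this pull request: `server_is_reachable` and `choose_mode` are hypothetical names, the TCP probe is an assumed way of checking whether the server is up, and the jar path and port in the usage note are placeholders.

```python
import socket


def server_is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_mode(host, port, jar_path, probe=server_is_reachable):
    """Decide how to run a conversion (hypothetical helper).

    Daemon mode is used only when a port is configured and the server
    answers the probe; otherwise fall back to running the jar directly.
    """
    effective_host = host or "localhost"
    if port is not None and probe(effective_host, port):
        return ("daemon", effective_host, port)
    return ("direct", jar_path)
```

For example, `choose_mode(None, 9998, "/path/to/tika-app.jar")` returns the `"direct"` tuple whenever nothing is listening on the given port, so a stopped server degrades to the slower per-file invocation instead of failing.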
Production installation
ftw-buildouts provides a tika-server.cfg that can be used when the deployment buildout is based on ftw.buildout's deployment.cfg. The tika-server.cfg downloads Tika, creates a server script that is registered in supervisor, and configures ftw.tika (ZCML).
More details about how to install it using buildout are described in the updated readme.
Performance Test
I created 100 docx files in Plone with random content and length and updated the SearchableText twice: once with the "old" method, which fires up Tika for every file, and once using a Tika server.
The results:
| Method | Duration for 100 files | Duration per file |
| --- | --- | --- |
| Non-Daemon | 110 seconds | 1.1 seconds |
| Daemon | 6.72 seconds | 0.0672 seconds |
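As a quick sanity check on these figures, the speedup can be computed directly from the two totals. The variable names are only illustrative; the numbers are the ones measured above.

```python
FILES = 100
non_daemon_total = 110.0  # seconds for 100 files, Tika started per file
daemon_total = 6.72       # seconds for 100 files, Tika server

# Per-file durations and the overall speedup factor.
non_daemon_per_file = non_daemon_total / FILES
daemon_per_file = daemon_total / FILES
speedup = non_daemon_total / daemon_total

print("per file: %.4f s vs %.4f s" % (non_daemon_per_file, daemon_per_file))
print("speedup: %.1fx" % speedup)
```

For this test run, the daemon is therefore roughly 16 times faster.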
@lukasgraf can you take a look at my changes?
/cc @maethu