Closed aweingarten closed 8 years ago
Tika is actually installed as part of Solr itself, though it's slightly non-obvious.
For my own purposes, when I need to use Tika on the server, I add a Tika requestHandler to my solrconfig.xml file like so:
<!-- For Apache Solr and Search API Attachments modules -->
<requestHandler name="/extract/tika"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
  </lst>
  <!-- This path only extracts - never updates -->
  <lst name="invariants">
    <bool name="extractOnly">true</bool>
  </lst>
</requestHandler>
Then you can set the path in the Solr Attachments setting to /extract/tika and use that handler. Newer versions of the Solr config that ships with Drupal's Solr modules may already define a handler that works (such as /update/extract).
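As a rough way to check that an extract-only handler like this responds, a small client can post a file to it and print what comes back. This is only a sketch under assumptions: the base URL http://localhost:8983/solr, the helper names build_extract_url and extract_file, and the choice of JSON output are illustrative and not taken from the thread; adjust the base URL (and core name, if any) to your install.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Solr is reachable here; add a core name to the path if your setup uses one.
SOLR_BASE = "http://localhost:8983/solr"
# Handler path as configured in solrconfig.xml.
HANDLER_PATH = "/extract/tika"


def build_extract_url(base=SOLR_BASE, handler=HANDLER_PATH, extract_format="text"):
    """Build the request URL. extractOnly is not passed here because the
    handler's invariants section already forces it to true."""
    query = urllib.parse.urlencode({"extractFormat": extract_format, "wt": "json"})
    return f"{base.rstrip('/')}{handler}?{query}"


def extract_file(path, content_type="application/octet-stream", url=None):
    """POST the raw file bytes to the handler and return the parsed JSON response."""
    with open(path, "rb") as fh:
        body = fh.read()
    request = urllib.request.Request(
        url or build_extract_url(),
        data=body,  # supplying data makes urllib send a POST
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    import sys

    # Usage: python extract_check.py /path/to/document.pdf
    print(json.dumps(extract_file(sys.argv[1]), indent=2))
```

Because the handler is extract-only, a request like this should return the extracted text and metadata without adding anything to the index.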
See further:
Ah, one other thing I forgot to mention: you have to point Solr to the proper jar file for extraction too, so where other <lib>s are defined in solrconfig.xml, add the following (if using the normal/default settings for the geerlingguy.solr role):
<lib dir="/opt/solr/dist" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="/opt/solr/contrib/extraction/lib" regex=".*\.jar" />
@geerlingguy, I have a project where I need to use Tika to index attachments. I was wondering what is the best way to install it into DrupalVM. Didn't see anything in the Ansible roles. Is the recommended way to create a script like for "configure-solr"? Is there such a setup script already floating around?