geerlingguy / drupal-vm

A VM for Drupal development
https://www.drupalvm.com/
MIT License

Installing with Solr Tika for Indexing Attachments #521

Closed aweingarten closed 8 years ago

aweingarten commented 8 years ago

@geerlingguy, I have a project where I need to use Tika to index attachments. What is the best way to install it in Drupal VM? I didn't see anything in the Ansible roles. Is the recommended approach to create a script like the one for "configure-solr"? Is there such a setup script already floating around?

geerlingguy commented 8 years ago

Tika is actually installed as part of Solr itself, though it's slightly non-obvious.

For my own purposes, when I need to use Tika on the server, I add a tika requestHandler to my solrconfig.xml file like so:

  <!-- For Apache Solr and Search API Attachments modules -->
  <requestHandler name="/extract/tika"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler" >
    <lst name="defaults">
    </lst>
    <!-- This path only extracts - never updates -->
    <lst name="invariants">
      <bool name="extractOnly">true</bool>
    </lst>
  </requestHandler>

Then you can set the path in the Solr Attachments settings to /extract/tika and use that handler. Newer versions of the Solr config files that ship with Drupal's Solr modules may already define a handler that works (like /update/extract).
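To check that the handler responds before wiring it into Drupal, something like the following rough sketch could be used. It only uses the Python standard library. The base URL, the core name `collection1`, the form field name `file`, and the helper names are all assumptions for illustration, not values from Drupal VM or the geerlingguy.solr role, so adjust them to match your setup.

```python
import urllib.request
import uuid


def build_extract_url(base_url, core, handler="/extract/tika"):
    """Join the Solr base URL, core name, and handler path into one URL."""
    return f"{base_url.rstrip('/')}/{core.strip('/')}/{handler.strip('/')}"


def encode_multipart(field, filename, payload, boundary):
    """Encode a single file as a multipart/form-data request body."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail


def extract_text(url, path):
    """POST a local file to the handler and return the raw response body.

    The handler above sets extractOnly as an invariant, so this request
    should only extract content and not add anything to the index.
    """
    boundary = uuid.uuid4().hex
    with open(path, "rb") as fh:
        # "file" is an arbitrary form field name chosen for this sketch.
        body = encode_multipart("file", path, fh.read(), boundary)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

For example, `extract_text(build_extract_url("http://localhost:8983/solr", "collection1"), "test.pdf")` would send `test.pdf` to the handler, assuming Solr is reachable at that address and the core is named `collection1`.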


geerlingguy commented 8 years ago

Ah, one other thing I forgot to mention: you also have to point Solr to the proper jar files for extraction. Where the other <lib> elements are defined in solrconfig.xml, add the following (assuming the normal/default settings for the geerlingguy.solr role):

  <lib dir="/opt/solr/dist" regex="apache-solr-cell-\d.*\.jar" />
  <lib dir="/opt/solr/contrib/extraction/lib" regex=".*\.jar" />
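Jar file names can differ between Solr releases, so it may be worth confirming that each regex actually matches something in its directory. The sketch below is one way to do that. The helper names and the sample file names are hypothetical, and it assumes Solr requires the regex to match the whole file name, which is why it uses `re.fullmatch`.

```python
import os
import re


def matching_jars(filenames, pattern):
    """Return the file names that the regex matches in full."""
    regex = re.compile(pattern)
    return [name for name in filenames if regex.fullmatch(name)]


def jars_in_dir(directory, pattern):
    """List the files in a directory whose names match the regex."""
    return matching_jars(sorted(os.listdir(directory)), pattern)
```

Running `jars_in_dir("/opt/solr/dist", r"apache-solr-cell-\d.*\.jar")` on the VM would then show which jars the first `<lib>` line picks up. An empty list suggests the jar has a different name in that Solr version and the regex needs adjusting.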