IQSS / dataverse

Open source research data repository software
http://dataverse.org

Solr: Disable Solr Cell/Tika in Solr config #5018

Open kcondon opened 5 years ago

kcondon commented 5 years ago

Requested by a user. Solr also recommends not running it in a production environment; see RT: 266424. A simple solution is to comment out the relevant handler and lib sections of solrconfig.xml.

You can experiment with disabling it by editing the solrconfig.xml file and commenting out the following sections (background: https://lucene.apache.org/solr/guide/7_0/uploading-data-with-solr-cell-using-apache-tika.html):

  <requestHandler name="/update/extract"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler" >
    <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="fmap.meta">ignored_</str>
      <str name="fmap.content">_text_</str>
    </lst>
  </requestHandler>

and

  <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />
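
For illustration, here is a sketch of how those sections might look once commented out, assuming they appear in your solrconfig.xml exactly as quoted above. The explanatory comment text is added here and is not part of the shipped config; Solr would need to be restarted or the core reloaded for the change to take effect.

```xml
<!-- Solr Cell / Tika extraction disabled (see #5018).
  <requestHandler name="/update/extract"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler" >
    <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="fmap.meta">ignored_</str>
      <str name="fmap.content">_text_</str>
    </lst>
  </requestHandler>
-->

<!-- Extraction libraries no longer loaded, since the handler above is disabled.
  <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />
-->
```

XML comments cannot be nested, so if either section already contains a comment in your copy of the file, that inner comment would have to be removed or the section deleted outright instead.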

qqmyers commented 5 years ago

@djbrooke - I think this is orthogonal to #5030. I've included the Tika library in Dataverse, where it is used to extract text that is then sent to Solr for indexing. I think you can also send whole documents to Solr and have Tika run there, which is what I believe is not recommended for production.

djbrooke commented 5 years ago

Great, thanks @qqmyers. I was just keying on the mention of Tika without the background knowledge. 👍