Closed chenejac closed 3 years ago
Benjamin Gross said:
It seems that Solr will require a solrconfig.xml file regardless of how the core is created, so there may not be a huge advantage to programmatically creating the schema via API if we still need to copy the config. As an iterative step, I wrote a Python script that, given a valid Solr URL, determines the correct solr.home value and copies the vivo-solr conf directory into the right place. Run it from the vivo-solr directory.
import os
from shutil import copytree

import requests

core = "vivocore"
solr_url = "http://localhost:8983/solr"

# Ensure core doesn't already exist
r = requests.get(solr_url + "/admin/cores", params={"action": "STATUS", "core": core})
if r.status_code != 200:
    raise ValueError("Unable to connect to Solr. Is the solr_url value correct?")
# "status" maps core names to status dicts; default to {} so .get(core) is safe
if r.json().get("status", {}).get(core):
    raise ValueError("A core named \"{}\" already exists!".format(core))

# Determine Solr.home
r = requests.get(solr_url + "/admin/info/system")
solr_home = r.json().get("solr_home")
if not solr_home:
    raise ValueError("Solr did not report a solr_home value.")

# Copy files and instantiate core. The CoreAdmin call has the form:
#   admin/cores?
#     action=CREATE&
#     name=core-name&
#     instanceDir=path/to/dir&
#     config=solrconfig.xml&dataDir=data

# Copy the configuration directory into a new directory in Solr.home
instanceDir = os.path.join(solr_home, core)
copytree("vivocore/conf", os.path.join(instanceDir, "conf"))

# Create the new core, using the VIVO configuration we just copied
r = requests.get(solr_url + "/admin/cores", params={
    "action": "CREATE",
    "name": core,
    "instanceDir": instanceDir,
    "config": "solrconfig.xml",
    "dataDir": "data",
})
r = r.json()
if "error" in r:
    raise RuntimeError("Solr failed to create the core: {}".format(r["error"]))
elif "core" in r:
    print('Successfully created core.')
Benjamin Gross said:
Looks like Solr is delivered with some basic configurations located at [solr.home]/server/solr/configsets/_default/conf. So one solution would be to point the new core to one of these default configurations, then make any necessary changes using the Config API, which overlays the .xml file defaults.
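A minimal sketch of that two-step idea, assuming the CoreAdmin `configSet` parameter and the Config API `set-property` command behave as documented for the Solr version in use. The helper names, and the choice of `updateHandler.autoSoftCommit.maxTime` as the property to overlay, are illustrative only and not part of vivo-solr. The functions only build the request pieces; nothing is sent to Solr.

```python
import json

SOLR_URL = "http://localhost:8983/solr"  # assumption: same URL as in the script above


def create_core_request(core, config_set="_default"):
    """Return (url, params) for a CoreAdmin CREATE call based on a configset."""
    url = SOLR_URL + "/admin/cores"
    params = {"action": "CREATE", "name": core, "configSet": config_set}
    return url, params


def set_property_request(core, prop, value):
    """Return (url, body) for a Config API call that overlays one property."""
    url = "{}/{}/config".format(SOLR_URL, core)
    body = json.dumps({"set-property": {prop: value}})
    return url, body


if __name__ == "__main__":
    # Hypothetical usage: print what would be sent, without contacting Solr.
    print(create_core_request("vivocore"))
    print(set_property_request(
        "vivocore", "updateHandler.autoSoftCommit.maxTime", 15000))
```

The params could be passed to `requests.get(url, params=params)` and the body to `requests.post(url, data=body, headers={"Content-Type": "application/json"})`.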
Andrew Woods said:
Potentially helpful description of migrating Solr 4 configuration to Solr 7: https://library.brown.edu/DigitalTechnologies/upgrading-from-solr-4-to-solr-7/
Thanks!
Benjamin Gross said:
At first glance, it seems like not everything in solrconfig.xml can be set via API. Looking at a diff of the default and the config delivered with VIVO, there are some changes that can't be done via API. However, there is a second example titled "sample_techproducts_configs" that includes almost everything in the vivo-solr solrconfig.xml.
What will still need to be configured are the search defaults for a select query and the etag generation bit.
<requestHandler name="/select" class="solr.SearchHandler">
  <!--requestHandler name="search" class="solr.SearchHandler" default="true"-->
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request
  -->
  <!-- Copying defaults from the old Vitro's solrconfig -->
  <lst name="defaults">
    <!-- Adding q.op here -->
    <str name="q.op">AND</str>
    <str name="defType">edismax</str>
    <!-- nameText added for NIHVIVO-3701 -->
    <str name="qf">ALLTEXT ALLTEXTUNSTEMMED nameText^2.0 nameUnstemmed^2.0 nameStemmed^2.0 nameLowercase</str>
    <str name="echoParams">explicit</str>
    <str name="qs">2</str>
    <int name="rows">10</int>
    <str name="q.alt">*:*</str>
    <str name="fl">*,score</str>
    <str name="hl">true</str>
    <str name="hl.fl">ALLTEXT</str>
    <str name="hl.fragsize">160</str>
    <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
    <str name="hl.simple.post"><![CDATA[</strong>]]></str>
    <!-- Default value of mm is 100% which should result in AND behavior, still setting it here
         https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
<!-- Update Request Handler.
     http://wiki.apache.org/solr/UpdateXmlMessages
     The canonical Request Handler for Modifying the Index through
     commands specified using XML, JSON, CSV, or JAVABIN
     Note: Since solr1.1 requestHandlers requires a valid content
     type header if posted in the body. For example, curl now
     requires: -H 'Content-type:text/xml; charset=utf-8'
     To override the request content type and force a specific
     Content-type, use the request parameter:
     ?update.contentType=text/csv
     This handler will pick a response format to match the input
     if the 'wt' parameter is not explicit
-->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <!-- See below for information on defining
       updateRequestProcessorChains that can be used by name
       on each Update Request
  -->
  <lst name="defaults">
    <str name="update.chain">etag</str>
  </lst>
  <!--
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
  -->
</requestHandler>
<!-- ETag generation
     Creates the "etag" field on the fly based on a hash of all other
     fields.
-->
<updateRequestProcessorChain name="etag">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">etag</str>
    <bool name="overwriteDupes">false</bool>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
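For comparison, the `/select` defaults above could be written as a Config API request body. This is only a sketch: the values are copied from the XML, but the helper name is hypothetical, and whether Solr accepts `update-requesthandler` or requires `add-requesthandler` for a handler that is already defined in solrconfig.xml has not been verified here, so the command name is left as a parameter.

```python
import json

# Defaults copied from the <lst name="defaults"> block of the /select handler.
SELECT_DEFAULTS = {
    "q.op": "AND",
    "defType": "edismax",
    "qf": ("ALLTEXT ALLTEXTUNSTEMMED nameText^2.0 nameUnstemmed^2.0 "
           "nameStemmed^2.0 nameLowercase"),
    "echoParams": "explicit",
    "qs": "2",
    "rows": 10,
    "q.alt": "*:*",
    "fl": "*,score",
    "hl": "true",
    "hl.fl": "ALLTEXT",
    "hl.fragsize": "160",
    "hl.simple.pre": "<strong>",
    "hl.simple.post": "</strong>",
    "mm": "100%",
}


def select_handler_body(command="update-requesthandler"):
    """Build the JSON body to POST to <solr_url>/<core>/config.

    `command` is an assumption; the Config API documents both
    add-requesthandler and update-requesthandler.
    """
    handler = {
        "name": "/select",
        "class": "solr.SearchHandler",
        "defaults": SELECT_DEFAULTS,
    }
    return json.dumps({command: handler})


if __name__ == "__main__":
    print(select_handler_body())
```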
Benjamin Gross said:
According to the documentation, Solr's Config API does not support updateRequestProcessorChain, which we use for creating etags. We can create the etag processor, but I'm not sure whether the other parts of the processor chain (solr.LogUpdateProcessorFactory and solr.RunUpdateProcessorFactory) will happen automatically if we set up 'etag' to be a default processor for any updates...
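A rough sketch of what creating the etag processor might look like, assuming the Config API's `add-updateprocessor` command and the `processor` request parameter work as described in the Solr reference guide. The settings are copied from the `etag` chain in solrconfig.xml; the helper names are hypothetical. It does not answer whether the log and run processors are applied automatically.

```python
import json

# Settings copied from the SignatureUpdateProcessorFactory entry in the etag chain.
ETAG_PROCESSOR = {
    "name": "etag",
    "class": "solr.processor.SignatureUpdateProcessorFactory",
    "enabled": True,
    "signatureField": "etag",
    "overwriteDupes": False,
    "signatureClass": "solr.processor.Lookup3Signature",
}


def add_etag_processor_body():
    """Build the JSON body to POST to <solr_url>/<core>/config."""
    return json.dumps({"add-updateprocessor": ETAG_PROCESSOR})


def update_params_with_etag(params=None):
    """Return update request parameters that name the etag processor."""
    merged = dict(params or {})
    merged["processor"] = ETAG_PROCESSOR["name"]
    return merged


if __name__ == "__main__":
    print(add_etag_processor_body())
    print(update_params_with_etag({"commit": "true"}))
```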
Benjamin Gross said:
Another problem... The Solr Schema API does not allow us to set the uniqueKey, and for some reason VIVO uses a custom 'DocID' field instead of the default 'id'. [https://issues.apache.org/jira/browse/SOLR-7242]
Vitro will have to be modified to use the default Solr id. Doesn't seem too rough: https://github.com/vivo-project/Vitro/search?q=docid
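Although the uniqueKey cannot be set through the Schema API, it can be read, which would let a setup step confirm which key a core uses. A small sketch, assuming the `schema/uniquekey` endpoint returns a JSON object with a `uniqueKey` entry; the helper names and the sample response are made up for illustration.

```python
def unique_key_url(solr_url, core):
    """Return the Schema API URL that reports a core's uniqueKey."""
    return "{}/{}/schema/uniquekey".format(solr_url.rstrip("/"), core)


def uses_default_id(schema_response, default="id"):
    """Return True if the parsed response names the default Solr key field."""
    return schema_response.get("uniqueKey") == default


if __name__ == "__main__":
    # Hypothetical response body, shaped like the Schema API output.
    sample = {"responseHeader": {"status": 0}, "uniqueKey": "DocID"}
    print(unique_key_url("http://localhost:8983/solr", "vivocore"))
    print(uses_default_id(sample))
```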
Benjamin Gross said:
As discussed during the dev meeting today (https://wiki.lyrasis.org/display/VIVO/2020-10-06+-+VIVO+Development+IG), we will likely take a first step by having VIVO or Vitro copy the solrconfig.xml and schema.xml files into the right spot and create the core via API using those files.
Andrew Woods said:
Notes, Using LukeRequestHandler (https://cwiki.apache.org/confluence/display/SOLR/LukeRequestHandler):
Benjamin Gross said:
Question that came up during the call today... if a user has an existing hardened Solr installation, will VIVO be able to determine the location of Solr.home? How about if they install using this script? https://lucene.apache.org/solr/guide/8_0/taking-solr-to-production.html#taking-solr-to-production
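One way to probe this, sketched under the assumptions that VIVO runs on the same host as Solr and that the system info response includes a `solr_home` entry (as the script earlier in this thread relies on). The function name is hypothetical; it only checks that the reported directory exists locally and is writable by the current user.

```python
import os
import tempfile


def usable_solr_home(system_info):
    """Return solr_home if it is a local, writable directory, else None.

    `system_info` is the parsed JSON from <solr_url>/admin/info/system.
    """
    solr_home = system_info.get("solr_home")
    if not solr_home:
        return None
    if os.path.isdir(solr_home) and os.access(solr_home, os.W_OK):
        return solr_home
    return None


if __name__ == "__main__":
    # Hypothetical demo: a temporary directory stands in for Solr.home.
    demo_dir = tempfile.mkdtemp()
    print(usable_solr_home({"solr_home": demo_dir}) == demo_dir)
    print(usable_solr_home({"solr_home": os.path.join(demo_dir, "missing")}))
```

A `None` result would mean the configuration has to be copied by a sysadmin rather than by VIVO itself.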
Andrew Woods said:
Closing this ticket due to:
The solution here is to stay with the pattern of suggesting sysadmins configure VIVO's Solr by copying the configuration found in https://github.com/vivo-project/vivo-solr per the associated instructions.
We may revisit this ticket in the context of supporting Solr-Cloud, which supports a more complete API.
Benjamin Gross (Migrated from VIVO-1752) said:
Solr cores can be created and configured via RESTful API calls. Documentation is here: [https://lucene.apache.org/solr/guide/7_3/coreadmin-api.html#coreadmin-api]. Document the collection of calls that would replicate the vivocore directory currently provided at [https://github.com/vivo-community/vivo-solr/tree/vivo-solr-1.11.0/vivocore].
Advantages of this: