AtlasOfLivingAustralia / biocache-store

Occurrence processing, indexing and batch processing

support docValues in offline indexing #326

Closed adam-collins closed 1 year ago

adam-collins commented 5 years ago

Both biocache-store and biocache-service support building and using a docvalues SOLR index.

sadeghim commented 5 years ago

Testing it.

ansell commented 5 years ago

Adam's notes on the steps required for testing/using docvalues:

Release versions of biocache-store and biocache-service can be used.
A copy of the SOLR Schema with docvalues is available on all bstore nodes, e.g. aws-bstore-1b:/data/solr/biocache/conf_docvalues_new
To override the SOLR Schema used during indexing, uncomment the following line in http://localhost:9193/job/Complete%20Indexing/job/Complete%20Re-index
#cp -r /data/solr/biocache/conf_docvalues_new /data/solr/biocache/conf
Alternatively, add an additional parameter when calling biocache index-local-node in
http://localhost:9193/job/Complete%20Indexing/job/Complete%20Re-index/configure
• New parameter SOLR_CONFIG_XML_PATH defaulting to /data/solr/biocache/conf/solrconfig.xml
• Usage with -sc: biocache index-local-node -sc ${SOLR_CONFIG_XML_PATH}
• To use the docvalues schema: SOLR_CONFIG_XML_PATH=/data/solr/biocache/conf_docvalues_new/solrconfig.xml
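The parameter handling above could be sketched in shell roughly as follows. This only illustrates how a job step might pass the parameter through to `-sc`; the `build_index_command` helper is hypothetical (it is not part of biocache) and prints the command instead of running it.

```shell
#!/bin/sh
# Fall back to the default solrconfig.xml location when the parameter is unset or empty.
SOLR_CONFIG_XML_PATH="${SOLR_CONFIG_XML_PATH:-/data/solr/biocache/conf/solrconfig.xml}"

# Hypothetical helper: print the indexing command for a given solrconfig.xml path.
build_index_command() {
    printf 'biocache index-local-node -sc %s\n' "$1"
}

build_index_command "$SOLR_CONFIG_XML_PATH"
```

Setting SOLR_CONFIG_XML_PATH=/data/solr/biocache/conf_docvalues_new/solrconfig.xml before this step would select the docvalues schema without copying over /data/solr/biocache/conf.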
Make the following changes to /data/biocache/config/biocache-config.properties so that new fields are correctly added to the schema during indexing, at the beginning (new layer fields) and at the end (new misc fields). Please note that new fields will not be added to the schema when SOLR_CONFIG_XML_PATH (/data/solr/biocache/conf/solrconfig.xml) exists.
solr.index.docvalues.layer=true
solr.index.docvalues.misc=true
solr.index.misc=true
solr.index.stored.misc=false
solr.index.stored.layer=false
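Before starting a re-index, it may help to confirm that a node's properties file actually contains these five settings. A minimal sketch, assuming each setting appears on its own line exactly as written above (no surrounding whitespace); the `check_docvalues_config` helper is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: report any expected docvalues setting missing from a properties file.
# Returns 0 when every setting is present exactly as listed, 1 otherwise.
check_docvalues_config() {
    config_file="$1"
    missing=0
    for setting in \
        "solr.index.docvalues.layer=true" \
        "solr.index.docvalues.misc=true" \
        "solr.index.misc=true" \
        "solr.index.stored.misc=false" \
        "solr.index.stored.layer=false"
    do
        # -x matches the whole line, -F treats the setting as a literal string.
        if ! grep -qxF "$setting" "$config_file"; then
            echo "missing: $setting"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_docvalues_config /data/biocache/config/biocache-config.properties
```

The same check could be run on each bstore node before moving a tested index into production.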
To prevent the SOLR collection alias from being changed, use RUN_CHECKS_ONLY=true in http://localhost:9193/job/Complete%20Indexing/job/Update%20colection%20alias/
The above should be sufficient to build a docvalues index with Jenkins.
To use the docvalues index, set solr.collection to the name of the docvalues index collection in biocache-service-test:/data/biocache/config/biocache-config.properties
To move a successfully tested index into production:
• Set the collection alias in SOLR Cloud.
• Use RUN_CHECKS_ONLY=false in http://localhost:9193/job/Complete%20Indexing/job/Update%20colection%20alias/
• Depending on the method used to select the schema, comment out the copy to /data/solr/biocache/conf in http://localhost:9193/job/Complete%20Indexing/job/Complete%20Re-index or use the default SOLR_CONFIG_XML_PATH
• Ensure bstore nodes have the above changes to /data/biocache/config/biocache-config.properties