adsabs / ADSImportPipeline

Data ingest pipeline for ADS classic->ADS+
GNU General Public License v3.0

Solr_adapter needs to be aware of the "GEO" collection from ADSExports #279

Closed by seasidesparrow 9 months ago

seasidesparrow commented 9 months ago

We need to update the Solr adapter to add the {"GEO": "earth science"} key-value pair, which will signal Solr to use the Earth Science collection for these records.

The change needs to be made here: https://github.com/adsabs/ADSImportPipeline/blob/a543bf5e41747e239c64b68447bf44f681e5f480/aip/classic/solr_adapter.py#L272
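
A minimal sketch of the kind of change described above, assuming the adapter translates ADSExports collection codes into Solr "database" values through a plain dictionary. The names `COLLECTION_TO_DATABASE` and `translate_collections` are hypothetical, and the "AST" code is an assumed example; only the "GEO" entry comes from this issue, and none of this is copied from solr_adapter.py.

```python
# Hypothetical mapping from ADSExports collection codes to Solr "database" values.
# Only the "GEO" entry is taken from this issue; the "AST" code is an assumption.
COLLECTION_TO_DATABASE = {
    "AST": "astronomy",
    "GEO": "earth science",
}


def translate_collections(codes):
    """Return the Solr database values for the given codes, skipping unknown codes."""
    return [COLLECTION_TO_DATABASE[c] for c in codes if c in COLLECTION_TO_DATABASE]
```

With a mapping like this, a record tagged "GEO" would be exported with "earth science" in its database field, and codes the adapter does not know about would be dropped rather than raising an error.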

seasidesparrow commented 9 months ago

In MontySolr, queries on the "database" key search for a single string, e.g. "database:astronomy". So the "earth science" value should be either "earth_science" or "earthscience"; otherwise Solr may interpret an unquoted query on this field as "database:earth".

Solr reads this field and tokenizes its contents as a single entity, so even if the value is stored as "earth science", Solr will know how to handle it. On the UI side, the user will need to enclose "earth science" in quotes, but no quoting is needed on the import pipeline side. See https://github.com/adsabs/montysolr/blob/master/deploy/adsabs/server/solr/collection1/conf/schema.xml#L857
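
To illustrate the quoting point on the query side, here is a small sketch of a helper that builds a fielded "database" query and quotes any value containing whitespace. The function name `database_query` is hypothetical and is not part of the UI or MontySolr; it only shows the string that a user would need to type.

```python
def database_query(value):
    """Build a fielded query on "database", quoting values that contain whitespace."""
    if any(ch.isspace() for ch in value):
        # Escape backslashes and double quotes before wrapping the value in quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return 'database:"{}"'.format(escaped)
    return "database:{}".format(value)
```

Single-word values such as "astronomy" pass through unquoted, while "earth science" is wrapped in quotes so that both words stay attached to the database field.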