adsabs / ADSImportPipeline

Data ingest pipeline for ADS classic->ADS+
GNU General Public License v3.0

Update data field in solr document #164

Closed. aaccomazzi closed this issue 6 years ago

aaccomazzi commented 6 years ago

Right now the data field in the solr document contains a list of acronyms corresponding to data archives, e.g.

"data": [ "CXO", "XMM", "MAST" ]

We want to change this to add a count next to each archive, i.e.

"data": [ "CXO:1", "XMM:5", "MAST:15" ]
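
As a rough illustration of the new format, here is a minimal Python sketch. It assumes the per-archive counts are already available as a mapping from acronym to count; the function name `format_data_field` and that input shape are hypothetical and not taken from the pipeline code.

```python
def format_data_field(archive_counts):
    """Build the proposed "ARCHIVE:count" strings from a mapping.

    archive_counts is an assumed input shape, e.g. {"CXO": 1, "XMM": 5};
    the order of the mapping is preserved in the output list.
    """
    return [f"{archive}:{count}" for archive, count in archive_counts.items()]


# Example using the values from this issue.
print(format_data_field({"CXO": 1, "XMM": 5, "MAST": 15}))
# → ['CXO:1', 'XMM:5', 'MAST:15']
```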

This data should be generated by the non-bib pipeline based on tables read from classic. When indexing this field, the values should be stored as they are, but the tokenizer should index only the string preceding the colon, so that a search for data:CXO will match but a search for data:"CXO:1" will not.
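
The intended search behaviour can be mimicked in plain Python. This is only a sketch of the stored-versus-indexed distinction, not Solr configuration; the helper names `indexed_token` and `matches` are hypothetical, and it assumes archive acronyms never contain a colon themselves.

```python
def indexed_token(stored_value):
    """Return the part of a stored value that would be indexed.

    Everything before the first colon is kept, so "CXO:1" indexes as "CXO".
    """
    return stored_value.partition(":")[0]


def matches(query, stored_values):
    """True if the query equals the indexed token of any stored value."""
    return any(indexed_token(value) == query for value in stored_values)


data = ["CXO:1", "XMM:5", "MAST:15"]  # stored values stay unchanged

print(matches("CXO", data))    # the acronym alone is searchable
print(matches("CXO:1", data))  # the full stored string is not
```

With the sample values above, the first call matches and the second does not, while the stored list still carries the counts for display.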