This project contains the fixings to produce data for a VEuPathDB site search solr core, and to load that data.
All data produced and loaded complies with the VEuPathDB Site Search solr schema.
Its main pieces are:
(Please see those folders for detailed documentation.)
All solr-ready data is dumped, and loaded, in batches. Batches ensure the validity and trackability of all data in solr. The batches are carefully structured. Each batch:

- has a name of the form `solr-json-batch_xxxxxxxx_yyyyyyyy_nnnnnnnn`, where:
  - `xxxxxxxx` is the batch type (eg, organism)
  - `yyyyyyyy` is the batch name (eg, pfal3D7)
  - `nnnnnnnn` is a timestamp in seconds since epoch
- contains `zzzzzzzz.json` files (where `zzzzzzzz` is a document type name). These files contain documents with the data (such as Genes) to be loaded. Each document has metadata describing the batch it was loaded in (batch type, name and timestamp).
- contains a `batch.json` file. This file has information describing the batch. Its presence in solr indicates that the batch was successfully loaded. Querying solr for these meta documents shows which batches are present in solr.
- contains a `DONE` file
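
The batch naming convention described above can be illustrated with a small sketch that splits a batch name into its type, name and timestamp. This is not code from this project: the names `BatchId` and `parse_batch_name` are hypothetical, and the pattern assumes that the batch type contains no underscores and that the timestamp is the final all-digit segment.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumed pattern: solr-json-batch_<type>_<name>_<timestamp>.
# Assumption: <type> has no underscores; <timestamp> is the trailing run of digits.
BATCH_NAME_PATTERN = re.compile(
    r"^solr-json-batch_(?P<type>[^_]+)_(?P<name>.+)_(?P<timestamp>\d+)$"
)


@dataclass(frozen=True)
class BatchId:
    """Hypothetical container for the three parts of a batch name."""
    batch_type: str
    batch_name: str
    timestamp: int  # seconds since epoch

    @property
    def created_utc(self) -> datetime:
        # Interpret the timestamp as a UTC datetime.
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)


def parse_batch_name(name: str) -> BatchId:
    """Split a batch name into type, name and timestamp, or raise ValueError."""
    match = BATCH_NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid batch name: {name!r}")
    return BatchId(match["type"], match["name"], int(match["timestamp"]))


if __name__ == "__main__":
    batch = parse_batch_name("solr-json-batch_organism_pfal3D7_1600000000")
    print(batch.batch_type, batch.batch_name, batch.created_utc.isoformat())
```

A parser like this could be used to sanity-check batch names before loading; the actual validation rules are defined by the tools in this project, not by this sketch.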