NCEAS / metadig-engine

MetaDig Engine: multi-dialect metadata assessment engine

Accumulated stats for historical data #144

Open · gothub opened this issue 6 years ago

gothub commented 6 years ago

The MetaDIG Solr engine will be used to generate stats for metadata reports. The current Solr schema doesn't indicate whether a PID has been obsoleted by a more recent PID. This makes it difficult, if not impossible, to track how an individual metadata document, or a group of them, has improved over time.

Do we want to include this obsolescence info in the Solr index?

The current Solr index fields are described in this issue.
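To make the problem concrete, here is a rough sketch of what obsolescence info would enable: if each quality record carried a link to the PID it replaces, the version history of a document could be rebuilt and its score change measured. The `QualityRecord` type, the `obsoletes` field, and both helper functions are hypothetical. The field name is modeled on the `obsoletes`/`obsoletedBy` properties in DataONE system metadata, not on the current Solr schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class QualityRecord:
    pid: str
    obsoletes: Optional[str]  # hypothetical: PID this version replaces, None for the first version
    score: float              # overall quality score for this version


def version_history(records: list[QualityRecord], latest_pid: str) -> list[QualityRecord]:
    """Follow 'obsoletes' links back from the newest PID; return versions oldest-first."""
    by_pid = {r.pid: r for r in records}
    if latest_pid not in by_pid:
        raise KeyError(latest_pid)
    chain: list[QualityRecord] = []
    seen: set[str] = set()
    pid: Optional[str] = latest_pid
    # Stop at the first version, at a PID missing from the index, or on a cycle.
    while pid is not None and pid in by_pid and pid not in seen:
        seen.add(pid)
        rec = by_pid[pid]
        chain.append(rec)
        pid = rec.obsoletes
    chain.reverse()
    return chain


def score_change(records: list[QualityRecord], latest_pid: str) -> float:
    """Difference between the newest and oldest scores in a document's version chain."""
    chain = version_history(records, latest_pid)
    return chain[-1].score - chain[0].score
```

With three versions scored 0.25, 0.5, and 0.75, `version_history(records, "doc.3")` returns the records for `doc.1`, `doc.2`, `doc.3` in that order, and `score_change` reports an improvement of 0.5. Without the `obsoletes` link, the three records are unrelated rows in the index.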

gothub commented 6 years ago

Note that adding obsolescence info to the index would require changing the index sub-processor so that, when a new PID is indexed, it also updates the existing entry for the PID it obsoletes.
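One way the sub-processor could update the old entry without re-indexing it from scratch is a Solr atomic update, which modifies a single field of an existing document by its unique key. The sketch below only builds the JSON request body; the `obsoletedBy` field name and the assumption that the unique key is `id` are hypothetical, not taken from the MetaDIG schema.

```python
import json


def obsolescence_update(old_pid: str, new_pid: str) -> str:
    """Build a Solr atomic-update body marking old_pid as obsoleted by new_pid.

    Hypothetical schema: uniqueKey field 'id' and a single-valued 'obsoletedBy' field.
    """
    doc = {
        "id": old_pid,                    # unique key of the existing document to modify
        "obsoletedBy": {"set": new_pid},  # 'set' replaces this one field's value
    }
    # Solr's JSON update handler accepts a list of documents.
    return json.dumps([doc])
```

The body would be POSTed with `Content-Type: application/json` to the core's `/update` handler. One design caveat: Solr atomic updates rebuild the whole document internally, so they generally require the schema's other fields to be stored or have docValues; otherwise those values are lost on update.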

mbjones commented 6 years ago

As mentioned on a past call, we could also load quality stats into the metrics service that Rushi is building, since it already has many of the metadata fields needed for faceting and aggregation across versions. That might be better than indexing everything all over again. Let's discuss with @rushiraj and @davev.

gothub commented 6 years ago

One consideration for indexing the quality data into its own index is that the indexing component itself (the quality indexing sub-processor) calculates the quality scores from each newly generated quality document. Once the scores are calculated in the metadig-engine indexing component, inserting the document into Solr is very fast.
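A rough illustration of that split: the sub-processor's real work is reducing a quality report to a few summary values, after which the Solr document is just a small flat record. The scoring rule here (fraction of checks with status `SUCCESS`), the status strings, and the field names are all assumptions for illustration; the actual sub-processor's scoring may differ.

```python
from __future__ import annotations

from collections import Counter


def quality_doc(pid: str, check_statuses: list[str]) -> dict:
    """Summarize check outcomes into a flat dict ready to send to Solr.

    Hypothetical scoring: fraction of checks whose status is 'SUCCESS'.
    """
    counts = Counter(check_statuses)
    total = len(check_statuses)
    score = counts["SUCCESS"] / total if total else 0.0
    return {
        "metadataId": pid,                  # hypothetical field names
        "checksPassed": counts["SUCCESS"],
        "checksFailed": counts["FAILURE"],
        "scoreOverall": score,
    }
```

For a report with three passing checks and one failing check, `quality_doc` yields a score of 0.75. Because the output is a single small document per report, the Solr insert step stays cheap regardless of how involved the scoring becomes.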