NCEAS / metadig-rake

MetaDIG rake, a cross-domain QA/QC library
Apache License 2.0

Helper: function to access data pids documented by a metadata pid #11

Closed: jeanetteclark closed this issue 1 year ago

jeanetteclark commented 1 year ago

This issue assumes a particular model for data quality checks, namely that we'll be starting from a metadata pid. If we use this model, we need a way to get the list of data pids for a given metadata pid.

The easiest option is probably a Solr query, e.g.:

https://arcticdata.io/metacat/d1/mn/v2/query/solr/?q=id:doi\:10.18739/A2TH8BP4V&fl=documents

The function will take the system metadata (sysmeta) of the metadata document, use its pid and authoritativeMemberNode to construct the query, and return a list of data pids.
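For illustration, the query construction described above could be sketched roughly as follows. This is a minimal Python sketch, not the implementation in this repository: the names `build_query_url`, `parse_data_pids`, and `fetch_data_pids` are hypothetical. It assumes the authoritativeMemberNode has already been resolved to a member node base URL, and that the Solr endpoint returns the standard Solr JSON layout when `wt=json` is requested. The pid is quoted as a Solr phrase instead of backslash-escaping the colon.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(mn_base_url, metadata_pid):
    """Build a Solr query URL asking for the `documents` field of one metadata pid."""
    # Quote the pid as a phrase so characters such as ':' need no escaping;
    # only backslashes and double quotes are escaped inside the phrase.
    escaped = metadata_pid.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'id:"{escaped}"', "fl": "documents", "wt": "json"}
    return f"{mn_base_url.rstrip('/')}/v2/query/solr/?{urlencode(params)}"


def parse_data_pids(solr_json):
    """Collect the `documents` values from a parsed Solr JSON response."""
    pids = []
    for doc in solr_json.get("response", {}).get("docs", []):
        # A metadata record that documents nothing has no `documents` field.
        pids.extend(doc.get("documents", []))
    return pids


def fetch_data_pids(mn_base_url, metadata_pid, timeout=30):
    """Run the query over HTTP and return the list of documented pids."""
    url = build_query_url(mn_base_url, metadata_pid)
    with urlopen(url, timeout=timeout) as resp:
        return parse_data_pids(json.load(resp))
```

With the example above, `fetch_data_pids("https://arcticdata.io/metacat/d1/mn", "doi:10.18739/A2TH8BP4V")` would issue the same query against the Arctic Data Center member node. Keeping URL construction and response parsing in separate functions lets both be checked without network access.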

mbjones commented 1 year ago

In our "big data" scenarios, where a dataset may contain e.g. a million data files, a Solr or other query is likely the only way we'll have to get the full list. When we implement support for non-ORE package membership, we should have a paging API for getting package members and their metadata (which will also be used by the UI for paging through dataset display listings). So, TBD, but I think you're heading in the right direction here.
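Until such a paging API exists, one way to page through a very large package is Solr's own `start` and `rows` parameters. The sketch below is only an assumption about how that might look, not a description of a planned API: `page_params` and `iter_data_pids` are hypothetical names, and it assumes the data objects can be found through the `isDocumentedBy` index field. The `fetch_page` callable stands in for whatever performs the HTTP request and returns parsed Solr JSON.

```python
def page_params(metadata_pid, start, rows):
    """Solr parameters for one page of data objects documented by a metadata pid."""
    escaped = metadata_pid.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "q": f'isDocumentedBy:"{escaped}"',  # assumed index field name
        "fl": "id",
        "wt": "json",
        "start": start,  # offset of the first result on this page
        "rows": rows,    # maximum number of results per page
    }


def iter_data_pids(fetch_page, page_size=1000):
    """Yield data pids page by page.

    `fetch_page(start, rows)` must return parsed Solr JSON for that page.
    """
    start = 0
    while True:
        response = fetch_page(start, page_size)["response"]
        docs = response["docs"]
        for doc in docs:
            yield doc["id"]
        start += len(docs)
        # Stop on an empty page or once every reported match has been seen.
        if not docs or start >= response["numFound"]:
            break
```

Because the function is a generator, a caller can start checking data objects before the full list of a million pids has been retrieved.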

jeanetteclark commented 1 year ago

A preliminary version is checked in: get_data_pids(identifier, auth_node). It's basically just a wrapper around a dataone::query call.