Closed jeanetteclark closed 1 year ago
In our "big data" scenarios, where a dataset may contain, e.g., a million data files, a Solr or other query is likely the only way we'll have to get the full list. When we implement support for non-ORE package membership, we should have a paging API for getting package members and their metadata (which will also be used by the UI for paging through dataset display listings). So, TBD, but I think you're heading in the right direction here.
A preliminary version is checked in: get_data_pids(identifier, auth_node). It's basically just a wrapper around a dataone::query call.
This issue makes some assumptions about the model that we are going to use to do data quality checks (namely that we'll be starting from a metadata pid). If we are going to use this model we need a way to get a list of data pids for a given metadata pid.
The easiest option is probably a Solr query, e.g.:
https://arcticdata.io/metacat/d1/mn/v2/query/solr/?q=id:doi\:10.18739/A2TH8BP4V&fl=documents
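To show how a URL like the one above could be put together programmatically, here is a minimal Python sketch. The helper names (escape_solr_term, build_documents_query) are hypothetical and not part of any DataONE client, and the wt=json parameter is an assumption about the response format wanted. The pid is backslash-escaped for the Solr query parser first and then percent-encoded for the URL.

```python
from urllib.parse import quote, urlencode

# Characters the standard Solr query parser treats as syntax.
SOLR_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_solr_term(value: str) -> str:
    """Backslash-escape Solr special characters in a single term."""
    return "".join("\\" + ch if ch in SOLR_SPECIAL else ch for ch in value)


def build_documents_query(base_url: str, pid: str) -> str:
    """Build a query URL asking for the `documents` field of one metadata pid.

    `base_url` is the member node base URL, e.g. the arcticdata.io one
    from the example above. `wt=json` is an assumption, not from the issue.
    """
    params = {
        "q": "id:" + escape_solr_term(pid),
        "fl": "documents",
        "wt": "json",
    }
    # quote_via=quote percent-encodes ':' '\' and '/' inside the values.
    return base_url.rstrip("/") + "/v2/query/solr/?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_documents_query(
        "https://arcticdata.io/metacat/d1/mn", "doi:10.18739/A2TH8BP4V"))
```

Escaping the pid in code, rather than by hand as in the example URL, should keep the query valid for identifiers containing other special characters.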
The function will take the sysmeta from the metadata doc, use the pid and authoritativeMemberNode to construct the query, and return a list of pids.
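The flow described here could be sketched roughly as follows in Python. This is an illustration only, not the checked-in get_data_pids, which wraps dataone::query in R. Several things are assumptions: the sysmeta is represented as a plain dict with identifier and authoritativeMemberNode keys, the NODE_BASE_URLS lookup table is hypothetical (a real implementation would resolve the node through the coordinating node), and the response is assumed to be standard Solr JSON (wt=json).

```python
import json
import re
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# ASSUMPTION: hypothetical node-id -> base-URL table for illustration only.
NODE_BASE_URLS = {"urn:node:ARCTIC": "https://arcticdata.io/metacat/d1/mn"}

# Characters the standard Solr query parser treats as syntax.
_SOLR_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def fetch_json(url):
    """Default fetcher: GET the URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def get_data_pids(sysmeta, fetch=fetch_json):
    """Return the pids listed in the `documents` field of a metadata object.

    `sysmeta` is assumed to be a dict holding at least `identifier` and
    `authoritativeMemberNode`. `fetch` is injectable so the function can
    be exercised without network access.
    """
    pid = sysmeta["identifier"]
    base_url = NODE_BASE_URLS[sysmeta["authoritativeMemberNode"]]
    # Backslash-escape the pid for Solr, then percent-encode for the URL.
    params = {
        "q": "id:" + _SOLR_SPECIAL.sub(r"\\\1", pid),
        "fl": "documents",
        "wt": "json",
    }
    url = base_url + "/v2/query/solr/?" + urlencode(params, quote_via=quote)
    docs = fetch(url).get("response", {}).get("docs", [])
    # `documents` is multi-valued; flatten it across any matching docs.
    return [p for doc in docs for p in doc.get("documents", [])]
```

Passing the fetcher in as an argument is a design choice for the sketch: it lets the query construction and response parsing be checked against a canned response before pointing it at a live member node.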