Open jordanpadams opened 1 year ago
@jordanpadams @tloubrieu-jpl for all that I'd much prefer to write a python script and be done with it, isn't this strictly a job for harvest?
Pros: (Python script - should be separate from provenance imho but that's whatever)
Cons:
I suppose the correct solution is to bandaid it with a python script, implement it in harvest as well, then rip off the bandaid once the updated version of harvest is deployed everywhere it needs to be.
Note to self - LIDs are strictly-defined in the PDS Standards Reference as urn:<national_agency>:<archiving_agency>:<bundle>:<?collection>:<?product>, so it's trivial to split and extract bundle/collection by chunk index.
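A minimal sketch of that chunk-index split, for illustration only. The names `split_lid` and `LidParts` are hypothetical and not taken from the provenance script or harvest. It assumes a LIDVID carries its version after a `::` separator, and that the bundle and collection LIDs are simply the first four and five colon-separated chunks:

```python
from typing import NamedTuple, Optional


class LidParts(NamedTuple):
    """Hypothetical container for the LIDs derivable from one product LID."""
    bundle_lid: str
    collection_lid: Optional[str]
    product_lid: Optional[str]


def split_lid(lid: str) -> LidParts:
    """Split urn:<agency>:<archive>:<bundle>[:<collection>[:<product>]] by chunk index."""
    # Assumption: a LIDVID looks like "<lid>::<version>", so drop the version first.
    lid_only = lid.split("::", 1)[0]
    chunks = lid_only.split(":")
    if chunks[0] != "urn" or not 4 <= len(chunks) <= 6:
        raise ValueError(f"Unexpected LID structure: {lid!r}")

    bundle_lid = ":".join(chunks[:4])
    # None marks a level that does not exist for this identifier.
    collection_lid = ":".join(chunks[:5]) if len(chunks) >= 5 else None
    product_lid = ":".join(chunks[:6]) if len(chunks) == 6 else None
    return LidParts(bundle_lid, collection_lid, product_lid)
```

Called on a product-level LID this returns all three levels. Called on a bundle LID it returns `None` for the collection and product levels.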
85ca61e7c7d3464f3b8ae12968ca5a7ff23fac02 implements addition of membership metadata to products whose documents lack such (to prevent having to update every product on every script run)
Metadata is currently written to the document in this format. I'm assuming the nesting isn't a problem but I can tweak it to a flat structure if need be.
All products will have that full membership metadata structure, with null indicating lack of membership (collections have no collection membership, bundles have neither membership).
Ensuring that this structure is included in the index is an outstanding question. @jordanpadams @jimmie @al-niessner @tloubrieu-jpl would it be appropriate for the script to ensure presence of these fields in the index? I wouldn't think reindexing is necessary in that case as on first run, the index would be added and then the relevant metadata would be written for all products (triggering indexing on each product).
@alexdunnjpl just as an FYI, even though the standard says this:
LIDs are strictly-defined in the PDS Standards Reference
That is not actually always the case. There is an alternate_ids field that was added to the registry a while back to support backwards compatibility, because there are cases where a new version of a product contained a different LID.
@alexdunnjpl per:
Metadata is currently written to the document in this format. I'm assuming the nesting isn't a problem but I can tweak it to a flat structure if need be.
How would a user then query for that information based upon its nesting? For other metadata we have added to the registry, we have been flattening it for the time being, e.g. ops:Harvest_Info/ops:archive_status. We may want to stick to that paradigm for the time being?
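As a rough illustration of that flattening paradigm, nested keys can be joined with `/` so each value becomes a single top-level field. The helper name `flatten` and the sample value `"archived"` are made up for this sketch, and it is not how harvest or the provenance script necessarily does it:

```python
def flatten(doc: dict, parent_key: str = "", sep: str = "/") -> dict:
    """Collapse nested dicts into one level, joining key paths with `sep`."""
    flat = {}
    for key, value in doc.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse so arbitrarily deep nesting ends up as a single joined key.
            flat.update(flatten(value, full_key, sep))
        else:
            # Leaf values (including None) are kept as-is.
            flat[full_key] = value
    return flat


# Hypothetical nested document, using the field name mentioned above.
nested = {"ops:Harvest_Info": {"ops:archive_status": "archived"}}
flattened = flatten(nested)
```

Here `flattened` has the single key `ops:Harvest_Info/ops:archive_status`, which can be queried directly as a field name.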
@alexdunnjpl per:
for all that I'd much prefer to write a python script and be done with it, isn't this strictly a job for harvest?
similar to the provenance script, we could do this in harvest, but there are a few reasons why we want this in a separate script (e.g. within the provenance script):
Checked for duplicates
Yes - I've already checked
User Persona(s)
Data User
Motivation
...so that I can search for the products of a bundle/collection, and then provide additional filters to the search within the same query.
Additional Details
Follow-on to https://github.com/NASA-PDS/registry-api/issues/197: we do not want to support q= from a /members or /member-of endpoint, so we need some other way to provide this query functionality.
Acceptance Criteria
Given When I perform Then I expect
Engineering Details
Initial design idea is to update the provenance script to add the collection_lidvid and bundle_lidvid to each product.
In order to support the example from #197, the API query would instead be something like: