NASA-PDS / deep-archive

PDS Open Archival Information System (OAIS) utilities, including Submission Information Package (SIP) and Archive Information Package (AIP) generators
https://nasa-pds.github.io/deep-archive/
Other

Update software to only include latest collection when bundle references LIDs #24

Closed jordanpadams closed 4 years ago

jordanpadams commented 4 years ago

Is your feature request related to a problem? Please describe. Currently, when a bundle references collections by LID only, the software includes every collection product that matches the LID. It should include only the latest version.
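For illustration only, here is a minimal sketch of what "only grab the latest version" could mean in code. It assumes collection LIDVIDs have the form `LID::major.minor` and that versions compare numerically (so 1.10 is newer than 1.9). The function names `parse_lidvid` and `latest_lidvids` are hypothetical and are not taken from deep-archive.

```python
def parse_lidvid(lidvid):
    """Split 'urn:...::M.n' into (lid, (major, minor)) with integer parts."""
    lid, _, vid = lidvid.partition("::")
    major, minor = vid.split(".")
    return lid, (int(major), int(minor))


def latest_lidvids(lidvids):
    """Return one LIDVID per LID: the one with the highest version."""
    latest = {}  # lid -> (version tuple, original lidvid string)
    for lidvid in lidvids:
        lid, version = parse_lidvid(lidvid)
        if lid not in latest or version > latest[lid][0]:
            latest[lid] = (version, lidvid)
    return sorted(entry[1] for entry in latest.values())


if __name__ == "__main__":
    candidates = [
        "urn:nasa:pds:insight_cameras:data::1.0",
        "urn:nasa:pds:insight_cameras:data::2.0",
    ]
    print(latest_lidvids(candidates))
```

Comparing versions as integer tuples rather than as strings or floats avoids treating 1.10 as older than 1.9.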

NOTE 💥: There should be a flag to override this default so we can use this software on previous releases of PDS4 data. Something like:

--include-all-collections     For bundles that reference collections by LID, this flag
                              will include ALL versions of collections in the bundle. By default,
                              the software only includes the latest version of the collection.
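A rough sketch of how such a flag might be wired up with Python's standard `argparse` module. Only the option name `--include-all-collections` comes from the proposal above; the `build_parser` function, the positional `bundle` argument, and the description text are assumptions made for the example.

```python
import argparse


def build_parser():
    """Build a parser with the proposed opt-in flag (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a package generator command line"
    )
    # Assumed positional argument: path to the bundle label to process.
    parser.add_argument("bundle", help="path to the bundle XML label")
    parser.add_argument(
        "--include-all-collections",
        action="store_true",  # defaults to False: latest version only
        help=(
            "For bundles that reference collections by LID, include ALL "
            "versions of collections in the bundle. By default, only the "
            "latest version of each collection is included."
        ),
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.bundle, args.include_all_collections)
```

With `action="store_true"`, omitting the flag gives the new default behavior (latest version only), and passing it restores the old behavior for earlier PDS4 releases.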

Applicable requirements Primary - :unicorn: #50 (see Assumption 3)

nutjob4life commented 4 years ago

Here's where my lack of familiarity with PDS concepts is showing.

Could I see examples of bundle.xml files that have collection products with LIDs with multiple versions?

jordanpadams commented 4 years ago

@nutjob4life I updated the test data on pds-dev-el7 to now include 2 data collections.

nutjob4life commented 4 years ago

Thanks @jordanpadams! I can log in successfully to pds-dev-el7; could you point me to a specific file path that exhibits LID-only references to multiple versions of collections? (Yes, I need my hand held.)

$ find /data -name harvest-2.0.0 -prune -o \( -iname '*bundle*.xml' -print \) 2>/dev/null
/data/home/pds4/insight_cameras/bundle.xml
/data/home/pds4/validate_regression_data/issue_42/V1900/dph_example_archive/bundle_izenberg_pdart14_meap.xml
/data/home/pds4/validate_regression_data/dph_example_archive/bundle_izenberg_pdart14_meap.xml
/data/home/pds4/testdata/dph_example_archive_VG2PLS/bundle_checksums.xml
/data/home/pds4/testdata/dph_example_archive_VG2PLS/bundle.xml
/data/home/pds4/testdata/urn-nasa-pds-kaguya_grs_spectra/bundle_kaguya_derived.xml

I should really read https://pds.nasa.gov/datastandards/documents/current-version.shtml some day, right?

jordanpadams commented 4 years ago

/data/home/pds4/insight_cameras/bundle.xml

I modified the data you were using before to now use LID references and the data collection has multiple versions.

  <Bundle_Member_Entry>
    <lid_reference>urn:nasa:pds:insight_cameras:data</lid_reference> <<<<<<<------
    <member_status>Primary</member_status>
    <reference_type>bundle_has_data_collection</reference_type>
  </Bundle_Member_Entry>

Note: The software should emit a warning, because several collections referenced in the bundle.xml do not exist.
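One way the warning could look, as a sketch rather than the actual implementation: compare the LIDs referenced by the bundle against the collections actually found, and log each one that is missing. The function name `missing_collections`, the logger name, and the message wording are all hypothetical.

```python
import logging

logger = logging.getLogger("bundle-check")  # hypothetical logger name


def missing_collections(referenced_lids, available_lidvids):
    """Warn about, and return, referenced LIDs with no matching collection."""
    # Reduce each available LIDVID ('lid::version') to its bare LID.
    available = {lidvid.partition("::")[0] for lidvid in available_lidvids}
    missing = [lid for lid in referenced_lids if lid not in available]
    for lid in missing:
        logger.warning(
            "Bundle references collection %s, but no such collection was found",
            lid,
        )
    return missing


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    missing_collections(
        ["urn:nasa:pds:insight_cameras:data", "urn:nasa:pds:insight_cameras:browse"],
        ["urn:nasa:pds:insight_cameras:data::1.0"],
    )
```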

jordanpadams commented 4 years ago

@nutjob4life ☝️

nutjob4life commented 4 years ago

Thanks @jordanpadams. Adding a note to @me:

A LID-only reference looks like:

<Bundle_Member_Entry>
  <lid_reference>urn:nasa:pds:whatever</lid_reference>
  …
</Bundle_Member_Entry>

while a full LIDVID reference goes:

<Bundle_Member_Entry>
  <lidvid_reference>urn:nasa:pds:whatever::1.0</lidvid_reference>
  …
</Bundle_Member_Entry>
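
As a sketch of how code might tell the two reference styles apart, the following uses only Python's standard `xml.etree.ElementTree`. It matches elements by local name so that it works whether or not the label declares an XML namespace. The function names `local_name` and `member_references` are hypothetical, not deep-archive's.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def member_references(bundle_xml):
    """Return ('lid' | 'lidvid', value) for each Bundle_Member_Entry reference."""
    root = ET.fromstring(bundle_xml)
    refs = []
    for elem in root.iter():
        if local_name(elem.tag) != "Bundle_Member_Entry":
            continue
        for child in elem:
            name = local_name(child.tag)
            value = (child.text or "").strip()
            if name == "lid_reference":
                refs.append(("lid", value))  # version must be resolved later
            elif name == "lidvid_reference":
                refs.append(("lidvid", value))  # version is already pinned
    return refs


if __name__ == "__main__":
    sample = """
    <Product_Bundle>
      <Bundle_Member_Entry>
        <lid_reference>urn:nasa:pds:whatever</lid_reference>
      </Bundle_Member_Entry>
      <Bundle_Member_Entry>
        <lidvid_reference>urn:nasa:pds:whatever::1.0</lidvid_reference>
      </Bundle_Member_Entry>
    </Product_Bundle>
    """
    print(member_references(sample))
```

Only the `lid` entries need the latest-version lookup; `lidvid` entries already name an exact version.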