NASA-PDS / deep-archive

PDS Open Archival Information System (OAIS) utilities, including Submission Information Package (SIP) and Archive Information Package (AIP) generators
https://nasa-pds.github.io/deep-archive/
Other

Improved handling of Registry file refs that don't share a common prefix with the bundle URL #184

Closed · nutjob4life closed this 1 month ago

nutjob4life commented 1 month ago

🗒️ Summary

Merge this if you dare to fix #178. This alters the way paths are written into checksum manifests and transfer manifests. Previously, we'd just find the index of the rightmost / in the bundle URL and assume that we could then strip that many characters off the beginning of all file refs associated with that bundle.

That assumption didn't hold for urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.0, though. Its file refs lived under entirely different parent paths: 96 of them shared the same parent path as bundle.xml, but 86 were under different parents.
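Taken literally, the old approach amounts to something like the sketch below. The `old_relative_paths` name and the example.org URLs are hypothetical and are not the project's code. Any ref that isn't under the bundle's own directory has the wrong number of characters chopped off:

```python
# Hypothetical sketch of the *previous* behaviour: cut every file ref at the
# offset of the rightmost "/" in the bundle URL. URLs are made up for illustration.

def old_relative_paths(bundle_url: str, file_refs: list[str]) -> list[str]:
    """Strip a fixed number of leading characters, derived from the bundle URL."""
    cut = bundle_url.rindex("/") + 1  # characters up to and including the last "/"
    return [ref[cut:] for ref in file_refs]


bundle = "https://example.org/archive/cassini_uvis_solarocc_beckerjarmak2023_v1.0/bundle.xml"
refs = [
    bundle,
    "https://example.org/archive/cassini_uvis_solarocc_beckerjarmak2023_v1.0/readme.txt",
    "https://example.org/archive/cassini_uvis_solarocc_beckerjarmak2023/data/collection_data.csv",
]
for path in old_relative_paths(bundle, refs):
    print(path)
```

With these made-up URLs, the third path comes out as collection_data.csv and silently loses its cassini_uvis_solarocc_beckerjarmak2023/data/ parent. With other name lengths, the cut could just as easily land mid-name.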

Now, instead of relying on the index of the rightmost /, we scan all the file refs, find the longest common prefix among them, and strip it off before writing the manifests.
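Here's a rough sketch of that idea. The checksum manifest in the test report keeps cassini_uvis_solarocc_beckerjarmak2023_v1.0/ and cassini_uvis_solarocc_beckerjarmak2023/ as separate top-level directories. That suggests the prefix is compared on whole /-separated components, because a plain character-by-character prefix would stop partway through those names. The function names and example.org URLs are hypothetical; this isn't the code in the PR.

```python
# Hypothetical sketch: longest common prefix over whole "/"-separated components.
# URLs are made up for illustration.

def common_component_prefix(file_refs: list[str]) -> list[str]:
    """Return the leading path components shared by every file ref."""
    split_refs = [ref.split("/") for ref in file_refs]
    prefix: list[str] = []
    # Compare directory components only (drop each ref's file name);
    # zip stops at the shortest ref.
    for parts in zip(*(components[:-1] for components in split_refs)):
        if all(part == parts[0] for part in parts):
            prefix.append(parts[0])
        else:
            break
    return prefix


def relative_paths(file_refs: list[str]) -> list[str]:
    """Strip the shared leading components from every file ref."""
    n = len(common_component_prefix(file_refs))
    return ["/".join(ref.split("/")[n:]) for ref in file_refs]


refs = [
    "https://example.org/archive/cassini_uvis_solarocc_beckerjarmak2023_v1.0/bundle.xml",
    "https://example.org/archive/cassini_uvis_solarocc_beckerjarmak2023_v1.0/readme.txt",
    "https://example.org/archive/cassini_uvis_solarocc_beckerjarmak2023/data/collection_data.csv",
]
for path in relative_paths(refs):
    print(path)
```

If every ref sat under one directory, this sketch would strip that directory too. The PR doesn't say whether the real code does the same.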

⚙️ Test Data and/or Report

Well, the built-in hooks confirmed everything's up to snuff, but to be extra explicit:

$ pds-deep-registry-archive --quiet --site PDS_SBN urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.0
$ head -5 cassini_uvis_solarocc_beckerjarmak2023_v1.0_*_checksum_manifest_v1.0.tab 
92e11c0d9b513fe1d0794feebd47a9cc    cassini_uvis_solarocc_beckerjarmak2023_v1.0/bundle.xml
f04bc73d543335c215c5fd0e30c2bee7    cassini_uvis_solarocc_beckerjarmak2023_v1.0/readme.txt
ed1ee9a42ed9c9d8e6fd41f2f1550eb5    cassini_uvis_solarocc_beckerjarmak2023/data/collection_data.csv
ea7f61d84acb132a591177b7c2e284b6    cassini_uvis_solarocc_beckerjarmak2023/data/collection_data.xml
5c0fa44ac664f34d840b1d67a7777a0a    cassini_uvis_solarocc_beckerjarmak2023/data/uvis_euv_2006_257_solar_time_series_ingress.tab
$ head -5 cassini_uvis_solarocc_beckerjarmak2023_v1.0_*_transfer_manifest_v1.0.tab 
urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.0                                                                                                                                                                                                       /cassini_uvis_solarocc_beckerjarmak2023_v1.0/bundle.xml
urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.0                                                                                                                                                                                                       /cassini_uvis_solarocc_beckerjarmak2023_v1.0/readme.txt
urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023:data::2.0                                                                                                                                                                                                  /cassini_uvis_solarocc_beckerjarmak2023/data/collection_data.csv
urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023:data::2.0                                                                                                                                                                                                  /cassini_uvis_solarocc_beckerjarmak2023/data/collection_data.xml
urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023:data:uvis_euv_2006_257_solar_time_series_ingress::1.1                                                                                                                                                      /cassini_uvis_solarocc_beckerjarmak2023/data/uvis_euv_2006_257_solar_time_series_ingress.tab
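Once the common prefix is gone, the two manifests differ only in row layout. The checksum manifest puts the checksum, four spaces, and the bare path on each row. The transfer manifest puts a left-justified LIDVID column followed by the path with a leading /. The sketch below is hedged on two points. The 32-hex-digit checksums look like MD5, so that is what it assumes. LIDVID_WIDTH = 255 is a guess, because the report doesn't state the column width. The function names are hypothetical.

```python
import hashlib

# Assumed fixed width of the LIDVID column in the transfer manifest (a guess).
LIDVID_WIDTH = 255


def checksum_row(data: bytes, relative_path: str) -> str:
    """MD5 hex digest (assumed algorithm), four spaces, then the relative path."""
    return f"{hashlib.md5(data).hexdigest()}    {relative_path}"


def transfer_row(lidvid: str, relative_path: str, width: int = LIDVID_WIDTH) -> str:
    """LIDVID left-justified to `width` characters, then the path with a leading '/'."""
    return f"{lidvid:<{width}}/{relative_path}"


print(transfer_row(
    "urn:nasa:pds:cassini_uvis_solarocc_beckerjarmak2023::1.0",
    "cassini_uvis_solarocc_beckerjarmak2023_v1.0/bundle.xml",
))
```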

♻️ Related Issues

jordanpadams commented 1 month ago

The AIP is now failing validation, but I'll merge this initial change.