NASA-PDS / deep-archive

PDS Open Archival Information System (OAIS) utilities, including Submission Information Package (SIP) and Archive Information Package (AIP) generators
https://nasa-pds.github.io/deep-archive/

No warning with inventoried files missing #100

Closed tbarnes4 closed 3 years ago

tbarnes4 commented 3 years ago

Is your feature request related to a problem? Please describe.

When the deep archive tool wraps up a bundle, if a collection primary member is missing, it proceeds onward without error or warning or notice. This can be a problem when you are trying to submit to the deep archive a complete collection, but a single file is missing for whatever reason (or in the case of #69 a file has the wrong LID). This can be a serious problem when down the road someone tries to pull the bundle from the deep archive and finds there is a missing product that was never archived.

Describe the solution you'd like

Check to see if every product (perhaps only check for primary products?) in the inventory.csv file is included in the AIP/SIP files. If not, report to the user that product X is missing. I would recommend not stopping generation of the AIP/SIP files though.
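A minimal sketch of the check described above, in Python. It assumes the collection inventory is a two-column CSV (member status `P` or `S`, then a LID or LIDVID) and that the caller has already gathered the set of LIDVIDs that ended up in the AIP/SIP manifests. `find_missing_members` and its parameters are hypothetical names for illustration, not part of pds-deep-archive.

```python
import csv
import io


def find_missing_members(inventory_csv, archived_lidvids, primary_only=True):
    """Return inventory references that are absent from the archived set.

    inventory_csv    -- text of an inventory file (assumed: status,reference per row)
    archived_lidvids -- iterable of LIDVIDs found in the generated manifests
    primary_only     -- if True, ignore secondary ("S") members
    """
    archived = set(archived_lidvids)
    # Bare LIDs (no "::version") are matched against the LID part only.
    archived_lids = {lidvid.split("::")[0] for lidvid in archived}

    missing = []
    for row in csv.reader(io.StringIO(inventory_csv)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        status, ref = row[0].strip().upper(), row[1].strip()
        if primary_only and status != "P":
            continue
        found = ref in archived if "::" in ref else ref in archived_lids
        if not found:
            missing.append(ref)
    return missing
```

The function only reports; it does not stop anything, which matches the recommendation to keep generating the AIP/SIP files and merely tell the user that product X is missing.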

Describe alternatives you've considered

Add a flag that will scan the bundle and report back any missing products. Either this flag (1) will not generate the AIP/SIP files when invoked, (2) will not generate the AIP/SIP files only if there are missing products, or (3) will generate the AIP/SIP files and print to the screen or report file a notice that XYZ file(s) were missing.
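The three behaviours for such a flag could be sketched as a small policy switch. This is only an illustration in Python: the `MissingPolicy` enum, its value names, and `plan_run` are hypothetical and do not correspond to any existing pds-deep-archive option.

```python
from enum import Enum


class MissingPolicy(Enum):
    REPORT_ONLY = "report-only"          # (1) scan and report, never generate AIP/SIP
    FAIL_ON_MISSING = "fail-on-missing"  # (2) generate only when nothing is missing
    WARN = "warn"                        # (3) always generate, but report what is missing


def plan_run(policy, missing):
    """Decide whether to generate packages and which notices to emit.

    policy  -- a MissingPolicy member
    missing -- list of product references that were inventoried but not found
    Returns (generate_packages, messages).
    """
    messages = [f"WARNING: inventoried product {ref} is missing" for ref in missing]
    if policy is MissingPolicy.REPORT_ONLY:
        generate = False
    elif policy is MissingPolicy.FAIL_ON_MISSING:
        generate = not missing
    else:
        generate = True
    return generate, messages
```

Keeping the decision separate from the scan means the same list of missing products can drive any of the three options, and the messages can go to the screen or to a report file.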

Additional context

I discovered this problem when I ran the deep archive tool on the EPOXI bundle mentioned in #69. I did a sanity check and noticed the hartley2_photometry/document/epoxi_photometry_v5.[xml|pdf] files were missing from the manifest tables. It happens that the product is mentioned in the inventory.csv file, but the LID in the xml file is wrong and needs to be corrected.

I will also note, the validate tool (1.23.1 2020-05-16) currently does not check for inclusion of all primary members. I am in the process of checking if the current tool version does this or not.

tbarnes4 commented 3 years ago

I said:

I will also note, the validate tool (1.23.1 2020-05-16) currently does not check for inclusion of all primary members. I am in the process of checking if the current tool version does this or not.

I can now confirm that the validate tool version 1.24.0 (2020-09-08) likewise does not check for inclusion of all collections (found in the bundle.xml file), nor primary members of the data collections (as found in the inventory.csv file).

jordanpadams commented 3 years ago

@tbarnes4 how are you running validate tool? have you run it with the -R pds4.bundle rule? it should catch these things?

jordanpadams commented 3 years ago

@tbarnes4 please see https://github.com/NASA-PDS/validate/issues/287#issuecomment-767868233 . pds-deep-archive assumes the bundle is valid. in order to avoid duplication of effort, we would prefer validate be executed to find these issues.

tbarnes4 commented 3 years ago

I used no options and relied on the auto detection. I assume it picked it up as a PDS4 bundle because I got the expected validation results in other ways.

On Tue, Jan 26, 2021, 17:31 Jordan Padams notifications@github.com wrote:

@tbarnes4 how are you running validate tool? have you run it with the -R pds4.bundle rule? it should catch these things?


tbarnes4 commented 3 years ago

I agree it is sufficient to skip this suggestion for the pds-deep-archive tool and rely on the validate tool.

jordanpadams commented 3 years ago

Thanks @tbarnes4