NASA-PDS / validate

Validates PDS4 product labels, data and PDS3 Volumes
https://nasa-pds.github.io/validate/
Apache License 2.0

unlabeled files not flagged in report for PDS4 #287

Closed tbarnes4 closed 3 years ago

tbarnes4 commented 3 years ago

Describe the bug The validate tool does not check for unlabeled files for PDS4. For PDS3, it still checks and gives a warning, as expected. This is odd, given that the help text for the --allow-unlabeled-files flag states: "Tells the tool to not check for unlabeled files in a bundle or collection." Is that flag always on for PDS4?

To Reproduce Steps to reproduce the behavior:

  1. Go to any PDS4 bundle.
  2. Add an erroneous non-label file.
  3. Run the validate tool on the bundle with no options.
  4. Read the report and you will find the erroneous file is not mentioned.

Expected behavior Ensure a warning or error is reported when a file is found that does not have a label.
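For readers unfamiliar with the term, the idea of an "unlabeled file" check can be sketched in a few lines of Python. This is only an illustration and not how validate implements it: it assumes that labels are well-formed `.xml` files, that each label points at its data files through `file_name` elements, and that those paths are relative to the label's own directory. The function name `find_unlabeled_files` is hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def find_unlabeled_files(bundle_dir):
    """Return sorted relative paths of files that no label under bundle_dir references.

    Simplified illustration only; the real validate tool applies many more rules.
    """
    all_files, labels, referenced = set(), set(), set()
    for root, _dirs, files in os.walk(bundle_dir):
        for name in files:
            path = os.path.normpath(os.path.join(root, name))
            all_files.add(path)
            if not name.lower().endswith(".xml"):
                continue
            try:
                tree = ET.parse(path)
            except ET.ParseError:
                continue  # not well-formed XML, so it is not counted as a label
            labels.add(path)
            for elem in tree.iter():
                # Compare the local tag name so the check works with any namespace.
                if elem.tag.rsplit("}", 1)[-1] == "file_name" and elem.text:
                    target = os.path.join(root, elem.text.strip())
                    referenced.add(os.path.normpath(target))
    unlabeled = all_files - labels - referenced
    return sorted(os.path.relpath(p, bundle_dir) for p in unlabeled)
```

Calling `find_unlabeled_files("/path/to/bundle")` on a bundle that contains a stray file would list that file, which is the kind of finding the report is expected to include.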

Version of Software Used gov.nasa.pds:validate Version 1.24.0 Release Date: 2020-09-08 22:33:13

jordanpadams commented 3 years ago

@tbarnes4 please try running with the -R pds4.bundle rule. This will perform all of the referential integrity checks you are describing. Without that rule, validate simply validates each product individually and does not maintain the information needed to perform referential validation.

See the Validate Quick Start Guide for more information.

The -R flag indicates to the tool to apply bundle validation rules to the target bundle. This means that validation at the bundle level will be performed, which includes referential integrity checking among other things. Please see the Validation Rules section for more details. The -M flag performs additional checksum validation.
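As a rough sketch of the suggestion above, the command line can be assembled from Python as shown below. The `-R pds4.bundle` rule comes from this thread; the `-t` target flag, the executable name `validate` being on the PATH, and the helper name `build_validate_command` are assumptions here, so check them against `validate --help` for your installed version.

```python
import shlex


def build_validate_command(bundle_dir, executable="validate"):
    """Build an argument list that runs validate with bundle-level rules.

    Assumption: the target directory is passed with -t; confirm with --help.
    """
    return [executable, "-R", "pds4.bundle", "-t", bundle_dir]


cmd = build_validate_command("/path/to/bundle")
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` on a machine where validate is installed.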

tbarnes4 commented 3 years ago

@jordanpadams Thanks. I'm getting mixed results with it. In the Bundle Level part of the report, it says that most of my collection product files don't have labels even though they do, and for the bundle.xml itself it claims it can't find 5 of the 6 collections. It might be because the offending collection.xml files and the other collection products appear to have no schematrons specified. This may be bad data on our end. When things look clean, it does catch my inserted 'uglyfile', but if a product label does not have schematrons specified, the tool also flags that product at the Bundle level as missing a label. Is that expected?

I think we can close this issue. If I truly find an issue on this I'll raise it again. Sorry for the trouble and thanks again.

jordanpadams commented 3 years ago

@tbarnes4 can you point me in the direction of the data you are trying to validate?

tbarnes4 commented 3 years ago

@jordanpadams

The source data can be found here: https://pdssbn.astro.umd.edu/holdings/pds4-compil-comet-v1.0/

Landing page for the bundle: https://pdssbn.astro.umd.edu/holdings/pds4-compil-comet-v1.0/SUPPORT/dataset.shtml

jordanpadams commented 3 years ago

@tbarnes4

I just performed a few tests, and it looks like the latest development version of validate improves the schematron error messages you are seeing (there is a typo in the schematypens= attribute of the schematron definition):

  PASS: file:/data/local/starbase/data/pds4/test-bundles/pds4-compil-comet-v1.0/lightcurves/collection.xml
      WARNING  [warning.label.bad_schematypens]   Could not find expected pattern [schematypens=] in label file:/data/local/starbase/data/pds4/test-bundles/pds4-compil-comet-v1.0/lightcurves/collection.xml
      WARNING  [warning.label.missing_schematron_spec]   No schematrons specified in the label
        14 product validation(s) completed

However, the follow-on warning about the product not being found should not happen. I created a ticket to fix this: #291
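To spot this kind of label problem before running validate, a small pre-check along the following lines may help. It is a minimal sketch, not validate's own logic: it assumes schematrons are declared through `xml-model` processing instructions whose `schematypens` value is the ISO Schematron namespace, and the function name `check_xml_models` and the message strings are hypothetical.

```python
import re

SCHEMATRON_NS = "http://purl.oclc.org/dsdl/schematron"
XML_MODEL_PI = re.compile(r"<\?xml-model\s+(.*?)\?>", re.DOTALL)
PSEUDO_ATTR = re.compile(r"""([\w:.-]+)\s*=\s*(["'])(.*?)\2""", re.DOTALL)


def check_xml_models(label_text):
    """Return (schematron_hrefs, problems) for the xml-model declarations in a label."""
    hrefs, problems = [], []
    for match in XML_MODEL_PI.finditer(label_text):
        # Collect the pseudo-attributes (href, schematypens, ...) of this declaration.
        attrs = {name: value for name, _quote, value in PSEUDO_ATTR.findall(match.group(1))}
        if "schematypens" not in attrs:
            # A misspelled attribute name ends up here.
            problems.append("xml-model without schematypens: " + attrs.get("href", "?"))
        elif attrs["schematypens"] == SCHEMATRON_NS and "href" in attrs:
            hrefs.append(attrs["href"])
    if not hrefs:
        problems.append("no schematrons specified")
    return hrefs, problems
```

Running `check_xml_models(open("collection.xml").read())` on a label with a misspelled `schematypens` attribute would report both problems, mirroring the two warnings in the output above.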