NASA-PDS / validate

Validates PDS4 product labels, data and PDS3 Volumes
https://nasa-pds.github.io/validate/
Apache License 2.0
16 stars 11 forks source link

ERROR [error.array.value_out_of_min_max_range] evaluation is not correct #529

Closed katecrombie closed 1 year ago

katecrombie commented 2 years ago

When validating a pds4.folder of image type arrays, several errors were generated during the evaluation of the min/max array values. Some were correct, others were not.

ERROR [error.array.value_out_of_min_max_range] array 1, location (349, 345): Value is greater than the maximum value in the label (max=0.0014194, got=0.0014194006).

This is technically correct, and we can fix it by increasing the significant digits in the data product.

ERROR [error.array.value_out_of_min_max_range] array 1, location (386, 185): Value is greater than the maximum value in the label (max=0.0702686, got=0.07026859).

This is not correct.

My assumption is that validate is looking at the number of significant digits, and is giving incorrect results when the number of digits is different.

I can attach test case data products if needed.

Engineering Details

Tightly coupled with #514 and #434

see #574 for additional info

jordanpadams commented 2 years ago

@katecrombie example data products would be great.

can you also provide the version of Validate you are running?

validate -V
jordanpadams commented 2 years ago

@katecrombie also, if you could create an issue using the Bug template in the future, that would be great so it gets assigned to me right away! otherwise this can slip through the cracks :-)

katecrombie commented 2 years ago

Using Validate 2.2.0

katecrombie commented 2 years ago

roger that on the bug template (oops)

jordanpadams commented 2 years ago

@katecrombie do you have an example product of where this is failing?

katecrombie commented 2 years ago

erospdart_lblexample2.zip

report.txt is the validation report for these files. A few pass. a few fail.

jordanpadams commented 2 years ago

@katecrombie do you have the XML labels for these products? the ZIP you sent only include the LBLs.

katecrombie commented 2 years ago

erospdart_lblexample_20220429.zip

This zip should have the .lbl, .fit and .xml files. The report.txt file is a validation report that shows the failures.

jordanpadams commented 2 years ago

@katecrombie those 2 zips do not appear to contain any XML labels? Just the lbl and .fit files.

katecrombie commented 2 years ago

D'oh.... Third time is the charm. erospdart_lblexample_20220429_2.zip

al-niessner commented 1 year ago

@jordanpadams @nutjob4life @tloubrieu-jpl

Is this an error that should be looked at or a missing file?

  FAIL: file:/tmp/test/m0157417198f3_2p_iof_gbl.xml
      ERROR  [error.label.context_ref_not_found]   line 84: 'Context product not found: urn:nasa:pds:context:instrument:near.msi
        8 product validation(s) completed
jordanpadams commented 1 year ago

@katecrombie FYI, we are fairly certain max=0.0014194, got=0.0014194006 is a spurious error. 0.0014194006 is just an issue with the significant digits and how doubles are stored in memory.

rchenatjpl commented 1 year ago

@jordanpadams Here's an extreme example I created: ERROR [error.array.value_out_of_min_max_range] array 1, location (400, 246): Value is less than the minimum value in the label (min=5.0E-5, got=0.005). If that's unclear, multiply both numbers by 10000, and validate is saying 5 is greater than 500.

The two attached .FITs files differ by 1 bit. At the cell in question, m5under.fit: 0x3ba3d70a m5over.fit: 0x3ba3d70b which in decimal are 0.00499999988824 and 0.0050000003539. So yes, it's a rounding problem, but note that the value in both labels has exactly 1 significant digit (brackets removed): minimum 0.00005 /minimum

I think the bug is important. If the data values are like 123.45678, then this error won't fire much. Buf if the values are like 0.000012345678, this error will fire often. And some datasets will be in the latter domain. Archive.zip

al-niessner commented 1 year ago

@jordanpadams @rchenatjpl

Not surprised this is a problem when min is represented by scientific notation. The original code, which was tweaked, compares using strings on the assumption they are in the form 0.00005 not 5.0E-5. The most probable reason for this is floating point representation is not perfect so 0.00005 can turn into 0.0000501 or 0.0000499 or something else nearby. String comparisons were then used to prevent this representation error but it makes processing forms other than direct decimal more difficult.

No matter the approach, there will be problems because the continuous number line cannot be accurately represented for any number in a computer. There are a couple ov choices:

  1. convert things to doubles and live with the representation errors that introduced.
  2. convert things to doubles then forever fit the code for corner cases.
  3. convert all numbers to direct decimal formats (not scientific format)
    1. using Double to do the conversion introduces same problems as 1 and/or 2
    2. do string manipulation to move decimal point around padding with 0 as needed

Any of those can be done, but they are fraught with peril and I need to which peril is the desired peril.

rchenatjpl commented 1 year ago

golish.zip

I hope the attached file goes through. It's 700+ MB. It has a bundle that I've pared down to only those data files for which validate incorrectly flagged bad values. Below is validate's output.

` PDS Validate Tool Report

Configuration: Version 3.1.0-SNAPSHOT Date 2023-02-07T06:47:27Z

Parameters: Targets [file:/Users/rchen/Desktop/pds_review/] Rule Type pds4.bundle Severity Level WARNING Recurse Directories true File Filters Used [.xml, .XML] Data Content Validation on Product Level Validation on Allow Unlabeled Files false Max Errors 100000 Registered Contexts File /Users/rchen/PDS4tools/validate/resources/registered_context_products.json

Product Level Validation Results

PASS: file:/Users/rchen/Desktop/pds_review/bundle_nearmsi_gbl.xml 1 product validation(s) completed

PASS: file:/Users/rchen/Desktop/pds_review/data/collection_data.xml 2 product validation(s) completed

FAIL: file:/Users/rchen/Desktop/pds_review/data/m0154651923f6_2p_cif_gbl.xml Begin Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651923f6_2p_cif_gbl.fit ERROR [error.array.value_out_of_min_max_range] array 1, location (400, 246): Value is less than the minimum value in the label (min=0.007827, got=0.007827956). End Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651923f6_2p_cif_gbl.fit 3 product validation(s) completed

FAIL: file:/Users/rchen/Desktop/pds_review/data/m0156835546f4_2p_iof_gbl.xml Begin Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0156835546f4_2p_iof_gbl.fit ERROR [error.array.value_out_of_min_max_range] array 1, location (50, 365): Value is less than the minimum value in the label (min=0.009938, got=0.009938589). End Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0156835546f4_2p_iof_gbl.fit 4 product validation(s) completed

FAIL: file:/Users/rchen/Desktop/pds_review/data/m0154651921f4_2p_cif_gbl.xml Begin Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651921f4_2p_cif_gbl.fit ERROR [error.array.value_out_of_min_max_range] array 1, location (165, 299): Value is less than the minimum value in the label (min=1.18E-4, got=1.18906144E-4). End Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651921f4_2p_cif_gbl.fit 5 product validation(s) completed

FAIL: file:/Users/rchen/Desktop/pds_review/data/m0154651915f1_2p_cif_gbl.xml Begin Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651915f1_2p_cif_gbl.fit ERROR [error.array.value_out_of_min_max_range] array 1, location (322, 51): Value is less than the minimum value in the label (min=0.008009, got=0.00800981). End Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651915f1_2p_cif_gbl.fit 6 product validation(s) completed

FAIL: file:/Users/rchen/Desktop/pds_review/data/m0125249036f2_2p_iof_gbl.xml Begin Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0125249036f2_2p_iof_gbl.fit ERROR [error.array.value_out_of_min_max_range] array 1, location (242, 287): Value is greater than the maximum value in the label (max=0.087356, got=0.08735539). End Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0125249036f2_2p_iof_gbl.fit 7 product validation(s) completed

PDS4 Bundle Level Validation Results

FAIL: file:/Users/rchen/Desktop/pds_review/data/.DS_Store ERROR [error.file.name_has_invalid_characters] File name uses invalid character WARNING [warning.file.not_referenced_in_label] File is not referenced by any label 1 integrity check(s) completed

PASS: file:/Users/rchen/Desktop/pds_review/data/collection_data.xml 2 integrity check(s) completed

FAIL: file:/Users/rchen/Desktop/pds_review/.DS_Store ERROR [error.file.name_has_invalid_characters] File name uses invalid character WARNING [warning.file.not_referenced_in_label] File is not referenced by any label 3 integrity check(s) completed

PASS: file:/Users/rchen/Desktop/pds_review/bundle_nearmsi_gbl.xml 4 integrity check(s) completed

PASS: file:/Users/rchen/Desktop/pds_review/data/m0154651923f6_2p_cif_gbl.xml 5 integrity check(s) completed

PASS: file:/Users/rchen/Desktop/pds_review/data/m0154651921f4_2p_cif_gbl.xml 6 integrity check(s) completed

PASS: file:/Users/rchen/Desktop/pds_review/data/m0156835546f4_2p_iof_gbl.xml 7 integrity check(s) completed

PASS: file:/Users/rchen/Desktop/pds_review/data/m0154651915f1_2p_cif_gbl.xml 8 integrity check(s) completed

PASS: file:/Users/rchen/Desktop/pds_review/data/m0125249036f2_2p_iof_gbl.xml 9 integrity check(s) completed

Summary:

7 error(s) 2 warning(s)

Product Validation Summary: 2 product(s) passed 5 product(s) failed 0 product(s) skipped

Referential Integrity Check Summary: 7 check(s) passed 2 check(s) failed 0 check(s) skipped

Message Types: 5 error.array.value_out_of_min_max_range 2 error.file.name_has_invalid_characters 2 warning.file.not_referenced_in_label

End of Report Completed execution in 6256 ms

`

al-niessner commented 1 year ago

@rchenatjpl

Cannot reproduce your errors. Please check your validate that it has code updates from 529 in them and/or that golish.zip is correct. I may not have understood your checks, but I get no errors:

validate /tmp/golish/bundle_nearmsi_gbl.xml

PDS Validate Tool Report

Configuration:
   Version                       3.2.0-SNAPSHOT
   Date                          2023-02-07T16:29:24Z

Parameters:
   Targets                       [file:/tmp/golish/bundle_nearmsi_gbl.xml]
   Severity Level                WARNING
   Recurse Directories           true
   File Filters Used             [*.xml, *.XML]
   Data Content Validation       on
   Product Level Validation      on
   Max Errors                    100000
   Registered Contexts File      /home/niessner/Projects/PDS/validate/src/main/resources/util/registered_context_products.json

Product Level Validation Results

  PASS: file:/tmp/golish/bundle_nearmsi_gbl.xml
        1 product validation(s) completed
Feb 07, 2023 8:29:27 AM gov.nasa.pds.tools.validate.task.ValidationTask execute
INFO: Validation complete for location 'file:/tmp/golish/bundle_nearmsi_gbl.xml'

Summary:

  0 error(s)
  0 warning(s)

  Product Validation Summary:
    1          product(s) passed
    0          product(s) failed
    0          product(s) skipped

  Referential Integrity Check Summary:
    0          check(s) passed
    0          check(s) failed
    0          check(s) skipped

End of Report
Completed execution in 4279 ms
rchenatjpl commented 1 year ago

ok, 3.1.1 did validate both my last zips correctly.

al-niessner commented 1 year ago

@rchenatjpl

You are okay with closing this ticket again then?

rchenatjpl commented 1 year ago

yes, we can close this. I should have checked the latest version. OTOH, if this wasn't specifically addressed in a build, it might pop up again.

miguelp1986 commented 1 year ago

Testrail: https://cae-testrail.jpl.nasa.gov/testrail/index.php?/cases/view/1273422