Closed katecrombie closed 1 year ago
@katecrombie example data products would be great.
can you also provide the version of Validate you are running?
validate -V
@katecrombie also, if you could create an issue using the Bug template in the future, that would be great so it gets assigned to me right away! otherwise this can slip through the cracks :-)
Using Validate 2.2.0
roger that on the bug template (oops)
@katecrombie do you have an example product of where this is failing?
report.txt is the validation report for these files. A few pass. a few fail.
@katecrombie do you have the XML labels for these products? the ZIP you sent only include the LBLs.
erospdart_lblexample_20220429.zip
This zip should have the .lbl, .fit and .xml files. The report.txt file is a validation report that shows the failures.
@katecrombie those 2 zips do not appear to contain any XML labels? Just the lbl and .fit files.
D'oh.... Third time is the charm. erospdart_lblexample_20220429_2.zip
@jordanpadams @nutjob4life @tloubrieu-jpl
Is this an error that should be looked at or a missing file?
FAIL: file:/tmp/test/m0157417198f3_2p_iof_gbl.xml
ERROR [error.label.context_ref_not_found] line 84: 'Context product not found: urn:nasa:pds:context:instrument:near.msi
8 product validation(s) completed
@katecrombie FYI, we are fairly certain max=0.0014194, got=0.0014194006
is a spurious error. 0.0014194006
is just an issue with the significant digits and how doubles are stored in memory.
@jordanpadams Here's an extreme example I created: ERROR [error.array.value_out_of_min_max_range] array 1, location (400, 246): Value is less than the minimum value in the label (min=5.0E-5, got=0.005). If that's unclear, multiply both numbers by 10000, and validate is saying 5 is greater than 500.
The two attached .FITs files differ by 1 bit. At the cell in question, m5under.fit: 0x3ba3d70a m5over.fit: 0x3ba3d70b which in decimal are 0.00499999988824 and 0.0050000003539. So yes, it's a rounding problem, but note that the value in both labels has exactly 1 significant digit (brackets removed): minimum 0.00005 /minimum
I think the bug is important. If the data values are like 123.45678, then this error won't fire much. Buf if the values are like 0.000012345678, this error will fire often. And some datasets will be in the latter domain. Archive.zip
@jordanpadams @rchenatjpl
Not surprised this is a problem when min is represented by scientific notation. The original code, which was tweaked, compares using strings on the assumption they are in the form 0.00005 not 5.0E-5. The most probable reason for this is floating point representation is not perfect so 0.00005 can turn into 0.0000501 or 0.0000499 or something else nearby. String comparisons were then used to prevent this representation error but it makes processing forms other than direct decimal more difficult.
No matter the approach, there will be problems because the continuous number line cannot be accurately represented for any number in a computer. There are a couple ov choices:
Any of those can be done, but they are fraught with peril and I need to which peril is the desired peril.
I hope the attached file goes through. It's 700+ MB. It has a bundle that I've pared down to only those data files for which validate incorrectly flagged bad values. Below is validate's output.
` PDS Validate Tool Report
Configuration: Version 3.1.0-SNAPSHOT Date 2023-02-07T06:47:27Z
Parameters: Targets [file:/Users/rchen/Desktop/pds_review/] Rule Type pds4.bundle Severity Level WARNING Recurse Directories true File Filters Used [.xml, .XML] Data Content Validation on Product Level Validation on Allow Unlabeled Files false Max Errors 100000 Registered Contexts File /Users/rchen/PDS4tools/validate/resources/registered_context_products.json
Product Level Validation Results
PASS: file:/Users/rchen/Desktop/pds_review/bundle_nearmsi_gbl.xml 1 product validation(s) completed
PASS: file:/Users/rchen/Desktop/pds_review/data/collection_data.xml 2 product validation(s) completed
FAIL: file:/Users/rchen/Desktop/pds_review/data/m0154651923f6_2p_cif_gbl.xml Begin Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651923f6_2p_cif_gbl.fit ERROR [error.array.value_out_of_min_max_range] array 1, location (400, 246): Value is less than the minimum value in the label (min=0.007827, got=0.007827956). End Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651923f6_2p_cif_gbl.fit 3 product validation(s) completed
FAIL: file:/Users/rchen/Desktop/pds_review/data/m0156835546f4_2p_iof_gbl.xml Begin Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0156835546f4_2p_iof_gbl.fit ERROR [error.array.value_out_of_min_max_range] array 1, location (50, 365): Value is less than the minimum value in the label (min=0.009938, got=0.009938589). End Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0156835546f4_2p_iof_gbl.fit 4 product validation(s) completed
FAIL: file:/Users/rchen/Desktop/pds_review/data/m0154651921f4_2p_cif_gbl.xml Begin Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651921f4_2p_cif_gbl.fit ERROR [error.array.value_out_of_min_max_range] array 1, location (165, 299): Value is less than the minimum value in the label (min=1.18E-4, got=1.18906144E-4). End Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651921f4_2p_cif_gbl.fit 5 product validation(s) completed
FAIL: file:/Users/rchen/Desktop/pds_review/data/m0154651915f1_2p_cif_gbl.xml Begin Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651915f1_2p_cif_gbl.fit ERROR [error.array.value_out_of_min_max_range] array 1, location (322, 51): Value is less than the minimum value in the label (min=0.008009, got=0.00800981). End Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0154651915f1_2p_cif_gbl.fit 6 product validation(s) completed
FAIL: file:/Users/rchen/Desktop/pds_review/data/m0125249036f2_2p_iof_gbl.xml Begin Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0125249036f2_2p_iof_gbl.fit ERROR [error.array.value_out_of_min_max_range] array 1, location (242, 287): Value is greater than the maximum value in the label (max=0.087356, got=0.08735539). End Content Validation: file:/Users/rchen/Desktop/pds_review/data/m0125249036f2_2p_iof_gbl.fit 7 product validation(s) completed
PDS4 Bundle Level Validation Results
FAIL: file:/Users/rchen/Desktop/pds_review/data/.DS_Store ERROR [error.file.name_has_invalid_characters] File name uses invalid character WARNING [warning.file.not_referenced_in_label] File is not referenced by any label 1 integrity check(s) completed
PASS: file:/Users/rchen/Desktop/pds_review/data/collection_data.xml 2 integrity check(s) completed
FAIL: file:/Users/rchen/Desktop/pds_review/.DS_Store ERROR [error.file.name_has_invalid_characters] File name uses invalid character WARNING [warning.file.not_referenced_in_label] File is not referenced by any label 3 integrity check(s) completed
PASS: file:/Users/rchen/Desktop/pds_review/bundle_nearmsi_gbl.xml 4 integrity check(s) completed
PASS: file:/Users/rchen/Desktop/pds_review/data/m0154651923f6_2p_cif_gbl.xml 5 integrity check(s) completed
PASS: file:/Users/rchen/Desktop/pds_review/data/m0154651921f4_2p_cif_gbl.xml 6 integrity check(s) completed
PASS: file:/Users/rchen/Desktop/pds_review/data/m0156835546f4_2p_iof_gbl.xml 7 integrity check(s) completed
PASS: file:/Users/rchen/Desktop/pds_review/data/m0154651915f1_2p_cif_gbl.xml 8 integrity check(s) completed
PASS: file:/Users/rchen/Desktop/pds_review/data/m0125249036f2_2p_iof_gbl.xml 9 integrity check(s) completed
Summary:
7 error(s) 2 warning(s)
Product Validation Summary: 2 product(s) passed 5 product(s) failed 0 product(s) skipped
Referential Integrity Check Summary: 7 check(s) passed 2 check(s) failed 0 check(s) skipped
Message Types: 5 error.array.value_out_of_min_max_range 2 error.file.name_has_invalid_characters 2 warning.file.not_referenced_in_label
End of Report Completed execution in 6256 ms
`
@rchenatjpl
Cannot reproduce your errors. Please check your validate that it has code updates from 529 in them and/or that golish.zip is correct. I may not have understood your checks, but I get no errors:
validate /tmp/golish/bundle_nearmsi_gbl.xml
PDS Validate Tool Report
Configuration:
Version 3.2.0-SNAPSHOT
Date 2023-02-07T16:29:24Z
Parameters:
Targets [file:/tmp/golish/bundle_nearmsi_gbl.xml]
Severity Level WARNING
Recurse Directories true
File Filters Used [*.xml, *.XML]
Data Content Validation on
Product Level Validation on
Max Errors 100000
Registered Contexts File /home/niessner/Projects/PDS/validate/src/main/resources/util/registered_context_products.json
Product Level Validation Results
PASS: file:/tmp/golish/bundle_nearmsi_gbl.xml
1 product validation(s) completed
Feb 07, 2023 8:29:27 AM gov.nasa.pds.tools.validate.task.ValidationTask execute
INFO: Validation complete for location 'file:/tmp/golish/bundle_nearmsi_gbl.xml'
Summary:
0 error(s)
0 warning(s)
Product Validation Summary:
1 product(s) passed
0 product(s) failed
0 product(s) skipped
Referential Integrity Check Summary:
0 check(s) passed
0 check(s) failed
0 check(s) skipped
End of Report
Completed execution in 4279 ms
ok, 3.1.1 did validate both my last zips correctly.
@rchenatjpl
You are okay with closing this ticket again then?
yes, we can close this. I should have checked the latest version. OTOH, if this wasn't specifically addressed in a build, it might pop up again.
When validating a pds4.folder of image type arrays, several errors were generated during the evaluation of the min/max array values. Some were correct, others were not.
ERROR [error.array.value_out_of_min_max_range] array 1, location (349, 345): Value is greater than the maximum value in the label (max=0.0014194, got=0.0014194006).
This is technically correct, and we can fix it by increasing the significant digits in the data product.
ERROR [error.array.value_out_of_min_max_range] array 1, location (386, 185): Value is greater than the maximum value in the label (max=0.0702686, got=0.07026859).
This is not correct.
My assumption is that validate is looking at the number of significant digits, and is giving incorrect results when the number of digits is different.
I can attach test case data products if needed.
Engineering Details
Tightly coupled with #514 and #434
see #574 for additional info