Open mace-space opened 4 months ago
@mace-space is this data available online somewhere?
Also, I took a look at the log files and, at first glance, it looks like we probably didn't catch some of these errors for a few reasons:
If sending the whole data set is not reasonable, even a small subset of the data with:
Thanks for looking into this @jordanpadams. Here's a subset of data.
@mace-space it does not look like the ZIP file fully uploaded prior to submitting your comment. Would you mind trying to upload again?
Sorry for the delay, I've been on vacation. Here's the subset of the bundle.
Checked for duplicates
Yes - I've already checked
🐛 Describe the bug
Related to #432 (was asked to open another ticket)
I ran validate using --rule pds4.bundle, but no referential checks were performed, even though that option should check references. (Note that the max error threshold was exceeded.)
I also tried running it on the specific collection where I had spotted LID errors:
Here's an example browse label from that collection:

 1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
 2
 3 <?xml-model href="https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1G00.sch"
 4     schematypens="http://purl.oclc.org/dsdl/schematron"?>
 5
 6 <Product_Browse xmlns="http://pds.nasa.gov/pds4/pds/v1"
 7     xmlns:pds="http://pds.nasa.gov/pds4/pds/v1"
 8     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 9     xsi:schemaLocation="http://pds.nasa.gov/pds4/pds/v1 https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1G00.xsd">
10   <Identification_Area>
11     <logical_identifier>urn:nasa:pds:wenkert_pdart16_vgr_rav1ciun:browse_qedr:vgr_1201-mamqtv-001010-data-001010.001.png</logical_identifier>
12     <version_id>1.0</version_id>
13     <title>RAV1CIUN DATA Browse Product - vgr_1201-mamqtv-001010-data-001010.001.png</title>
14     <information_model_version>1.16.0.0</information_model_version>
15     <product_class>Product_Browse</product_class>
16   </Identification_Area>
17   <Reference_List>
18     <Internal_Reference>
19       <lid_reference>urn:nasa:pds:wenkert_pdart16_vgr_rav1ciun:browse_qedr:vgr_1201-mamqtv-001010-data-001010.001</lid_reference>
20       <reference_type>browse_to_data</reference_type>
21       <comment>This is a reference to the full resolution data file corresponding to this browse image.</comment>
22     </Internal_Reference>
23   </Reference_List>
24   <File_Area_Browse>
25     <File>
26       <file_name>VGR_1201-MAMQTV-001010-DATA-001010.001.png</file_name>
27       <local_identifier>BROWSE_FILE</local_identifier>
28       <creation_date_time>2023-08-18</creation_date_time>
29     </File>
30     <Encoded_Image>
31       <local_identifier>BROWSE_IMAGE</local_identifier>
32       <offset>0</offset>
33       <encoding_standard_id>PNG</encoding_standard_id>
34     </Encoded_Image>
35   </File_Area_Browse>
36 </Product_Browse>
Line 19 points to an incorrect LID, but Validate does not report any of these:
It passed all of the browse labels (the one fail refers to a .DS_Store file).
So, unlike with the -R pds4.bundle option, with -R pds4.collection it does report referential integrity checks. However, it is not catching incorrect LIDs. The LID urn:nasa:pds:wenkert_pdart16_vgr_rav1ciun:browse_qedr:vgr_1201-mamqtv-001010-data-001010.001 does not exist (the browse LIDs have .png suffixes). Moreover, the reference should not point into the browse_qedr collection at all, but into the data_qedr collection.
🕵️ Expected behavior
Validate should flag an error for any lid_reference that points to a non-existent LID.
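As a rough illustration of the check being requested, here is a minimal Python sketch that cross-checks every lid_reference against the logical_identifier values declared in the same directory tree. This is not how Validate implements referential integrity. The function name `find_dangling_lid_references` and the `bundle_lid_prefix` parameter are hypothetical, and the sketch assumes labels are `.xml` files in the PDS4 namespace shown in the example label. It ignores lidvid_reference and skips references that point outside the bundle (for example, context products).

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# PDS4 common namespace, as declared in the example label.
PDS_NS = "{http://pds.nasa.gov/pds4/pds/v1}"


def find_dangling_lid_references(bundle_root, bundle_lid_prefix):
    """Return (label_path, lid) pairs for lid_reference values that start with
    bundle_lid_prefix but match no logical_identifier found under bundle_root."""
    declared = set()
    references = []
    for label in sorted(Path(bundle_root).rglob("*.xml")):
        try:
            root = ET.parse(label).getroot()
        except ET.ParseError:
            continue  # malformed XML is out of scope for this sketch
        lid = root.findtext(
            f"{PDS_NS}Identification_Area/{PDS_NS}logical_identifier"
        )
        if lid:
            declared.add(lid.strip())
        for ref in root.iter(f"{PDS_NS}lid_reference"):
            if ref.text:
                references.append((label, ref.text.strip()))
    # Only references into this bundle can be resolved locally.
    return [
        (label, target)
        for label, target in references
        if target.startswith(bundle_lid_prefix) and target not in declared
    ]
```

Calling it with `./wenkert_pdart16_vgr_rav1ciun` and the prefix `urn:nasa:pds:wenkert_pdart16_vgr_rav1ciun:` should list the browse labels whose lid_reference lacks the .png suffix, assuming no other product in the bundle declares that LID.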
📜 To Reproduce
% validate --rule pds4.bundle --report-file rav1ciun_validate_v3.5.1.log --verbose 2 --target ./wenkert_pdart16_vgr_rav1ciun
% validate --rule pds4.collection --report-file rav1ciun_browse_validate_v3.5.1.log --verbose 2 --target ./wenkert_pdart16_vgr_rav1ciun/browse
🖥 Environment Info
📚 Version of Software Used
Validate v3.5.1
🩺 Test Data / Additional context
The bundle tar.gz is too large to attach here. Shall I share it via Dropbox, or do you need just a sample?
Bundle validate log rav1ciun_validate_v3.5.1_browse_collection.log
🦄 Related requirements
No response
⚙️ Engineering Details
No response
🎉 Integration & Test
No response