Open radiosci opened 3 years ago
Out of interest, could someone point me to the docs that only allow Table_Character
in a .tab
file? I would certainly consider it best practise, but I don't recall seeing this documented. Or are we only talking about collection inventories here? Just to be sure that the fix doesn't affect existing product. Thanks!
@msbentley agreed. unless I am missing something filenames are all recommendations in the PDS (besides for bundles / collections).
@radiosci is there another error you are pushing for there? I don't think a file suffix is required for bundles/collection inventories?
SR 6C.1.6 reserves *.tab for fixed-width tabular data. Inventory generally has variable width data in field 2. I am not saying that Inventory must have a specific file name extension. The example I provided has variable width data in field 2, a violation of 6C.1.6; but Validate did not report the error.
Great, thanks @radiosci I hadn't paid enough attention to this part of the SR!
@jordanpadams Hi, where's the version of validate that I should test? I downloaded from https://github.com/NASA-PDS/validate/releases/tag/v2.2.1-SNAPSHOT and it's passing both of Dick's .xml files. % validate-2.2.1-SNAPSHOT/bin/validate -V gov.nasa.pds:validate Version 2.2.1-SNAPSHOT Release Date: 2022-04-07 16:27:49 % % % validate-2.2.1-SNAPSHOT/bin/validate -t VALIDATE-CSV/collection_calib_freq.xml
PDS Validate Tool Report
Configuration:
Version 2.2.1-SNAPSHOT
Date 2022-05-16T18:40:13Z
Parameters:
Targets [file:/Users/rchen/Desktop/VALIDATE-CSV/collection_calib_freq.xml]
Severity Level WARNING
Recurse Directories true
File Filters Used [*.xml, *.XML]
Data Content Validation on
Product Level Validation on
Max Errors 100000
Registered Contexts File /Users/rchen/Desktop/validate-2.2.1-SNAPSHOT/resources/registered_context_products.json
Product Level Validation Results
PASS: file:/Users/rchen/Desktop/VALIDATE-CSV/collection_calib_freq.xml
1 product validation(s) completed
Summary:
0 error(s)
0 warning(s)
Product Validation Summary:
1 product(s) passed
0 product(s) failed
0 product(s) skipped
Referential Integrity Check Summary:
0 check(s) passed
0 check(s) failed
0 check(s) skipped
End of Report
Completed execution in 3142 ms
% % % validate-2.2.1-SNAPSHOT/bin/validate -t VALIDATE-TAB/collection_calib_freq.xml
PDS Validate Tool Report
Configuration:
Version 2.2.1-SNAPSHOT
Date 2022-05-16T18:41:22Z
Parameters:
Targets [file:/Users/rchen/Desktop/VALIDATE-TAB/collection_calib_freq.xml]
Severity Level WARNING
Recurse Directories true
File Filters Used [*.xml, *.XML]
Data Content Validation on
Product Level Validation on
Max Errors 100000
Registered Contexts File /Users/rchen/Desktop/validate-2.2.1-SNAPSHOT/resources/registered_context_products.json
Product Level Validation Results
PASS: file:/Users/rchen/Desktop/VALIDATE-TAB/collection_calib_freq.xml
1 product validation(s) completed
Summary:
0 error(s)
0 warning(s)
Product Validation Summary:
1 product(s) passed
0 product(s) failed
0 product(s) skipped
Referential Integrity Check Summary:
0 check(s) passed
0 check(s) failed
0 check(s) skipped
End of Report
% % % diff -r VALI*
Only in VALIDATE-CSV: collection_calib_freq.csv
Only in VALIDATE-TAB: collection_calib_freq.tab
diff -r VALIDATE-CSV/collection_calib_freq.xml VALIDATE-TAB/collection_calib_freq.xml
49c49
< <file_name>collection_calib_freq.csv</file_name>
---
> <file_name>collection_calib_freq.tab</file_name>
Only in VALIDATE-TAB: report-tab.txt
Only in VALIDATE-CSV: report_csv.txt
@rchenatjpl Yes, v2.2.1-SNAPSHOT is the one to for B12.1.
@rchenatjpl Yes, v2.2.1-SNAPSHOT is the one to for B12.1.
Actually, testing was against 2.2.0
@rchenatjpl @jpl-jengelke I finally dug into this ticket some more and see the confusion here. per @radiosci above, SR 6C.1.6 states that tab
is a reserved file name extension for fixed-width tabular data file
. validate should be raising an error for that because a collection is a delimited table, not a fixed width character table.
@radiosci I would like for this to be brought to the DDWG for clarification. for both the csv
and tab
reserved file extension, these should be more explicitly called out in the SR per Table_Character (and its extensions) Table_Delimited (and its extensions like Inventory tables). I am concerned about implementing a check for this when a lot of these file naming constraints have turned out to be recommendations in the SR, versus explicit requirements for validate.
Actually, testing was against 2.2.0
The latest test report (build12.1testProcs_v5.docx) @gxtchen sent out documents using Validate v2.2.1-SNAPSHOT for testing (Section 1.2 Scope). Please update the test report if that's not the case.
Perhaps if this is revisited, the error should be something like, "Collections require character delimited data," or "Collections do not support fixed-width data in *.tab files". Nonetheless, the test data did not support the ticket, and the bug could not be verified in non-collection data.
Jordan Padams: I don't understand what needs to be clarified by DDWG. There its no explicit requirement that Inventory use the .csv file name extension (SR 9C); but it must have delimited records. It is possible for a delimited table also to have fixed-width fields, in which case, .tab would be allowed. The example submitted was an Inventory with a .tab file name extension and variable length fields/records. Since .tab is reserved for "fixed-width tabular data" (SR Table 6C-1), it cannot have variable length fields. Validate appears to be ignoring the Table 6C-1 requirement once the product is determined to be Inventory. There are some additional requirements that could be imposed such as (1) Inventory file name extension must be .csv, and (2) .tab file must have fixed-length fields (including any record suffix bytes, which would be counted as the final field whether specified or not). Neither seems necessary in order to meet the Table 6C-1 requirement. Dick Simpson/radiosci
@radiosci totally agree to what SR Table 6C-1 says, however, the Inventory definition in the label itself says nothing about the fixed width nature of the table. All other fixed width tables in PDS4 require an explicit record length be defined, and Inventory tables (an extension of Table_Delimited) only allows a max record length be defined. We cannot assume fixed width without explicit record length given.
so by it's current definition, I would claim an Inventory table cannot be a fixed width table, which then would mean Inventory tables cannot be *.tab
. This is where I think we need input from the DDWG, because I don't think many people in the PDS would necessarily agree with that file naming restriction, especially when it comes to collection products.
the Inventory definition in the label itself says nothing about the fixed width nature of the table. Correct; Inventory is (in this version of the IM) a subclass of Table_Delimited, a Parsable_Byte_Stream. Table_Base requirements cannot be applied. I would claim an Inventory table cannot be a fixed width table, which then would mean Inventory tables cannot be .tab.
I disagree; the .tab file name extension says only that the file has fixed-width tabular structure. *.tab is not reserved for Table_Base. It is possible for a file to be both Table_Delimited and fixed-width tabular.
@radiosci copy. well due to this disagreement, clarification will be needed through the DDWG and CCB to enforce this check in validate.
Submitted CCB-347 to Jira requesting DDWG clarification.
I found while reviewing this for CCB that this is also covered in 2B.2.2.2 of the SR:
Beneath the root directory in this example structure, the top-level subdirectories have a one-toone correspondence with the bundle’s collections. Each collection subdirectory has a collection label with a name of the form “collection[*] and having an extension of either “.xml” or “.lblx”. and a collection inventory table with a name of the form “collection[].csv”. (Sometimes “collection[_]_inventory.csv” is used.) See Section 9C.1 for details on construction of collection inventory tables.
The naming requirements here are even more stringent than what @radiosci requested. But it's in a location that nobody expects.
SR 2B.2 (and its subsections) apply only in certain cases and are "suggested".
🐛 Describe the bug
A Collection product with an Inventory file (subclass of Table_Delimited) has one 63-byte record and two 64-byte records. It is given a file name ending with *.tab (reserved for tables with fixed length records). Validate passes the product.
📜 To Reproduce
Steps to reproduce the behavior:
🕵️ Expected behavior
Validate should report that *.tab files must have fixed-length fields
📚 Version of Software Used
validate 2.0.7
🩺 Test Data / Additional context
VALIDATE-TAB has the bad file that passed. VALIDATE-CSV has a CSV version that also passed. Both folders have copies of their respective report files.
validate-tab.tar.gz validate-csv.tar.gz
🏞Screenshots
None. See report files in respective folders.
🖥 System Info
🦄 Related requirements
⚙️ Engineering Details