Open mbjones opened 1 year ago
See related report for the ADC dataset above in RT here: https://support.nceas.ucsb.edu/rt/Ticket/Display.html?id=25790
This all sounds good. I think there are a few separate checks here
We may also consider breaking up the valid (there might be a better word for that; parsable?) checks into specific file types. I'm not sure whether lumping or splitting will be more maintainable. Broadly, I think the most common data file types we see, and that we should check, are:
Purpose
To ensure that the content in a data file is valid with respect to its data format. Relates to checks #9 and #2.
Components
For each object in a package, ensure the following:
1) all characters in text files are valid within the declared character encoding for that file
\x00 to \xFF
We might consider whether the text versus binary checks in the list above might be better handled as separate checks.
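As a rough illustration of component 1, the text-encoding check could be sketched in Python as below. This is only a sketch under assumptions, not the check's actual implementation: the function names `check_text_encoding` and `check_package` are hypothetical, each file is assumed to be available as raw bytes together with its declared encoding name, and the returned status strings simply mirror the SUCCESS / FAILURE / ERROR outcomes described under Result.

```python
from typing import Dict, Optional, Tuple


def check_text_encoding(data: bytes, encoding: str) -> Optional[str]:
    """Return None if `data` decodes cleanly under `encoding`, else a message.

    Hypothetical helper: strict decoding raises UnicodeDecodeError on the
    first byte sequence that is not valid in the declared encoding.
    """
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError as err:
        bad = data[err.start]  # first offending byte
        return f"byte 0x{bad:02X} at offset {err.start} is not valid {encoding}"
    return None


def check_package(files: Dict[str, Tuple[bytes, str]]) -> Tuple[str, Dict[str, str]]:
    """Check every (bytes, declared_encoding) pair in a package.

    Returns a status string and a mapping of file name to problem message.
    """
    problems: Dict[str, str] = {}
    try:
        for name, (data, encoding) in files.items():
            message = check_text_encoding(data, encoding)
            if message is not None:
                problems[name] = message
    except Exception as err:  # e.g. LookupError for an unknown encoding name
        return "ERROR", {"check": str(err)}
    return ("FAILURE" if problems else "SUCCESS"), problems
```

For instance, a CSV file declared as ASCII that contains a stray `\xB0` byte would be reported with the offset of that byte, and the package as a whole would come back as FAILURE. Binary formats would need a different, format-specific parser, which is one argument for splitting the text and binary checks.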
Result
SUCCESS if all files in the package are valid
FAILURE if one or more files are not valid
ERROR if the check system fails
Example:
This example ADC data package contains a data file that should be ASCII-formatted CSV data, but contains erroneous non-ASCII characters, as shown below: