jeanetteclark closed this issue 1 year ago
@jeanetteclark ERROR is reserved for cases where the test fails to run (e.g. the network is down). An ERROR indicates a bug in the system, not a data-driven failure. When a test runs to completion, it should always return SUCCESS or FAILURE based on the content evaluation. Happy to discuss.
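A minimal Python sketch of the rule above, assuming a check is modeled as a hypothetical evaluate callable that returns True or False for the content it reads. Any exception escaping it means the test could not run to completion and is reported as ERROR; otherwise the result is SUCCESS or FAILURE. The Status enum and run_check name are illustrative only, not part of any existing framework.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    ERROR = "ERROR"


def run_check(evaluate: Callable[[], bool]) -> Status:
    """Map a content evaluation to SUCCESS / FAILURE / ERROR.

    evaluate (hypothetical) must return True or False for any content it can
    read; data-driven problems should become False, never exceptions.
    """
    try:
        passed = evaluate()
    except Exception:
        # The test could not run to completion (e.g. network down or file
        # unreadable): a system problem, so this is the only ERROR path.
        return Status.ERROR
    # The test ran to completion, so the result depends only on the content.
    return Status.SUCCESS if passed else Status.FAILURE
```

Catching Exception broadly is a judgment call: it keeps unexpected system faults out of the SUCCESS/FAILURE space, but it relies on evaluate converting every data problem into False.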
Okay, that makes sense. I'll move the "no text files exist" case to SUCCESS.
This check is nearly done. I still need to make the mechanism for retrieving data PIDs (and thus URLs/paths) for data access consistent with what I did for the data format check.
Great! Can you define 'text'? Do you mean ASCII? UTF-8? UTF-16? Other Unicode encodings? Windows-1252 (cp1252)?
So I've been thinking the name should probably change, since this check is really about delimited text files (CSV, TSV) and doesn't deal with encodings at all. Files are currently identified by looking in the metadata for entities with a physical/dataFormat/textFormat element. However, I think we should probably be checking the formatId instead. Happy to hear your thoughts.
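Selecting files by formatId rather than by the presence of a textFormat element could look roughly like the sketch below. It assumes each data entity is available as a hypothetical mapping with "pid" and "formatId" keys; text/csv is the example used in this thread, and text/tab-separated-values is an illustrative addition, not an agreed list.

```python
from typing import Iterable, List, Mapping

# Assumed set of formatIds treated as delimited text. text/csv comes from the
# discussion; the TSV entry is illustrative.
DELIMITED_TEXT_FORMAT_IDS = frozenset({
    "text/csv",
    "text/tab-separated-values",
})


def select_delimited_entities(entities: Iterable[Mapping[str, str]]) -> List[str]:
    """Return the pids of entities whose formatId marks them as delimited text.

    Entities with no formatId are skipped rather than guessed at.
    """
    return [
        entity["pid"]
        for entity in entities
        if entity.get("formatId") in DELIMITED_TEXT_FORMAT_IDS
    ]
```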
Aha, that makes sense. Yes, I think using a formatId such as text/csv to decide when to apply this test makes sense. How about naming it something more like data.table-text-delimited.well-formed? See the naming discussion in #15.
Some related tests might be metadata.formatId.congruent (to test whether the formatId and the values inside metadata format fields such as physical/dataFormat match) and data.format.congruent (to test whether the data format found in the file matches what the metadata formatId claims).
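The first of those related checks, metadata.formatId.congruent, might be sketched as follows. This assumes a deliberately simple congruence rule that was not decided in the thread: a formatId starting with "text/" should be paired with a physical/dataFormat/textFormat element, and any other formatId should not be. The function name is hypothetical, and the input is a bare physical element without namespaces.

```python
import xml.etree.ElementTree as ET


def format_id_congruent(format_id: str, physical_xml: str) -> bool:
    """Check a formatId against a metadata <physical> element.

    Assumed rule: text/* formatIds go with dataFormat/textFormat, and
    everything else goes with some other dataFormat child.
    """
    physical = ET.fromstring(physical_xml)
    # find() returns None when the path has no match.
    has_text_format = physical.find("dataFormat/textFormat") is not None
    claims_text = format_id.startswith("text/")
    return has_text_format == claims_text
```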
Check has been renamed and restructured in @c47b03c8c.
Going to close this one for now.
Purpose
This check determines whether a tabular data file in a text format can be parsed.
Components
Result
SUCCESS: if one or more files are parsed correctly, or if no text files exist
FAILURE: if no files can be parsed
ERROR: if files cannot be accessed
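The result rules above can be sketched in Python with the standard csv module, under a few stated assumptions. "Parsed correctly" is taken to mean that csv.reader in strict mode raises no error and every non-blank row has the same number of fields. Files are read as local paths with a comma delimiter, whereas the real check would retrieve data through PIDs/URLs and would need to take the delimiter from the metadata. Undecodable bytes are replaced so that encoding questions stay out of this check. Both function names are hypothetical.

```python
import csv
from typing import Iterable, TextIO


def rows_are_consistent(stream: TextIO, delimiter: str = ",") -> bool:
    """Return True if the stream parses as delimited text with a fixed row width."""
    try:
        # Blank lines come back as empty lists, so they are ignored.
        widths = {
            len(row)
            for row in csv.reader(stream, delimiter=delimiter, strict=True)
            if row
        }
    except csv.Error:
        return False
    # Exactly one width: at least one row, and no ragged rows.
    return len(widths) == 1


def check_delimited_files(paths: Iterable[str]) -> str:
    """Apply the SUCCESS / FAILURE / ERROR rules from the Result section."""
    paths = list(paths)
    if not paths:
        return "SUCCESS"  # no delimited text files, so nothing can fail
    results = []
    for path in paths:
        try:
            # errors="replace" keeps encoding problems out of this check.
            with open(path, newline="", encoding="utf-8", errors="replace") as f:
                results.append(rows_are_consistent(f))
        except OSError:
            return "ERROR"  # the file could not be accessed, so the test did not run
    return "SUCCESS" if any(results) else "FAILURE"
```

Returning ERROR as soon as any file is unreadable follows the rule that ERROR means the test itself could not run, not that the data were bad.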