Closed cpina closed 4 years ago
Hi @cpina,
Please try goodtables CLI
Thanks very much @roll - goodtables CLI does (after a quick look) almost all of what we wanted!
Some comments:
- Our resources use md5 because it seems to be the preferred hash here: https://specs.frictionlessdata.io/data-resource/#metadata-properties. But goodtables says `Warning: Resource "ace_tm_concentrations" does not use the SHA256 hash. The check will be skipped`. Should goodtables check the md5? (Do you want me to open an issue? Or should sha1 be the "favourite" one in the data-resource documentation?)
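In the meantime, the md5 check is easy to run by hand. Below is a minimal sketch (not the goodtables implementation) that recomputes each resource's md5 and compares it to the descriptor. It assumes local resource paths relative to the descriptor, and follows the spec's convention that an unprefixed `hash` value is md5 (other algorithms are prefixed, e.g. `sha256:`):

```python
import hashlib
import json
import os

def verify_resource_hash(descriptor_path):
    """Check each local resource's md5 against the data package descriptor.

    A sketch only: assumes resource paths are local and relative to the
    descriptor. Returns {resource name: True/False}.
    """
    base = os.path.dirname(os.path.abspath(descriptor_path))
    with open(descriptor_path) as f:
        package = json.load(f)
    results = {}
    for resource in package.get("resources", []):
        declared = resource.get("hash", "")
        if declared.startswith("md5:"):
            declared = declared[len("md5:"):]
        elif ":" in declared or not declared:
            continue  # empty, or a non-md5 algorithm prefix: skip
        path = os.path.join(base, resource["path"])
        with open(path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        results[resource["name"]] = digest == declared
    return results
```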
- When two columns have the same name: `tableschema validate` says that the schema is valid, and `read()` doesn't complain. But goodtables says `[-,20] [duplicate-header] Header in column 20 is duplicated to header in column(s) 17`. I'm happy with goodtables complaining about duplicated names, but should `tableschema validate` do the same?
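For reference, the duplicate check itself is simple to reproduce outside goodtables. A minimal sketch using only the stdlib `csv` module, reporting 1-based column positions the way goodtables does:

```python
import csv
from collections import defaultdict

def duplicate_headers(csv_path):
    """Return {header name: [column positions]} for any header that
    appears more than once in the first row (1-based positions)."""
    with open(csv_path, newline="") as f:
        headers = next(csv.reader(f))
    positions = defaultdict(list)
    for column, name in enumerate(headers, start=1):
        positions[name].append(column)
    return {name: cols for name, cols in positions.items() if len(cols) > 1}
```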
Thanks again for making me look at goodtables.
Hi @cpina,
I'm currently working on a new version of goodtables
which will support md5 and other hash algorithms.
Regarding duplicated field names: it's OK by the specs - https://specs.frictionlessdata.io/table-schema/. That's why `datapackage` doesn't complain. On the other hand, goodtables also tries to enforce best practices, e.g. not having such field names. This check can be skipped with `goodtables data/invalid.csv --skip-checks duplicate-header`
Thanks very much! :-) Feel free to close this issue (or should we wait for the md5 support?)
I'll merge it into - https://github.com/frictionlessdata/goodtables-py/issues/341
Overview
Right now I see that if I do `datapackage validate datapackage.json` it validates (I think, just from a quick test) that the JSON is correct, the required fields are present, the fields have the correct types / pass the regular expressions, etc.

We expected some more validations:
- If there is a resource and the resource has a local path specified: validate that the file is there
- If there is a resource and the resource has a local path with bytes and hash: validate that the file has the correct bytes/hash
- If the resource has a remote URL: download it, and validate bytes and hash if possible
- If the resource is tabular data: try to "read" it to validate the columns, missing values and other tabular checks
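For context, here is roughly the kind of checking we had hoped `validate` would cover. This is a hedged sketch of the local checks only (stdlib, no remote-URL handling), assuming `hash` values carry an `md5:` prefix and that tabular resources are CSV files:

```python
import csv
import hashlib
import json
import os

def validate_package(descriptor_path):
    """Run extra checks on a data package's local resources:
    the file exists, its size matches `bytes`, its content matches
    `hash` (assumed "md5:<hexdigest>"), and CSV rows are consistent.
    Returns a list of error strings (empty list means all checks passed).
    """
    base = os.path.dirname(os.path.abspath(descriptor_path))
    with open(descriptor_path) as f:
        package = json.load(f)
    errors = []
    for resource in package.get("resources", []):
        name = resource.get("name", "<unnamed>")
        path = os.path.join(base, resource.get("path", ""))
        if not os.path.isfile(path):
            errors.append(f"{name}: file not found: {path}")
            continue
        with open(path, "rb") as f:
            content = f.read()
        if "bytes" in resource and len(content) != resource["bytes"]:
            errors.append(f"{name}: size mismatch")
        declared = resource.get("hash", "")
        if declared.startswith("md5:"):
            if hashlib.md5(content).hexdigest() != declared[len("md5:"):]:
                errors.append(f"{name}: md5 mismatch")
        if path.endswith(".csv"):
            with open(path, newline="") as f:
                rows = list(csv.reader(f))
            if len({len(row) for row in rows}) > 1:
                errors.append(f"{name}: rows have inconsistent column counts")
    return errors
```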
All of this is easy to do with (we did). But we expected `validate` to do it (or to have flags to do it). Some users of Frictionless Data might not be so keen on implementing the checks themselves and might just want to use the Python CLI to validate.

Please preserve this line to notify @roll (lead of this repository)