Open rpgoldman opened 5 years ago
The ordering must match. Using the totalColumns directive means that the validator checks, when the schema is parsed, that it contains the expected number of column definitions. If you do not specify it, there will still be a validation error once the CSV file is actually read if the number of column definitions does not match the number of columns in the file.
There are some similar issues already (#21 and #13), but I'm afraid we've not had resource available to work on further developments recently, though we would welcome pull requests from others.
Thanks for the response.
I suggested making the order optional because CSVs are often interpreted by tools like Python's pandas, in which the columns are name-addressable, so column ordering is not required for correct operation.
And I mentioned in my original comments that for scientific data there are often additional columns of derived quantities added that don't interfere with correct (assuming name-based addressing) processing of the data.
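To make the name-based idea concrete, here is a minimal sketch of what I mean. It uses Python's standard-library `csv.DictReader` rather than pandas or the validator itself, and the column names, the checks, and the `validate_by_name` helper are all hypothetical, made up for illustration only.

```python
import csv
import io

# Hypothetical required columns and per-column checks (not from any real schema).
REQUIRED = {
    "sample_id": lambda v: v != "",
    "temperature_c": lambda v: -273.15 <= float(v) <= 1000.0,
}

def validate_by_name(text):
    """Return a list of error strings; column order and extra columns are ignored."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED if name not in header]
    if missing:
        return [f"missing column: {name}" for name in missing]
    errors = []
    # Data rows start on line 2, after the header line.
    for line_no, row in enumerate(reader, start=2):
        for name, check in REQUIRED.items():
            try:
                ok = check(row[name])
            except (TypeError, ValueError):
                ok = False  # e.g. a non-numeric value where a number is expected
            if not ok:
                errors.append(f"line {line_no}: bad {name}: {row[name]!r}")
    return errors

# Columns are reordered, and there is a derived column the checks do not mention.
data = "temperature_c,derived_k,sample_id\n21.5,294.65,A1\nhot,,A2\n"
errors = validate_by_name(data)
print(errors)
```

Only the named columns are looked up, so the reordering and the extra `derived_k` column do not affect the result; a missing required column is reported before any rows are checked.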
I imagine that these additional features could add substantially to the difficulty of validation, though.
Maybe this should be tagged as "question-edging-into-enhancement-request"!
I am interested in using the validator for some scientific data where there is a known set of columns that should be checked for reasonable contents, but where I'm not sure that the ordering of columns will be consistent, and where some data providers might have added additional columns of computed values to the raw values that my schema should check.
Thank you