Closed adamc00 closed 9 years ago
Thanks for the PR! Just wondering - aside from the fact that it causes a crash (which is bad), should this sort of error not show just as missing/extra columns, rather than a specific count error? Or is that useful anyway?
@ldodds @pezholio any thoughts on that?
I think it'd be useful to have a single warning of the names of the extra headers (columns), to at least indicate they're being ignored as part of the schema validation.
I don't think it should be an error as an application consuming the data should just ignore the extra columns if they were present. (We might separately want a "strict" mode though).
I think showing the difference between the found and expected columns is a good idea rather than the count as I have done. I'll have a go at that instead.
@ldodds note that there may also be fewer columns than expected so it's not just about too many. Should too few be an error - since that indicates that required data is missing - whereas too many can safely be ignored with a warning?
:+1: for a strict mode BTW. For our purposes I have elevated nearly all warnings to errors. The only things we treat as true warnings are those regarding the missing extra metadata headers regarding the CSV file's attributes.
Doesn't the missing_column
warning cover there being fewer items already?
It's worth checking on the trigger for extra_column
too which is in the same territory.
@idodds don't the missing_column
and extra_column
warnings apply only to the data rows, not the header row? Currently if you have too few headers you get a title_row
error with little other information in it. If in addition to having too few header names you also have mismatched header names (compared to the supplied schema) you get a header_name
error for each instance.
It seems to me that when you supply a schema the title_row
error probably isn't appropriate as this is incorrectly guessing that the supplied CSV file contains a title rather then column headings in the first row. Perhaps this should be suppressed in the case when validating against a schema and instead raise a new 'malformed_header' error containing the array of header names found in the content
attribute of the error/warning and the expected schema supplied header names in the constraints
attribute? This could also replace the variable number of header_name
errors emitted as they would now be redundant.
I have reworked this PR to simply emit a malformed_header
schema
warning when the supplied header does not match the schema. The header that was found is reported in the content
attribute and the schema header is reported in the constraints
attribute.
Could I get some feedback on this PR if it isn't acceptable? I may have handled fixing #126 the wrong way but it still exists. This patch to your master spec/schema_spec.rb demonstrates the bug. Thanks!
We're taking a look today - thanks for the work :)
Yup, this is great - thanks so much :+1:
We've got another change coming up the pipe, and we'll put add this to 1.0.2 later today
That's great! Thanks so much for your work.
This feature fixes #126. Happy to make changes if requested. Ruby is not my first language.