denoland / std

The Deno Standard Library
https://jsr.io/@std
MIT License
3.22k stars 621 forks source link

[encoding/csv] Make the columns option more permissive while parsing #3225

Closed devingfx closed 11 months ago

devingfx commented 1 year ago

Describe the bug

If columns option is given to the parser (or the same with skipFirstRow: true) and a line doesn't contains the good number of columns an error is thrown and the whole process abord. Sometimes files doesn't contains all the columns, especilly without quotes and with TSV separator, and it's ok and good practice to reduce the file size...

Steps to Reproduce

const content = `A B C
ok ok ok
still ok
 not first

first-only
`
// even with a line with empty columns should be parsed

try {
    let dataBeWithoutError = CSV.parse( content, { separator: ' ', skipFirstRow: true })
}
catch( happyness )
{
    console.error( 'cry you fool! and do it by hand' )
}

Expected behavior

Put undefined on undefined columns and don't throw !!

[
    { A: 'ok', B: 'ok', C: 'ok' },
    { A: 'still', B: 'ok', C: undefined },
    { A: '', B: 'not', C: 'first' },
    { A: undefined, B: undefined, C: undefined },
    { A: 'first-only', B: undefined, C: undefined },
]

Notice I distinguish empty strings columns and undefined columns like the 1st one of 3rd line

sigmaSd commented 1 year ago

According to the spec https://csv-spec.org/ (4th point)

Each record MUST contain the same number of fields

sigmaSd commented 1 year ago

I don't expect JSON.parse to parse invalid json and the same is with csv parser, my opinion is that the user should preprocess the file to make it a valid csv before trying to parse it

timreichen commented 1 year ago

I agree with @sigmaSd. Throwing an error when trying to parse an invalid string is the correct and expected behavior.

luk3skyw4lker commented 1 year ago

Yeah, I'm with @sigmaSd and @timreichen

This isn't a good and proper behavior when parsing files, if any data is invalid the package should throw an error to warn the user that the data type is invalid. This treatment should be done before any parsing.

jgusta commented 1 year ago

Shouldn't a valid empty field parse to an empty string, rather than undefined, anyway?

All fields are always strings. CSV itself does not support type casting to integers, floats, booleans, or anything else. It is not a CSV library’s responsibility to type cast input CSV data.

iuioiua commented 11 months ago

I agree with the others here. The implementation shouldn't be retrofitted to work with incorrect data. Instead, the data needs to be correct. Thank you @sigmaSd, @timreichen and @luk3skyw4lker.