fsprojects / FSharp.Data

F# Data: Library for Data Access
https://fsprojects.github.io/FSharp.Data

CSV fails to parse, in my case on a long string that recurs #1434

Closed smoothdeveloper closed 2 years ago

smoothdeveloper commented 2 years ago

I have this .csv file that somehow fails the parser, even though it looks fine when opened in Excel.

I haven't yet tried other CSV parsers on the file, but I wanted to report it in case it can be fixed in the library. Attachment: outagesshort.csv

type Csv = FSharp.Data.CsvProvider<  @"c:\tmp\outagesshort.csv" >

fsharp.data.csvparseerror.fsx(17,12): error FS3033: The type provider 'ProviderImplementation.CsvProvider' reported an error: Cannot read sample CSV from 'c:\tmp\outagesshort.csv': Couldn't parse row 218 according to schema: Expected 21 columns, got 11

In the CSV file, the value at column 11 is the same as on the previous line, which parses correctly, so it feels like some state isn't closed/reset, but maybe it is the CSV file that is "malformed".

nikoyak commented 2 years ago

type Csv = FSharp.Data.CsvProvider<  @"c:\tmp\outagesshort.csv", InferRows = 0 >
// or, e.g.
// type Csv = FSharp.Data.CsvProvider<  @"c:\tmp\outagesshort.csv", InferRows = 255 >

This file contains multiline rows, and there is a bug/feature in CsvProvider: InferRows considers lines first, not rows, so the sample used for inference can end in the middle of a multiline row.
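
To illustrate the difference between lines and rows, here is a minimal sketch. The sample text is made up for illustration (it is not taken from outagesshort.csv), and it assumes the FSharp.Data package can be referenced from an F# script. It compares the number of physical lines with the number of rows that `CsvFile.Parse` yields when one quoted field contains a line break.

```fsharp
#r "nuget: FSharp.Data"

open System
open FSharp.Data

// Hypothetical sample: the second record has a quoted field that spans two physical lines.
let text = "id,comment\n1,\"first\"\n2,\"spans\ntwo lines\"\n3,\"last\"\n"

// Physical lines, counted the naive way (header included).
let physicalLines =
    text.Split([| '\n' |], StringSplitOptions.RemoveEmptyEntries).Length

// Logical rows as seen by the CSV parser (header excluded).
let csv = CsvFile.Parse(text)
let logicalRows = csv.Rows |> Seq.length

printfn "physical lines: %d, parsed rows: %d" physicalLines logicalRows
```

If a sample is limited by physical lines rather than by parsed rows, a cut can fall inside the quoted field, leaving a final row with fewer columns than the header. That would be consistent with the "Expected 21 columns, got 11" error above.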

smoothdeveloper commented 2 years ago

@nikoyak thanks, do you mean I should try something other than InferRows for my scenario?

Sorry for not digging further into the underlying issue. I am now seeing another parsing failure, likely caused by a mis-escaped character (the pound sign).

The CSV producer likely has a conformance issue: the CsvHelper library also fails to parse beyond the same record that the FSharp.Data CSV parser reports.

IgnoreErrors works well enough to avoid aborting the whole file, but do you know if it is possible to hook into events when a row fails to parse?

This would help report the skipped rows in processing code that uses the FSharp.Data CSV parser.

If I can get more details about the whole thing, I'll come back to it.

smoothdeveloper commented 2 years ago

Closing this, as I have identified another underlying issue that occurs before this library's parser even runs.

smoothdeveloper commented 2 years ago

@nikoyak do you mind giving me more pointers (code, issues, PRs, further technical details) to the multiline issues you are referring to?

nikoyak commented 2 years ago

@smoothdeveloper

> do you know if it is possible to hook into events when a row fails to parse?

https://github.com/fsprojects/FSharp.Data/blob/d5287e4ed3e27eca81033a3c7b3a0a9be52865e5/src/Csv/CsvRuntime.fs#L183-L188
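
Independently of the library, one possible workaround for reporting skipped rows is to pre-scan the text before handing it to the parser. The sketch below is not FSharp.Data's API: `scanRecords` is a hypothetical helper written for illustration, and the sample text is invented. It assumes double quotes as the quote character and treats line breaks inside quotes as data.

```fsharp
open System.Text

/// Hypothetical helper: splits CSV text into logical records and returns
/// each record's 1-based index, field count, and raw text.
let scanRecords (separator: char) (text: string) =
    let records = ResizeArray<int * int * string>()
    let current = StringBuilder()
    let fields = ref 1
    let mutable inQuotes = false

    // Stores the record collected so far (blank lines are skipped) and resets the state.
    let flush () =
        if current.Length > 0 then
            records.Add((records.Count + 1, fields.Value, current.ToString()))
        current.Clear() |> ignore
        fields.Value <- 1

    for c in text do
        if c = '"' then
            // A doubled quote ("") toggles twice, so the quoted state is preserved.
            inQuotes <- not inQuotes
            current.Append(c) |> ignore
        elif c = separator && not inQuotes then
            fields.Value <- fields.Value + 1
            current.Append(c) |> ignore
        elif c = '\n' && not inQuotes then
            flush ()
        elif c = '\r' && not inQuotes then
            ()
        else
            current.Append(c) |> ignore

    flush ()
    List.ofSeq records

// Invented sample: record 2 is multiline but complete, record 3 is missing a column.
let sample = "a,b,c\n1,\"x\ny\",3\n4,5\n"
let scanned = scanRecords ',' sample

// The first record (the header) defines the expected column count.
let expected =
    match scanned with
    | (_, n, _) :: _ -> n
    | [] -> 0

let bad = scanned |> List.filter (fun (_, n, _) -> n <> expected)

for (i, n, raw) in bad do
    printfn "record %d: expected %d columns, got %d: %s" i expected n raw
```

Record indices here count the header as record 1, so they may be offset from the row numbers the library reports.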

> do you mind giving me more pointers (code, issues, PRs, further technical details) to the multiline issues you are referring to?

#1439