fsprojects / FSharp.Data

F# Data: Library for Data Access
https://fsprojects.github.io/FSharp.Data

Fix reading CSV from non seekable network stream #1472

Closed Lenne231 closed 1 year ago

Lenne231 commented 1 year ago

While using the FSharp.Data.Csv.Core package to read very large CSV files from an AWS S3 bucket, we found that many rows were sometimes lost.

The problem seems to be related to the use of StreamReader.Peek(). In the current implementation, reading stops entirely as soon as StreamReader.Peek() returns -1. According to the documentation, however, -1 is returned not only at the end of the stream but also when the stream is not seekable; in our case this happened whenever the stream delivered less data than was requested:

An integer representing the next character to be read, or -1 if there are no characters to be read or if the stream does not support seeking.
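To make the failure mode concrete, here is a minimal sketch in Python. It is a toy model, not the .NET StreamReader implementation and not the FSharp.Data code: `TrickleStream`, `PeekingReader`, `read_with_peek` and `read_blocking` are hypothetical names, and the rule that `peek()` gives up after a short read is an assumption chosen to mirror the behaviour described above.

```python
import io


class TrickleStream(io.RawIOBase):
    """Hypothetical non-seekable stream that hands out at most `chunk` bytes per read."""

    def __init__(self, data, chunk=1):
        super().__init__()
        self._data = data
        self._pos = 0
        self._chunk = chunk

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, b):
        # Deliver fewer bytes than requested, like a slow network stream.
        n = min(self._chunk, len(b), len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


class PeekingReader:
    """Toy buffered reader (an assumption, not the real StreamReader):
    peek() never blocks and returns -1 when the buffer is empty
    and the previous read came back short."""

    def __init__(self, raw, bufsize=4096):
        self.raw = raw
        self.bufsize = bufsize
        self.buf = b""
        self.pos = 0
        self.blocked = False  # set after a short read

    def _fill(self):
        self.buf = self.raw.read(self.bufsize) or b""
        self.pos = 0
        self.blocked = len(self.buf) < self.bufsize
        return len(self.buf)

    def peek(self):
        if self.pos == len(self.buf):
            if self.blocked or self._fill() == 0:
                return -1  # may mean "nothing buffered", not "end of stream"
        return self.buf[self.pos]

    def read(self):
        # Always asks the underlying stream again; -1 only at true end of stream.
        if self.pos == len(self.buf) and self._fill() == 0:
            return -1
        value = self.buf[self.pos]
        self.pos += 1
        return value


def read_with_peek(reader):
    """Loop that treats peek() == -1 as end of stream (the problematic pattern)."""
    out = bytearray()
    while reader.peek() != -1:
        out.append(reader.read())
    return bytes(out)


def read_blocking(reader):
    """Loop that relies only on read(), which signals -1 at the real end."""
    out = bytearray()
    while (value := reader.read()) != -1:
        out.append(value)
    return bytes(out)
```

In this model, `read_with_peek` stops after the first short chunk, while `read_blocking` returns the whole payload.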

This can be fixed by using StreamReader.Read() instead, because that method blocks until data is available or the end of the stream is reached. I have also tried to add a unit test that simulates the problem with a non-seekable stream that returns only 1 byte per call to Stream.Read(buffer, offset, count).
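The idea behind such a test can be sketched as follows. This is a Python analogue under stated assumptions, not the F# test from this pull request: `TrickleStream` and `read_rows` are hypothetical names, and Python's standard `csv` module stands in for the CSV parser. The stream is non-seekable and returns one byte per read call, and the check is that no rows go missing.

```python
import csv
import io


class TrickleStream(io.RawIOBase):
    """Hypothetical non-seekable stream that returns at most 1 byte per read call."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, b):
        # Ignore how much was requested and deliver a single byte (0 at the end).
        n = min(1, len(b), len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


def read_rows(raw):
    """Parse CSV rows from a raw byte stream using blocking, buffered reads."""
    text = io.TextIOWrapper(io.BufferedReader(raw), encoding="utf-8", newline="")
    return list(csv.reader(text))


# Build a payload with a header and 1000 data rows, then read it back
# through the one-byte-at-a-time stream.
payload = "id,name\n" + "".join(f"{i},row{i}\n" for i in range(1000))
rows = read_rows(TrickleStream(payload.encode("utf-8")))
```

A test built this way would fail if the reader mistook a short read for the end of the stream, because `rows` would then contain fewer than 1001 entries.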

cartermp commented 1 year ago

@Lenne231 looks like you need to run the code formatter.

Lenne231 commented 1 year ago

@cartermp thanks! Done!