When using the FSharp.Data.Csv.Core package to read very large CSV files from an AWS S3 bucket, we ran into a problem where many rows were sometimes lost.
The problem seems to be related to the use of StreamReader.Peek(). With the current implementation, reading stops completely as soon as StreamReader.Peek() returns -1. According to the documentation, -1 is returned not only at the end of the stream, but also if the stream is not seekable and the underlying stream did not return all the data that was requested:
An integer representing the next character to be read, or -1 if there are no characters to be read or if the stream does not support seeking.
This can be fixed by using StreamReader.Read() instead, because that method blocks until a character is available or the end of the stream is reached. I have also tried to add a unit test that simulates the problem with a non-seekable stream that always returns just 1 byte per call to Stream.Read(buffer, offset, count).
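To illustrate the difference, here is a minimal sketch and not the actual code from this change. The names OneByteStream, readAllWithRead and readAllWithPeek are hypothetical and exist only for this example. The wrapper stream reports itself as non-seekable and returns at most one byte per Read call, roughly the way a slow network stream might behave.

```fsharp
open System
open System.IO
open System.Text

/// Hypothetical test double: a non-seekable stream that returns at most
/// one byte per Read call, even when more bytes were requested.
type OneByteStream(inner: Stream) =
    inherit Stream()
    override this.CanRead = true
    override this.CanSeek = false
    override this.CanWrite = false
    override this.Length = raise (NotSupportedException())
    override this.Position
        with get () = raise (NotSupportedException())
        and set (value: int64) = raise (NotSupportedException())
    override this.Flush() = ()
    override this.Seek(offset: int64, origin: SeekOrigin) : int64 =
        raise (NotSupportedException())
    override this.SetLength(value: int64) : unit =
        raise (NotSupportedException())
    override this.Write(buffer: byte[], offset: int, count: int) : unit =
        raise (NotSupportedException())
    override this.Read(buffer: byte[], offset: int, count: int) : int =
        // Never hand back more than one byte per call.
        inner.Read(buffer, offset, min count 1)

/// Reads with the blocking Read(); here -1 only means end of stream.
let readAllWithRead (reader: StreamReader) : string =
    let sb = StringBuilder()
    let mutable c = reader.Read()
    while c <> -1 do
        sb.Append(char c) |> ignore
        c <- reader.Read()
    sb.ToString()

/// Stops as soon as Peek() returns -1, which does not necessarily
/// mean that the end of the stream has been reached.
let readAllWithPeek (reader: StreamReader) : string =
    let sb = StringBuilder()
    while reader.Peek() <> -1 do
        sb.Append(char (reader.Read())) |> ignore
    sb.ToString()

let csv = "a,b\n1,2\n3,4\n"

let openReader () =
    new StreamReader(new OneByteStream(new MemoryStream(Encoding.UTF8.GetBytes(csv))))

let viaRead = readAllWithRead (openReader ())
let viaPeek = readAllWithPeek (openReader ())

printfn "Read(): %d of %d characters" viaRead.Length csv.Length
printfn "Peek(): %d of %d characters" viaPeek.Length csv.Length
```

The Read() based loop should always return the full input. The Peek() based loop may stop early, depending on how the runtime in use implements Peek() for partially filled buffers, which matches the lost rows we saw.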