I think you may have missed the point of issue #476 and closed it in haste.
I wait for the entire stream from the API to complete, so as far as the code is concerned it should be no different from a file stream: to all intents and purposes it is a file download that has finished by that point.
Pandas can read the feed from the web API and decodes all columns correctly.
Parquet.net can also read it (again, I wait for the entire stream to be present, so it is no different from a file stream at that point). It handles the stream correctly and decodes all other data, but it has an issue with the datetime64 column.
Parquet.net has no issue with the same data if pandas first persists it to a file.
It seems that the `stream.ReadParquetAsDataFrameAsync` method needs to accept a `ParquetOptions` argument so that `TreatLargeIntegersAsDates` can be set to true, which may well solve the problem. I believe `ParquetReader` can accept such a parameter when reading into its own table type, but this extension method, which provides a `Microsoft.Data.Analysis.DataFrame`, does not.
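To illustrate what I mean by a fully received stream being equivalent to a file, here is a minimal Python sketch. It is not my actual client code: the `buffer_stream` helper and the sample payload are hypothetical names made up for this example. It only shows that once every chunk has arrived, the data sits in a seekable buffer that a reader can treat like an opened file.

```python
import io
from typing import Iterable


def buffer_stream(chunks: Iterable[bytes]) -> io.BytesIO:
    """Collect every chunk of a streamed response into a seekable in-memory buffer."""
    buffer = io.BytesIO()
    for chunk in chunks:
        buffer.write(chunk)
    # Rewind so a reader starts at the beginning, as with a freshly opened file.
    buffer.seek(0)
    return buffer


# Hypothetical payload standing in for the bytes of a Parquet file arriving in pieces.
payload = buffer_stream([b"PAR1", b"...column data...", b"PAR1"])
```

With pandas and a Parquet engine installed, a buffer like this can be passed to `pandas.read_parquet`, which accepts file-like objects as well as paths.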
Originally posted by @totalgit74 in https://github.com/aloneguid/parquet-dotnet/issues/476#issuecomment-1956350077