aloneguid / parquet-dotnet

Fully managed Apache Parquet implementation
https://aloneguid.github.io/parquet-dotnet/
MIT License

[BUG]: No rows read from an Amazon CUR 2.0 Parquet File #540

Closed: drusellers closed this issue 1 day ago

drusellers commented 2 months ago

Library Version

4.24.0

OS

Mac

OS Architecture

ARM64

How to reproduce?

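  1. Download an AWS Cost and Usage Report (CUR) 2.0 file (Parquet with Snappy compression).
  2. Run the following program:

    using Parquet;

    // Locate the CUR 2.0 export in the current working directory
    var path = Path.Join(Environment.CurrentDirectory, "ParquetDaily-00001.snappy.parquet");
    if (!File.Exists(path)) throw new FileNotFoundException("Missing file", path);

    // Read the whole file into an in-memory table and print its row count
    var table = await ParquetReader.ReadTableFromFileAsync(path);
    Console.WriteLine(table.Count); // outputs 0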

Expected: 8,000+ rows.

Note: the schema is read correctly; all 114 columns are present.

DuckDB can open the Parquet file, which confirms that the file itself is not corrupt.

Failing test

No response

aloneguid commented 2 months ago

What are those CUR 2.0 files? Are you able to share an example?

drusellers commented 1 month ago

Sent via email

aloneguid commented 3 days ago

@drusellers The file reads 8,719 rows; there seems to be no issue. Replied by email.

aloneguid commented 1 day ago

@drusellers This may be caused by a regression in 4.24. Please try the latest preview and reopen the issue if the problem persists.