apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

Allow reading Parquet maps that lack a `values` field #6730

Open etseidl opened 1 week ago

etseidl commented 1 week ago

Which issue does this PR close?

Closes #1642.

Rationale for this change

The Parquet spec does not require the `values` field of a map to be present, but the current readers error out if this field is missing.

What changes are included in this PR?

Changes both the record reader and the arrow reader to read a `MAP` that lacks a `values` field as a list of keys. This matches the behavior of arrow-cpp.
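As a rough illustration of the rule above, here is a minimal, stdlib-only sketch of the schema-level decision. It assumes a simplified stand-in for Arrow types (the `ArrowType` enum and `convert_map` function are hypothetical, not the `arrow_schema::DataType` API or the code in this PR). It takes the already-converted children of a MAP's repeated key/value group and returns a map type when a value child is present, or a list of keys when only the key child exists.

```rust
/// Hypothetical, simplified stand-in for an Arrow data type.
/// (Not `arrow_schema::DataType`; only the shape matters here.)
#[derive(Debug, Clone, PartialEq)]
pub enum ArrowType {
    Utf8,
    Int64,
    List(Box<ArrowType>),
    Map(Box<ArrowType>, Box<ArrowType>),
}

/// Hypothetical helper: convert the children of a Parquet MAP's repeated
/// key/value group into an Arrow type.
///
/// The key child is required and the value child is optional, so a group
/// with a single child is read as a list of keys instead of being rejected.
pub fn convert_map(key_value_children: &[ArrowType]) -> Result<ArrowType, String> {
    match key_value_children {
        // Keys only: expose the map as a list of its keys.
        [key] => Ok(ArrowType::List(Box::new(key.clone()))),
        // Key and value: an ordinary map.
        [key, value] => Ok(ArrowType::Map(
            Box::new(key.clone()),
            Box::new(value.clone()),
        )),
        // Zero or more than two children is not a valid MAP layout.
        other => Err(format!(
            "MAP key/value group must have 1 or 2 children, found {}",
            other.len()
        )),
    }
}
```

For example, `convert_map(&[ArrowType::Utf8])` yields `List(Utf8)`, while `convert_map(&[ArrowType::Utf8, ArrowType::Int64])` yields `Map(Utf8, Int64)`. The real readers also have to carry nullability, field names, and repetition/definition levels through this conversion, which the sketch deliberately leaves out.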

Are there any user-facing changes?

No

etseidl commented 6 days ago

> I also poked around in https://github.com/apache/parquet-testing/tree/master/data for an example of such a file, but it seems like we do not have one.

Correct, which is why the tests go to the effort of producing one. I'll try submitting one I have on hand to parquet-testing.

etseidl commented 6 days ago

Submitted https://github.com/apache/parquet-testing/pull/63. Maybe we can hold off on merging this until we see whether the test file is accepted.