apache / parquet-site

Apache Parquet Site
https://parquet.apache.org/
Apache License 2.0

Clarify parquet-format with respect to repeated fields across boundaries #67

Closed by asfimport 5 months ago

asfimport commented 6 months ago

Several implementors have reported that the Parquet spec is currently unclear about when repeated fields can span page boundaries (that is, whether a logical record can be split across a page and/or row group boundary).

Discussion on the mailing list: https://lists.apache.org/thread/rd8twnvg4bg3558r507rzpxckcxt5wdn

The conclusion seems to be that records can't be split across page boundaries when "v2 data pages" are used or when a page index is present.

We should update the spec to state this explicitly.
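
To illustrate what "not split across a page boundary" means in terms of repetition levels, here is a minimal sketch. It assumes each page is modeled as a plain list of decoded repetition levels; the function name `pages_splitting_records` and this page model are hypothetical and not part of any Parquet library. A repetition level of 0 marks the first value of a new record, so a page whose first level is non-zero continues a record that started on an earlier page.

```python
from typing import List, Sequence


def pages_splitting_records(pages: Sequence[Sequence[int]]) -> List[int]:
    """Return indexes of pages whose first value continues a record.

    Hypothetical helper: each page is modeled as the decoded repetition
    levels of its values. Level 0 starts a new record, so a page that
    starts with a non-zero level continues a record begun earlier.
    """
    return [
        index
        for index, rep_levels in enumerate(pages)
        if len(rep_levels) > 0 and rep_levels[0] != 0
    ]


# Three records of a repeated field: [a, b], [c, d, e], [f]
aligned = [[0, 1], [0, 1, 1, 0]]  # every page starts a new record
split = [[0, 1, 0, 1], [1, 0]]    # the second record spills onto page 1

print(pages_splitting_records(aligned))  # []
print(pages_splitting_records(split))    # [1]
```

Under the conclusion above, a layout like `split` would not be allowed with v2 data pages or when a page index is present.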

Reporter: Andrew Lamb / @alamb Assignee: Andrew Lamb / @alamb

PRs and other links:

Note: This issue was originally created as PARQUET-2473. Please see the migration documentation for further details.