Status: Closed (closed by asfimport 5 months ago)
Several implementers have reported that the Parquet spec is currently unclear about when repeated fields can span page boundaries (that is, whether a logical record can be split across a page and/or row group boundary).
Discussion on list: https://lists.apache.org/thread/rd8twnvg4bg3558r507rzpxckcxt5wdn
The conclusion seems to be that records can't be split across page boundaries for "v2 data pages" or when there is a page index.
We should update the spec to state this explicitly.
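To make the rule concrete, here is a minimal sketch of what "not split across a page boundary" means in terms of repetition levels. It relies on the fact that a repetition level of 0 marks the start of a new record, so a page that begins with a non-zero repetition level is continuing a record from the previous page. The function name and the in-memory layout (one list of already-decoded repetition levels per page) are assumptions made for illustration and are not part of any Parquet library API.

```python
from typing import Sequence


def pages_start_on_record_boundaries(pages: Sequence[Sequence[int]]) -> bool:
    """Return True if every non-empty page begins with repetition level 0.

    `pages` is a hypothetical in-memory form: one sequence of decoded
    repetition levels per data page of a single column chunk.
    """
    return all(page[0] == 0 for page in pages if len(page) > 0)


# Three records of a repeated field: [1, 2], [3], [4, 5, 6]
# Repetition levels for the six values:  0  1   0    0  1  1

# Pages cut between records: each page starts with repetition level 0.
aligned_pages = [[0, 1, 0], [0, 1, 1]]

# Page cut inside the third record: the second page starts with level 1.
split_pages = [[0, 1, 0, 0], [1, 1]]

aligned_ok = pages_start_on_record_boundaries(aligned_pages)
split_ok = pages_start_on_record_boundaries(split_pages)
```

Under the conclusion above, the second layout would not be acceptable for v2 data pages or for a column chunk that has a page index.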
Reporter: Andrew Lamb / @alamb
Assignee: Andrew Lamb / @alamb
Note: This issue was originally created as PARQUET-2473. Please see the migration documentation for further details.