Status: Open. pacman82 opened this issue 4 days ago.
The i16 is actually a limit enforced by the Parquet format itself: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L940
Row groups of this size are such a bad idea that the format actively prevents it 😅
That being said, we could make this an error rather than a panic.
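A minimal sketch of what "an error, not a panic" could look like. It assumes the writer tracks row groups with a `usize` counter and that ordinals start at 0. `RowGroupCounter`, `TooManyRowGroups` and `next_ordinal` are hypothetical names for illustration, not the parquet crate's API. The only grounded fact is that the linked Thrift definition stores the row group ordinal as an `i16`, so an index above `i16::MAX` cannot be represented.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Hypothetical error: the next row group's ordinal would not fit in the
/// `i16` that the Parquet Thrift definition uses for the row group ordinal.
#[derive(Debug, PartialEq, Eq)]
pub struct TooManyRowGroups {
    pub attempted_index: usize,
}

impl fmt::Display for TooManyRowGroups {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(
            f,
            "row group index {} exceeds the Parquet limit of {} (i16 ordinal)",
            self.attempted_index,
            i16::MAX
        )
    }
}

impl std::error::Error for TooManyRowGroups {}

/// Hypothetical counter a file writer could use to hand out row group
/// ordinals, assuming ordinals start at 0.
#[derive(Debug, Default)]
pub struct RowGroupCounter {
    written: usize,
}

impl RowGroupCounter {
    /// Returns the ordinal for the next row group, or an error once the
    /// index no longer fits in an `i16`. The counter only advances on success.
    pub fn next_ordinal(&mut self) -> Result<i16, TooManyRowGroups> {
        let ordinal = i16::try_from(self.written).map_err(|_| TooManyRowGroups {
            attempted_index: self.written,
        })?;
        self.written += 1;
        Ok(ordinal)
    }
}
```

Under the zero-based assumption this accepts 32,768 row groups and rejects the next one with an error the caller can handle, which lines up with the 32769 in the reproduction steps.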
No disagreement here. I am exploring ways to change the UX of odbc2parquet to avoid this scenario entirely, but still felt that the panic should be an error.
Describe the bug
The i16 used to count row groups overflows and becomes negative, causing a panic.
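A small illustration of how a row group count can turn negative. It assumes the index is converted with a plain `as` cast; the issue does not say exactly where the conversion happens, so treat this as a plausible mechanism rather than the writer's actual code. In Rust, `as` between integer types truncates silently instead of panicking, so index 32768 becomes `i16::MIN`, and a later step expecting a non-negative value can then fail. `i16::try_from` reports the overflow instead. Both function names are hypothetical.

```rust
use std::convert::TryFrom;

/// Lossy conversion: `as` keeps only the low 16 bits, so values above
/// `i16::MAX` wrap around to negative numbers without any error.
pub fn ordinal_via_cast(index: usize) -> i16 {
    index as i16
}

/// Checked conversion: returns `None` when `index` does not fit in an `i16`.
pub fn ordinal_checked(index: usize) -> Option<i16> {
    i16::try_from(index).ok()
}
```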
To Reproduce
Writing 32769 row groups with the file writer
Expected behavior
An error indicating that too many row groups have been written would probably be preferable to a panic. Alternatively, it would be nice if this just worked, though I could also get behind the view that this may be too many row groups for a single file anyway.
Additional context
This occurred for a user running odbc2parquet. His row groups were very small (15 rows) due to an issue with his row sizes, which caused him to write many row groups into a single file. See: https://github.com/pacman82/odbc2parquet/issues/652
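Some back-of-the-envelope arithmetic on why 15-row row groups hit the limit. It assumes at most 32,768 row groups per file (zero-based `i16` ordinals up to `i16::MAX`). The constant and helper names are hypothetical and not part of odbc2parquet or the parquet crate.

```rust
/// Assumed maximum number of row groups per file: ordinals 0..=i16::MAX.
pub const MAX_ROW_GROUPS: u64 = i16::MAX as u64 + 1;

/// Most rows a single file can hold if every row group has `rows_per_group` rows.
pub fn max_rows_per_file(rows_per_group: u64) -> u64 {
    MAX_ROW_GROUPS * rows_per_group
}

/// Smallest row group size that keeps `total_rows` within the assumed
/// row group limit (ceiling division, written to avoid overflow).
pub fn min_rows_per_group(total_rows: u64) -> u64 {
    total_rows / MAX_ROW_GROUPS + (total_rows % MAX_ROW_GROUPS != 0) as u64
}
```

At 15 rows per group, `max_rows_per_file(15)` gives 491,520 rows under this assumption, so any larger export with that row group size runs into the limit. Conversely, a 1,000,000-row export needs row groups of at least 31 rows to fit in one file.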