ExpandingMan opened this issue 6 years ago
AFAIK Feather/Arrow intentionally uses `Int32` to force people to use multiple blocks to store large arrays. I don't really understand that reasoning, but it's stated here:
https://github.com/apache/arrow/blob/master/format/Layout.md#array-lengths
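A rough Python sketch of the "multiple blocks" idea, assuming a variable-length (string/binary) column laid out as in Arrow: a data buffer of concatenated bytes plus cumulative offsets, where the last offset equals the total byte count. If offsets are `Int32`, one block can hold at most 2^31 − 1 bytes, so a larger column has to be split into chunks whose offsets each restart at 0. The function names here are hypothetical and not part of Feather.jl or the Arrow API.

```python
from itertools import accumulate

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit offset can hold

def build_offsets(values):
    """Arrow-style offsets: bytes offsets[i]..offsets[i+1] of the data buffer hold values[i]."""
    return [0, *accumulate(len(v) for v in values)]

def chunk_for_int32_offsets(values, limit=INT32_MAX):
    """Split a column into chunks whose concatenated bytes fit under `limit`,
    so each chunk's offsets (which restart at 0) stay within Int32."""
    chunks, current, size = [], [], 0
    for v in values:
        n = len(v)
        if n > limit:
            # No chunking can help: one value alone overflows the offset type.
            raise ValueError("a single value exceeds the offset limit")
        if current and size + n > limit:
            # Close the current chunk before its last offset would overflow.
            chunks.append(current)
            current, size = [], 0
        current.append(v)
        size += n
    if current:
        chunks.append(current)
    return chunks
```

With a deliberately tiny `limit=5` to keep the example small, `[b"abc", b"de", b"fghi", b"j"]` splits into `[[b"abc", b"de"], [b"fghi", b"j"]]`, whose per-chunk offsets are `[0, 3, 5]` and `[0, 4, 5]`.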
The error occurs here on dataframes with sufficiently large columns. The most obvious fix would be to change all of the offsets to `Int64`, but does the Feather format even support that? Is this a fundamental limitation? If so, that seems really bad, because the dataset was only about 20 GB.
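To put numbers on the limit: 2^31 − 1 bytes is about 2.15 GB, so a single string column from a 20 GB dataset can plausibly need offsets beyond what `Int32` can hold. The sketch below (a hypothetical helper, not Feather.jl code) just checks which signed offset width a column needs, using the fact that the final offset equals the column's total byte count.

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

def required_offset_type(value_lengths):
    """Return the narrowest signed offset width able to index the whole data buffer."""
    total = sum(value_lengths)  # the last offset equals the total number of bytes
    if total <= INT32_MAX:
        return "Int32"
    if total <= INT64_MAX:
        return "Int64"
    raise OverflowError("column too large even for Int64 offsets")

# A hypothetical 3 GB column (three 1 GB values) no longer fits in Int32 offsets.
print(required_offset_type([10**9] * 3))
```

Exactly 2^31 − 1 bytes still fits in `Int32`; one more byte requires `Int64` offsets (or splitting into multiple blocks).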