snakingfire opened 1 week ago
cc @jorisvandenbossche @raulcd
Really weird:

`item_builder_->length()`: 19456153
`key_builder_->length()`: 19456154
I'll have to debug a bunch to understand where this mismatch is coming from :)
Intuitively, I think what happens is that the `item_builder_` overflows because it's a `StringBuilder` and we try to append more than 2 GiB to it. The converter logic then tries to finish the chunk and start another one, but the key and item builders are out of sync.
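The 2 GiB figure is the capacity of `pa.string()`'s 32-bit offsets, which cap a single string array's character data at 2³¹ − 1 bytes. For a plain string column, pyarrow handles the overflow by splitting the result into chunks, which a quick sketch can show (the sizes are illustrative and allocate several GiB):

```python
import pyarrow as pa

# ~3 GiB of character data: too much for one 32-bit-offset string array,
# so pa.array() returns a ChunkedArray with more than one chunk.
arr = pa.array(["x" * (1 << 20)] * (3 * 1024))
print(type(arr).__name__)  # ChunkedArray
print(arr.num_chunks)      # > 1
```

The hypothesis above is that the same chunk switch inside a map column leaves the item builder one entry behind the key builder.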
It looks like the rewind-on-overflow in `arrow/util/converter.h` is too naive. In particular, if appending to one of a `StructBuilder`'s child builders raises `CapacityError`, then all child builders should be rewound to the same length to ensure consistency.
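To make that invariant concrete, here is a minimal Python sketch of the rewind logic being suggested; it's a toy model, not Arrow's actual C++ builder API. Each child's length is snapshotted before a row is appended, and a `CapacityError` from any child rewinds all of them to the snapshot:

```python
class CapacityError(Exception):
    """Stand-in for Arrow's CapacityError in this toy model."""

class ToyBuilder(list):
    """A child builder with a fixed capacity."""
    def __init__(self, capacity):
        super().__init__()
        self.capacity = capacity

    def append(self, value):
        if len(self) >= self.capacity:
            raise CapacityError
        super().append(value)

def append_row(children, values):
    # Snapshot each child's length before touching any of them.
    lengths = [len(c) for c in children]
    try:
        for child, value in zip(children, values):
            child.append(value)  # may raise mid-row
    except CapacityError:
        # Rewind *every* child to its snapshot, not just the one that
        # overflowed, so key and item lengths stay consistent.
        for child, n in zip(children, lengths):
            del child[n:]
        raise  # the caller can now finish the chunk and retry the row

keys, items = ToyBuilder(capacity=10), ToyBuilder(capacity=2)
for row in [("a", "1"), ("b", "2"), ("c", "3")]:
    try:
        append_row([keys, items], row)
    except CapacityError:
        break

assert len(keys) == len(items) == 2  # no stray key left behind
```

Without the all-children rewind, `keys` would finish one entry longer than `items`, which is exactly the off-by-one in the lengths quoted above.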
### Describe the bug, including details regarding any error messages, version, and platform.
Related to https://github.com/apache/arrow/issues/44640
When attempting to convert a pandas dataframe that has a dict-typed column to a pyarrow table with a map column, if the dataframe and column are of sufficient size, the conversion fails with a fatal length check on the map's key and item builders (the mismatched `length()` values quoted above). This is immediately followed by SIGABRT and the process crashing.
When the dataframe is of a smaller size, the conversion succeeds without error. See the reproduction code below: when `dataframe_size` is set to a small value (e.g. 1M rows) there is no error, but at a certain size (e.g. 10M rows) the error condition occurs.
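The original reproduction script is not included here, so the following is only a sketch of the kind of conversion described; the column name, dict shape, and string sizes are assumptions, chosen so the map's item data crosses 2 GiB at the larger `dataframe_size`:

```python
import pandas as pd
import pyarrow as pa

# Assumed shape: one dict-typed column converted to map<string, string>.
# At 1M rows the item strings total ~300 MB and conversion succeeds;
# at 10M rows they total ~3 GB, past the 2 GiB string-builder capacity.
dataframe_size = 10_000_000

df = pd.DataFrame({
    "attributes": [{"key": "x" * 300} for _ in range(dataframe_size)],
})
schema = pa.schema([("attributes", pa.map_(pa.string(), pa.string()))])

# Aborts with the length-check failure at the larger size.
table = pa.Table.from_pandas(df, schema=schema)
```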
Environment Details:

### Component(s)
Python