apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[C++] Couldn't serialize thrift on parquet::arrow::FileWriter::Close #43333

Open ouwei-xhs opened 4 months ago

ouwei-xhs commented 4 months ago

Describe the usage question you have. Please include as many useful details as possible.

I've written a program to convert my data to Parquet format. During testing, I encountered the following error in FileWriter::Close:

parquet_writer.cpp:186 Failed to close writer: IOError: Couldn't serialize thrift: Internal buffer size overflow

The corresponding implementation is:

[image]

It seems similar to the case mentioned in this issue.

During testing, I also found that the error depends on the column count. If the schema contains 8 columns, it fails:

parquet_writer.cpp:172 The table constains: [8] columns and [70409] rows
parquet_writer.cpp:186 Failed to close writer: IOError: Couldn't serialize thrift: Internal buffer size overflow

But if it contains fewer than 8 columns, it works:

parquet_writer.cpp:172 The table constains: [7] columns and [70409] rows
processor.cpp:41 Try to upload parquet file 0.parquet, file size: 7072759

Are there any parameters I can tune to solve this issue?

Component(s)

C++

pitrou commented 1 week ago

Which version of Arrow C++ are you using? Also, did you post the entire error message or did you truncate it?