apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

Why doesn't Parquet currently support writing multiple row groups simultaneously? #2929

Open muyihao opened 1 week ago

muyihao commented 1 week ago

Hi Parquet developers,

I have a question regarding the current implementation of Parquet. As far as I understand, Parquet does not support writing multiple row groups simultaneously. Could you please explain the reasoning behind this design choice?

Additionally, I am considering modifying Parquet to allow for multiple row groups to exist in memory and be flushed sequentially. From a high-level perspective, does this approach seem feasible? Are there any potential pitfalls or challenges I should be aware of?

Thank you for your time and assistance.

Best regards,

wgtmac commented 1 week ago

This would complicate the implementation and result in a large memory footprint. Does it make sense to use multiple file writers instead?
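
As a rough illustration of the "multiple file writers" suggestion, here is a minimal sketch in which each worker thread owns one writer and one output file, so no writer state is shared between threads. It deliberately avoids the parquet-java API: a plain `BufferedWriter` stands in for a Parquet file writer, and the names `MultiWriterSketch`, `writePartition`, and `writeInParallel` are hypothetical, chosen only for this example. Round-robin partitioning is likewise an assumption; any split of the input would do.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MultiWriterSketch {

    // Hypothetical helper: writes one partition of records to its own file.
    // The BufferedWriter is a stand-in for a Parquet file writer (assumption).
    static Path writePartition(Path dir, int partition, List<String> records) {
        Path out = dir.resolve("part-" + partition + ".txt");
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (String r : records) {
                w.write(r);
                w.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    // Splits the records round-robin into `writers` partitions, then writes
    // each partition on its own thread with its own writer and file.
    static List<Path> writeInParallel(Path dir, List<String> records, int writers)
            throws Exception {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < writers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            partitions.get(i % writers).add(records.get(i));
        }

        ExecutorService pool = Executors.newFixedThreadPool(writers);
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (int i = 0; i < writers; i++) {
                final int p = i;
                Callable<Path> task = () -> writePartition(dir, p, partitions.get(p));
                futures.add(pool.submit(task));
            }
            // Collect results in partition order; get() rethrows any write failure.
            List<Path> files = new ArrayList<>();
            for (Future<Path> f : futures) {
                files.add(f.get());
            }
            return files;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("multi-writer-sketch");
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            records.add("row-" + i);
        }
        List<Path> files = writeInParallel(dir, records, 3);
        System.out.println(files.size() + " files written under " + dir);
    }
}
```

With this layout, memory use scales with the number of writers, since each one buffers only its own in-progress data, and the result is several smaller files rather than one file with many row groups. Whether that trade-off is acceptable depends on how the files are read downstream.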