yyang52 opened this issue 1 year ago
@oerling Any thoughts on that? Thanks!
CC: @JkSelf
Can you try PR #4854 to see if it solves your problem? It seems the performance degradation is caused by frequent allocate-and-copy in the reallocate method of DataBuffer.
Thanks for providing that! It seems we've hit a similar issue, as I also found the reserve method taking a lot of time. I will try this PR to see if it fixes the problem.
> Can you try PR #4854 to see if it solves your problem? It seems the performance degradation is caused by frequent allocate-and-copy in the reallocate method of DataBuffer.
I have tried this PR and it does solve the problem! The hotspot is now compression, as memmove no longer takes that much time.
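For context on why reallocation shows up as memmove time: if a buffer grows to exactly the requested size, nearly every append triggers an allocation plus a full copy of the existing bytes, and those copies dominate the profile. Below is a minimal sketch of the effect; `SimpleBuffer` and both growth policies are illustrative assumptions, not Velox's actual `DataBuffer`:

```cpp
#include <algorithm>
#include <cstdlib>
#include <cstring>

// Illustrative only -- this is NOT Velox's DataBuffer. It shows why
// exact-size reallocation makes memcpy/memmove dominate a write-heavy
// profile, and how geometric growth amortizes the copying.
class SimpleBuffer {
 public:
  ~SimpleBuffer() { std::free(data_); }

  void append(const void* src, size_t bytes, bool amortized) {
    const size_t needed = size_ + bytes;
    if (needed > capacity_) {
      // Exact growth copies the whole buffer on nearly every append:
      // N appends => O(N^2) bytes moved. Doubling amortizes to O(N).
      resizeTo(amortized ? std::max(needed, capacity_ * 2) : needed);
    }
    std::memcpy(data_ + size_, src, bytes);
    size_ += bytes;
  }

 private:
  void resizeTo(size_t newCapacity) {
    char* bigger = static_cast<char*>(std::malloc(newCapacity));
    if (size_ > 0) {
      // This copy is what shows up as __memmove_* in perf output.
      std::memcpy(bigger, data_, size_);
    }
    std::free(data_);
    data_ = bigger;
    capacity_ = newCapacity;
  }

  char* data_ = nullptr;
  size_t size_ = 0;
  size_t capacity_ = 0;
};
```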
Description
ETL (Velox w/ Spark/Gluten) is an essential part of data analytics and involves writing data files. Currently, Velox implements its Parquet writer on top of the Arrow writer, which supports various compression codecs. ZSTD is a commonly used compression method for ETL workloads, so we ran some benchmarks/tests to find the hotspots and see whether there was optimization potential.
Since Velox doesn't have a Parquet writer benchmark, we implemented a simple benchmark that writes Parquet files from TpchGen tables. Profiling this workload showed that the hotspot was `__memmove_avx_unaligned_erms` (> 50%), while `ZSTD_compress` took less than 10% of the time. Benchmarks on the Arrow side gave a higher percentage for `ZSTD_compress` (more than 15%, and > 80% when running `column_io_benchmark`).

Velox workload:
Arrow workload:
![image](https://github.com/facebookincubator/velox/assets/82208254/e3bfd63f-8261-400b-920e-608104627bca)
Not sure if that's due to the different memory pool implementations or to an improper workload on my side. Do we happen to have any profiling data on the Parquet writer, or is there a plan to optimize this part with SW/HW accelerators?
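For anyone trying to reproduce the Arrow-side comparison, here is a minimal sketch of a ZSTD-compressed Parquet write using Arrow's public C++ API; the function name, output path, and row-group size are assumptions, not the actual benchmark code:

```cpp
#include <arrow/api.h>
#include <arrow/io/file.h>
#include <parquet/arrow/writer.h>
#include <parquet/properties.h>

// Sketch of a ZSTD Parquet write on the Arrow side; the input table is a
// stand-in for TPC-H data generated elsewhere (e.g., by TpchGen).
arrow::Status WriteZstdParquet(const std::shared_ptr<arrow::Table>& table) {
  ARROW_ASSIGN_OR_RAISE(
      auto outfile,
      arrow::io::FileOutputStream::Open("/tmp/lineitem.parquet"));
  auto props = parquet::WriterProperties::Builder()
                   .compression(parquet::Compression::ZSTD)
                   ->build();
  // 64K rows per row group; the chunk size is an arbitrary choice here.
  return parquet::arrow::WriteTable(
      *table, arrow::default_memory_pool(), outfile,
      /*chunk_size=*/65536, props);
}
```

Running a binary like this under `perf record -g` and inspecting `perf report` is enough to see whether `ZSTD_compress` or `__memmove_avx_unaligned_erms` dominates.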