Closed by yma11 1 week ago
@yma11 can you add a UI chart for the pyarrow UDF? Also add some implementation details?
In theory, we can convert Velox to Arrow inside the Velox pipeline, then pass the Arrow pointer to Spark, which sends it to the Python process. There would be no C2R or R2C anywhere in the process, and no memcpy between Velox and Spark. Can we achieve this?
Yes. The current implementation has no C2R or R2C, only a VeloxColumnar to Arrow conversion. Whether there is a memcpy depends on the Arrow bridge: I found that Velox still allocates some memory for data types such as string. Let me add the implementation details under the feature track.
@FelixYBW The implementation details are now added in 5461. Perf data is also included there. FYI.
I just noticed that the package of this file (ColumnarArrowEvalPythonExec.scala) is
package org.apache.spark.api.python
which is wrong. Would you like to fix it? @yma11
Fixed.
@zhztheplayer Please help take a look again. Thanks.
What changes were proposed in this pull request?
Add metric for ColumnarArrowEvalPythonExec
(Fixes: #5771)
Spark UI
How was this patch tested?
We tested the performance of the Arrow UDF and collected some performance data:
The results show a ~20% performance gain compared with vanilla Spark.