apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

How to use/test Comet? #20

Closed: jinwenjie123 closed this issue 3 months ago

jinwenjie123 commented 4 months ago

Hi Team,

I am trying to evaluate the performance of the Comet plugin, but I could not find any documentation on how to use the plugin after compiling it, specifically how to use it in cluster mode.

Thanks

viirya commented 4 months ago

I think we will provide more documentation on this.

Currently, https://github.com/apache/arrow-datafusion-comet/blob/main/bin/comet-spark-shell shows simple usage: the configuration needed on the Spark side to use Comet.

You just need to build Comet, distribute the jar with Spark, and set the necessary configs to enable it.
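As a rough illustration of "distribute the jar with Spark and set the necessary configs" in cluster mode, the Python helper below assembles a `spark-submit` command line. It is a sketch under stated assumptions: the Spark and jar paths, the application jar and class, and the `build_submit_command` helper are all hypothetical, and the config keys are modeled on what `bin/comet-spark-shell` has used (`spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions`, `spark.comet.enabled`, `spark.comet.exec.enabled`). Key names have changed across Comet releases, so check them against the script for the version you built.

```python
import shlex

# Hypothetical paths: replace with your Spark install and the jar produced by building Comet.
SPARK_HOME = "/opt/spark"
COMET_JAR = "/opt/comet/comet-spark.jar"

# Config keys modeled on bin/comet-spark-shell (assumption: verify for your Comet version).
# extraClassPath is resolved on each node, so this assumes the jar exists at the
# same path on every node (e.g. baked into the image or copied by your deployment).
COMET_CONFS = {
    "spark.driver.extraClassPath": COMET_JAR,
    "spark.executor.extraClassPath": COMET_JAR,
    "spark.sql.extensions": "org.apache.comet.CometSparkSessionExtensions",
    "spark.comet.enabled": "true",
    "spark.comet.exec.enabled": "true",
}


def build_submit_command(app_jar, main_class, master="yarn", deploy_mode="cluster"):
    """Return a spark-submit argv that ships the Comet jar and turns Comet on."""
    cmd = [
        f"{SPARK_HOME}/bin/spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        # --jars uploads the Comet jar to the cluster along with the application.
        "--jars", COMET_JAR,
    ]
    for key, value in COMET_CONFS.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_jar)  # the application jar must come after all options
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_submit_command("my-app.jar", "com.example.MyApp")))
```

Generating the argument list in one place keeps the Comet settings identical between interactive testing and cluster jobs; the same `--conf` pairs can be passed to `spark-shell` when trying Comet locally.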

Btw, we haven't open-sourced some performance-related features yet, so the performance numbers you measure may not be representative.

jinwenjie123 commented 4 months ago

Hi Team,

I also noticed that when Comet detects unsupported features, it falls back to the Spark engine. Does that come with the extra cost of converting columnar data to a row-based format?

Thank you for your time.

viirya commented 4 months ago

Yes, that's correct. At the boundary between a Comet operator and a Spark operator, we need a ColumnarToRow operator to convert from ColumnarBatch to Spark InternalRow, which adds some overhead. Overall, we expect the gains from native operators to outweigh this cost. As native operator support grows, we can reduce such fallbacks and the associated columnar-to-row conversion cost.
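A toy Python model (not Comet code) of the boundary cost described above: in a linear plan, every edge where a native operator feeds a Spark operator needs a ColumnarToRow step, and that step has to transpose column-oriented data into rows. The `Op` type, `count_transitions`, and `columnar_to_row` are hypothetical names, and real Spark conversion produces `InternalRow` objects rather than Python tuples; the sketch only shows where the conversions occur and the shape of the work.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    native: bool  # True if the operator runs natively in Comet (hypothetical flag)


def count_transitions(plan):
    """Count columnar-to-row transitions in a linear plan listed leaf-first.

    Each time a native (columnar) operator feeds a Spark (row-based)
    operator, a ColumnarToRow step is needed at that boundary.
    """
    return sum(
        1 for child, parent in zip(plan, plan[1:]) if child.native and not parent.native
    )


def columnar_to_row(batch):
    """Transpose a column-oriented batch (dict of equal-length lists) into row tuples.

    This per-row work is the kind of cost paid at every transition.
    """
    return list(zip(*batch.values()))


# Scan and Filter run natively; an unsupported Project forces Spark to take over.
plan = [Op("Scan", True), Op("Filter", True), Op("Project", False), Op("Aggregate", False)]
print(count_transitions(plan))                                   # 1
print(columnar_to_row({"id": [1, 2, 3], "name": ["a", "b", "c"]}))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

The model makes the reply's trade-off visible: the conversion cost is paid per boundary and per row, so extending native support to the operators that currently fall back removes boundaries rather than just speeding them up.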

jinwenjie123 commented 3 months ago

Hi Team,

I am wondering whether there will be documentation on which data types are supported by Comet/DataFusion (e.g., Decimal(16, 6), integer, ...) and so will not fall back to vanilla Spark and cause a regression.

Or, where can I look up related information? This is very important as we evaluate whether to use Comet.
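Until such documentation exists, one practical way to see what falls back for a given query is to inspect its physical plan. The Python sketch below scans the text printed by `df.explain()` and lists operators that are not Comet operators. It assumes Comet operators appear with a `Comet` prefix (e.g. `CometScan`, `CometFilter`) and that plan lines use Spark's usual tree format (`+-`, `:`, `*(n)` codegen markers); the function name and sample plan are hypothetical, and explain formatting varies between Spark versions.

```python
import re


def non_comet_operators(plan_text):
    """Return operator names in a Spark physical-plan string that are not Comet operators.

    Assumes Comet operators are printed with a "Comet" prefix and that each
    plan line starts with tree-drawing characters followed by the operator name.
    """
    ops = []
    for line in plan_text.splitlines():
        # Strip leading whitespace and tree-drawing characters ("+-", ":", "|").
        stripped = re.sub(r"^[\s+\-:|]*", "", line)
        # Strip a whole-stage codegen marker such as "*(1) ".
        stripped = re.sub(r"^\*\(\d+\)\s*", "", stripped)
        match = re.match(r"([A-Za-z]\w*)", stripped)
        if match and not match.group(1).startswith("Comet"):
            ops.append(match.group(1))
    return ops


sample_plan = """== Physical Plan ==
*(2) HashAggregate(keys=[k#1], functions=[sum(v#2)])
+- Exchange hashpartitioning(k#1, 200)
   +- *(1) ColumnarToRow
      +- CometScan parquet [k#1,v#2]"""

print(non_comet_operators(sample_plan))  # ['HashAggregate', 'Exchange', 'ColumnarToRow']
```

Apart from wrappers such as `AdaptiveSparkPlan`, each name returned is an operator Spark executed itself, and a `ColumnarToRow` entry marks a point where the conversion cost discussed above is paid.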

Thank you so much !

viirya commented 3 months ago

Hello, I answered in #64.

comphead commented 3 months ago

Can it be closed via https://github.com/apache/arrow-datafusion-comet/pull/125? @viirya

viirya commented 3 months ago

Okay

viirya commented 3 months ago

@jinwenjie123 Feel free to open new issues on Comet usage if you still have other questions. Thanks.