apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Reduce metrics collection overhead #1024

Open mbutrovich opened 1 month ago


What is the problem the feature request solves?

Running TPC-H locally, I see more than 3% of on-CPU time spent in comet::execution::metrics::utils::update_comet_metric. This function appears to be called whenever native execution wakes up in the polling loop, typically to produce a batch. Starting from the root of the plan, it walks the plan tree in preorder, updating each node's metrics (with a separate JNI call per metric) before visiting that node's children.
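To make the cost concrete, here is a minimal Rust sketch of a preorder metrics walk. It is not Comet's code: `PlanNode`, `FakeJvm`, `set_metric`, `sample_plan`, and `jni_calls_for` are hypothetical, and each `set_metric` call only stands in for one JNI crossing. What it illustrates is that the number of crossings grows as (number of wake-ups) × (total metrics in the plan).

```rust
/// Hypothetical stand-in for a native plan node: its metrics plus its children.
struct PlanNode {
    metrics: Vec<(&'static str, u64)>,
    children: Vec<PlanNode>,
}

/// Hypothetical stand-in for the JVM side; each method call models one JNI crossing.
#[derive(Default)]
struct FakeJvm {
    jni_calls: usize,
}

impl FakeJvm {
    fn set_metric(&mut self, _name: &str, _value: u64) {
        self.jni_calls += 1;
    }
}

/// Preorder traversal: push this node's metrics first, then visit each child.
fn update_metrics_preorder(node: &PlanNode, jvm: &mut FakeJvm) {
    for (name, value) in &node.metrics {
        jvm.set_metric(name, *value);
    }
    for child in &node.children {
        update_metrics_preorder(child, jvm);
    }
}

/// A small three-node plan; metric names and values are illustrative only.
fn sample_plan() -> PlanNode {
    let scan = PlanNode {
        metrics: vec![("output_rows", 400), ("elapsed_compute", 20), ("num_batches", 4)],
        children: vec![],
    };
    let filter = PlanNode {
        metrics: vec![("output_rows", 100), ("elapsed_compute", 7)],
        children: vec![scan],
    };
    PlanNode {
        metrics: vec![("output_rows", 100), ("elapsed_compute", 5)],
        children: vec![filter],
    }
}

/// Simulate `polls` wake-ups, each of which walks the whole plan once.
fn jni_calls_for(polls: usize) -> usize {
    let plan = sample_plan();
    let mut jvm = FakeJvm::default();
    for _ in 0..polls {
        update_metrics_preorder(&plan, &mut jvm);
    }
    jvm.jni_calls
}
```

With this illustrative plan (7 metrics in total), `jni_calls_for(1)` returns 7 and `jni_calls_for(1000)` returns 7000.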

Describe the potential solution

There are a few things to explore:

  1. Does reducing the granularity of metric updates affect the correctness of these metrics? If not, we could update metrics less frequently.
  2. Can we eliminate the overhead of repeatedly allocating strings via JNI for every metric? Addressed in #1029.
  3. Can we update an entire node's metrics with a single JNI call, rather than a JNI call for each metric?
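For idea 1, one possible approach, sketched here under the assumption that the forwarded metrics are cumulative counters, is to push updates only every N polls and always once more when the stream finishes. `MetricsThrottle`, `on_poll`, and `simulate` are hypothetical names, not Comet APIs. Because each push sends the latest cumulative value, the final JVM-side value matches the native one; only intermediate values lag.

```rust
/// Hypothetical throttle: forward metric updates only every `every_n` polls,
/// plus a final flush when the stream finishes.
struct MetricsThrottle {
    every_n: u64,
    polls: u64,
}

impl MetricsThrottle {
    fn new(every_n: u64) -> Self {
        assert!(every_n > 0, "every_n must be positive");
        Self { every_n, polls: 0 }
    }

    /// Returns true if this poll should push metrics across JNI.
    fn on_poll(&mut self, stream_finished: bool) -> bool {
        self.polls += 1;
        stream_finished || self.polls % self.every_n == 0
    }
}

/// Returns (number of pushes, last value seen by the JVM side).
fn simulate(total_polls: u64, every_n: u64, rows_per_poll: u64) -> (u64, u64) {
    let mut throttle = MetricsThrottle::new(every_n);
    let mut native_rows = 0u64; // cumulative counter kept on the native side
    let mut jvm_rows = 0u64; // last value pushed to the JVM
    let mut updates = 0u64; // number of simulated JNI update rounds
    for poll in 1..=total_polls {
        native_rows += rows_per_poll;
        let finished = poll == total_polls;
        if throttle.on_poll(finished) {
            jvm_rows = native_rows;
            updates += 1;
        }
    }
    (updates, jvm_rows)
}
```

`simulate(1000, 64, 100)` returns `(16, 100000)`: 15 periodic pushes plus the final flush, versus 1000 pushes when `every_n` is 1. A time-based interval would be an alternative trigger; counting polls just keeps the sketch deterministic.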
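For idea 3, a minimal sketch, assuming the metric order for each node is fixed when the plan is created so that only values need to cross the boundary: instead of one call per metric, pack a node's values into a single slice (which could map to a Java `long[]`) and send it in one call. `NodeMetrics`, `FakeJvm`, `set_metric`, `set_node_metrics`, and `sample_nodes` are hypothetical and only count calls; no real JNI API is used.

```rust
/// Hypothetical per-node metric values. The metric order is assumed to be
/// fixed at plan creation, so only values (not names) cross JNI.
struct NodeMetrics {
    values: Vec<i64>,
}

/// Hypothetical stand-in for the JVM side; each method call models one JNI crossing.
#[derive(Default)]
struct FakeJvm {
    jni_calls: usize,
    last_batch: Vec<i64>,
}

impl FakeJvm {
    /// Models one JNI call that sets a single metric by index.
    fn set_metric(&mut self, _index: usize, _value: i64) {
        self.jni_calls += 1;
    }

    /// Models one JNI call carrying all of a node's values at once.
    fn set_node_metrics(&mut self, values: &[i64]) {
        self.jni_calls += 1;
        self.last_batch = values.to_vec();
    }
}

/// Current style: one call per metric.
fn update_per_metric(nodes: &[NodeMetrics], jvm: &mut FakeJvm) {
    for node in nodes {
        for (index, value) in node.values.iter().enumerate() {
            jvm.set_metric(index, *value);
        }
    }
}

/// Batched style: one call per node.
fn update_per_node(nodes: &[NodeMetrics], jvm: &mut FakeJvm) {
    for node in nodes {
        jvm.set_node_metrics(&node.values);
    }
}

/// Three illustrative nodes with 2, 2, and 3 metrics.
fn sample_nodes() -> Vec<NodeMetrics> {
    vec![
        NodeMetrics { values: vec![100, 5] },
        NodeMetrics { values: vec![100, 7] },
        NodeMetrics { values: vec![400, 20, 4] },
    ]
}
```

For these sample nodes, per-metric updates make 7 calls and per-node updates make 3; the saving grows with the number of metrics per node.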
