apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Improve CometBroadcastHashJoin statistics #338

Closed planga82 closed 2 weeks ago

planga82 commented 2 weeks ago

What is the problem the feature request solves?

Expose all of the statistics that the DataFusion HashJoinExec node provides.

Describe the potential solution

Override the metrics map in CometBroadcastHashJoin so that it exposes all available metrics.

Additional context

Current metrics:

output_rows
elapsed_compute

All available metrics:

/// Total time for collecting build-side of join
pub(crate) build_time: metrics::Time,
/// Number of batches consumed by build-side
pub(crate) build_input_batches: metrics::Count,
/// Number of rows consumed by build-side
pub(crate) build_input_rows: metrics::Count,
/// Memory used by build-side in bytes
pub(crate) build_mem_used: metrics::Gauge,
/// Total time for joining probe-side batches to the build-side batches
pub(crate) join_time: metrics::Time,
/// Number of batches consumed by probe-side of this operator
pub(crate) input_batches: metrics::Count,
/// Number of rows consumed by probe-side of this operator
pub(crate) input_rows: metrics::Count,
/// Number of batches produced by this operator
pub(crate) output_batches: metrics::Count,
/// Number of rows produced by this operator
pub(crate) output_rows: metrics::Count,
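
To make the gap between the current and the available metrics concrete, here is a minimal, self-contained sketch that lists the fields above by name and computes which ones are not yet reported. `MetricKind`, `hash_join_metrics` and `missing_metrics` are hypothetical names invented for this illustration; they are not part of Comet or DataFusion, and the actual change would live in the CometBroadcastHashJoin metrics map on the Spark side.

```rust
use std::collections::BTreeMap;

/// Kind of each metric, mirroring the `metrics::Time`, `metrics::Count`
/// and `metrics::Gauge` types in the field list above.
/// (Hypothetical enum, used only in this sketch.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetricKind {
    Time,
    Count,
    Gauge,
}

/// Every HashJoinExec metric listed in this issue, keyed by field name.
pub fn hash_join_metrics() -> BTreeMap<&'static str, MetricKind> {
    BTreeMap::from([
        ("build_time", MetricKind::Time),
        ("build_input_batches", MetricKind::Count),
        ("build_input_rows", MetricKind::Count),
        ("build_mem_used", MetricKind::Gauge),
        ("join_time", MetricKind::Time),
        ("input_batches", MetricKind::Count),
        ("input_rows", MetricKind::Count),
        ("output_batches", MetricKind::Count),
        ("output_rows", MetricKind::Count),
    ])
}

/// Hypothetical helper: names from the full list that `current` does not
/// yet report, in alphabetical order (the BTreeMap iteration order).
pub fn missing_metrics(current: &[&str]) -> Vec<&'static str> {
    hash_join_metrics()
        .keys()
        .copied()
        .filter(|name| !current.contains(name))
        .collect()
}
```

Calling `missing_metrics(&["output_rows", "elapsed_compute"])` with the two metrics reported today returns the eight names that the override would need to add.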