apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0
823 stars 163 forks source link

ScanExec metrics do not get reported in Spark UI for aggregate, join, sort, etc #1110

Open andygrove opened 15 hours ago

andygrove commented 15 hours ago

Describe the bug

Here is the native plan for a join. The join metrics of build_time and join_time get reported in the Spark UI but we do not report the metrics for fetching the input batches from the JVM or for unpacking dictionaries and performing deep copies where needed.

For this example it means we are reporting a time of ~410ms when the actual time is closer to ~600ms, and this is just for one partition.

HashJoinExec: metrics=[build_time=400.827077ms, join_time=8.557039ms]
  CopyExec [UnpackOrDeepCopy], metrics=[elapsed_compute=18.643737ms]
    ScanExec: source=[ShuffleQueryStage], metrics=[elapsed_compute=186.719525ms]
  CopyExec [UnpackOrDeepCopy], metrics=[..., elapsed_compute=293.113µs]
    ScanExec: source=[ShuffleQueryStage ...], metrics=[elapsed_compute=5.906924ms]

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response