apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[spark] Integrate paimon scan metrics into spark scan #3616

Closed. Zouxxyy closed this pull request 2 days ago.

Zouxxyy commented 6 days ago

Purpose

Integrate Paimon scan metrics into the Spark scan, so that more scan information is shown in the Spark UI. Note: these new metrics are only available from Spark 3.4 onward, because Scan.reportDriverMetrics was added in Spark 3.4.

For example:

select count(*) from inventory where inv_date_sk = 2450822
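Since Spark 3.4, a DataSource V2 Scan can declare its metrics through supportedCustomMetrics() and return driver-side values through reportDriverMetrics(). The Java sketch here illustrates that pattern without depending on Spark, under stated assumptions: TaskMetric and Metric are local stand-ins that only mirror the shape of Spark's CustomTaskMetric and CustomMetric, and SketchScan, plan(), and the metric names resultedTableFiles and skippedTableFiles are hypothetical. It is not the implementation in this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScanMetricsSketch {

    // Local stand-in mirroring the shape of Spark's CustomTaskMetric: a named long value.
    interface TaskMetric {
        String name();
        long value();
    }

    // Local stand-in mirroring the shape of Spark's CustomMetric: declares a metric
    // and turns collected long values into the string displayed in the UI.
    interface Metric {
        String name();
        String description();
        String aggregateTaskMetrics(long[] taskMetrics);
    }

    // Sum aggregation, similar in spirit to Spark's CustomSumMetric.
    static Metric sumMetric(String name, String description) {
        return new Metric() {
            public String name() { return name; }
            public String description() { return description; }
            public String aggregateTaskMetrics(long[] values) {
                long sum = 0;
                for (long v : values) {
                    sum += v;
                }
                return Long.toString(sum);
            }
        };
    }

    static TaskMetric taskMetric(String name, long value) {
        return new TaskMetric() {
            public String name() { return name; }
            public long value() { return value; }
        };
    }

    // Hypothetical scan: planning runs on the driver, so the statistics it records
    // are handed to Spark once via reportDriverMetrics() instead of per task.
    static class SketchScan {
        private final Map<String, Long> lastPlanStats = new LinkedHashMap<>();

        // Hypothetical planning step that records how many files were kept or skipped.
        void plan(long resultedFiles, long skippedFiles) {
            lastPlanStats.put("resultedTableFiles", resultedFiles);
            lastPlanStats.put("skippedTableFiles", skippedFiles);
        }

        Metric[] supportedCustomMetrics() {
            return new Metric[] {
                sumMetric("resultedTableFiles", "number of table files to read"),
                sumMetric("skippedTableFiles", "number of table files skipped by filters")
            };
        }

        TaskMetric[] reportDriverMetrics() {
            List<TaskMetric> out = new ArrayList<>();
            for (Map.Entry<String, Long> e : lastPlanStats.entrySet()) {
                out.add(taskMetric(e.getKey(), e.getValue()));
            }
            return out.toArray(new TaskMetric[0]);
        }
    }

    public static void main(String[] args) {
        SketchScan scan = new SketchScan();
        scan.plan(3, 17);
        // Match each reported value to its declared metric and render it as the UI would.
        for (Metric m : scan.supportedCustomMetrics()) {
            for (TaskMetric t : scan.reportDriverMetrics()) {
                if (t.name().equals(m.name())) {
                    System.out.println(m.description() + ": "
                        + m.aggregateTaskMetrics(new long[] {t.value()}));
                }
            }
        }
    }
}
```

The split between declaring metrics and reporting values follows Spark's design: the declared metrics tell Spark which columns to show and how to aggregate, while the reported values carry the numbers. In this sketch, main() simply pairs the two by name to mimic what would appear in the UI.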


Zouxxyy commented 4 days ago

Hi @schnappi17, could you help review this? I'm not sure this is the correct way to use MetricRegistry, because Spark's and Flink's metric systems differ.