apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[VL] enhancement of microbenchmark #7953

Closed FelixYBW closed 3 hours ago

FelixYBW commented 1 week ago

Description

  1. Currently, if a task fails, the reducer stops reading data and writing it to Parquet, so the reducer data is incomplete. We need a way to read the full data once the partition is enabled.
  2. Currently we filter samples by stage ID and task ID. We need to add a filter by partition size as well. @marin-ma, can we get the record count from the driver before the reducer reads the data? I think so.

FelixYBW commented 2 days ago

@marin-ma figured out that if the input data is the same, the partition ID and partition data are reproducible from run to run, so we don't need item 2. The partition ID is the "index" column shown in the UI.
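
To illustrate the idea, here is a minimal sketch of selecting a sample by stage id, task id, and partition id instead of by partition size. It is not Gluten's actual API or configuration: `TaskInfo`, `SampleFilter`, and all field names are hypothetical, and the ids in the example are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskInfo:
    """Identifiers of one reducer task (hypothetical structure)."""
    stage_id: int
    task_id: int
    partition_id: int  # assumed to correspond to the "index" column in the UI


@dataclass(frozen=True)
class SampleFilter:
    """Hypothetical sample filter; a field left as None matches any value."""
    stage_id: Optional[int] = None
    task_id: Optional[int] = None
    partition_id: Optional[int] = None

    def matches(self, task: TaskInfo) -> bool:
        # Compare each requested id with the task's id, skipping unset ones.
        checks = (
            (self.stage_id, task.stage_id),
            (self.task_id, task.task_id),
            (self.partition_id, task.partition_id),
        )
        return all(want is None or want == got for want, got in checks)


# Made-up tasks: (stage id, task id, partition id).
tasks = [TaskInfo(3, 100, 0), TaskInfo(3, 101, 1), TaskInfo(4, 200, 1)]

# Pick the partition of interest by its id within a given stage.
by_partition = SampleFilter(stage_id=3, partition_id=1)
selected = [t for t in tasks if by_partition.matches(t)]
```

Since the partition id and its data stay the same across runs for identical input, a selection like this should pick the same data every time, which is why a size-based filter is unnecessary.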

Thanks @marin-ma