cindyyuanjiang opened 2 weeks ago
Found a conversion from nanoseconds to ms in `TaskModel`
construction: https://github.com/NVIDIA/spark-rapids-tools/blob/019ede2fdc87f109f895ce67161c506dd377d80a/core/src/main/scala/org/apache/spark/sql/rapids/tool/store/TaskModel.scala#L112
**Describe the bug**

The Profiler output shows inconsistent shuffle write time results in `profiler.log`. Under `Stage Level All Metrics`, the total is 93 ms. Under `Stage level aggregated task metrics`, the total is 42 ms. We suspect this is due to precision lost when converting nanoseconds to milliseconds: for `Stage Level All Metrics`, the results are computed in nanoseconds, while for `Stage level aggregated task metrics`, the results are computed in milliseconds. Most tasks in this stage have < 1,000,000 nanoseconds = 1 ms of shuffle write time, which rounds down to 0 ms.

**Steps/Code to reproduce bug**
```
spark_rapids profiling -v -e <my_tools_repo>/core/src/test/resources/spark-events-profiling/rapids_join_eventlog.zstd
```
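The suspected precision loss can be illustrated with a minimal Java sketch (the task counts and per-task times below are hypothetical, not taken from the event log). Summing per-task times in nanoseconds and converting once keeps the total, while converting each task's time to milliseconds first (as happens during `TaskModel` construction) truncates sub-millisecond values to 0 before they are summed:

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

public class ShuffleWriteTimePrecision {
    public static void main(String[] args) {
        // Hypothetical per-task shuffle write times: 100 tasks,
        // each under 1,000,000 ns (= 1 ms).
        long[] taskWriteTimesNs = new long[100];
        Arrays.fill(taskWriteTimesNs, 900_000L);

        // Aggregate in nanoseconds, convert the total once.
        long totalNs = Arrays.stream(taskWriteTimesNs).sum();
        long totalMsFromNs = TimeUnit.NANOSECONDS.toMillis(totalNs);

        // Convert each task to milliseconds first, then sum.
        // Each 900,000 ns truncates to 0 ms, so the sum is 0.
        long totalMsTruncated = Arrays.stream(taskWriteTimesNs)
                .map(TimeUnit.NANOSECONDS::toMillis)
                .sum();

        System.out.println(totalMsFromNs);    // 90
        System.out.println(totalMsTruncated); // 0
    }
}
```

The two aggregation orders disagree by the full 90 ms, matching the kind of gap seen between the two profiler sections.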