NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

Fix qualx app metrics #1102

Closed: leewyang closed this 3 months ago

leewyang commented 3 months ago

From @eordentlich, this PR fixes per-app aggregations, specifically for QX (raw xgboost) metrics on datasets where the sqlIDs may be misaligned.

Changes

  1. Defer GPU appDuration calculations to the per-app aggregations after prediction, instead of carrying per-sql values computed before prediction.
  2. Add scaleFactor to the per-app aggregations.
  3. Add the description to the appName when overriding descriptions, to allow CPU/GPU joining.
  4. Fix loading a model by name.
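
To illustrate the first two changes, here is a minimal sketch of a post-prediction, per-app aggregation. It is not the code from this PR. The row keys (`appId`, `scaleFactor`, `appDuration`, `Duration`, `pred_speedup`), the function name `aggregate_per_app`, and the formula for the predicted GPU app duration are assumptions made for illustration only.

```python
from collections import defaultdict


def aggregate_per_app(sql_rows):
    """Aggregate per-SQL predictions into per-app metrics.

    Hypothetical row keys (not taken from the PR):
      appId, scaleFactor : identify the app run
      appDuration        : CPU duration of the whole app (ms)
      Duration           : CPU duration of one SQL (ms)
      pred_speedup       : predicted speedup for that SQL
    """
    apps = defaultdict(lambda: {"sql_cpu": 0.0, "sql_gpu_pred": 0.0})
    for row in sql_rows:
        # scaleFactor is part of the group key, so it survives aggregation.
        key = (row["appId"], row["scaleFactor"])
        agg = apps[key]
        agg["appDuration"] = row["appDuration"]
        agg["sql_cpu"] += row["Duration"]
        agg["sql_gpu_pred"] += row["Duration"] / row["pred_speedup"]

    results = {}
    for key, agg in apps.items():
        # The GPU app duration is derived only here, after prediction:
        # time outside SQL is assumed unchanged, SQL time uses predictions.
        non_sql = agg["appDuration"] - agg["sql_cpu"]
        gpu_app = non_sql + agg["sql_gpu_pred"]
        results[key] = {
            "appDuration": agg["appDuration"],
            "appDuration_pred": gpu_app,
            "speedup_pred": agg["appDuration"] / gpu_app,
        }
    return results


rows = [
    {"appId": "app-1", "scaleFactor": 100, "appDuration": 1000.0,
     "sqlID": 0, "Duration": 400.0, "pred_speedup": 2.0},
    {"appId": "app-1", "scaleFactor": 100, "appDuration": 1000.0,
     "sqlID": 1, "Duration": 400.0, "pred_speedup": 4.0},
]
per_app = aggregate_per_app(rows)
print(per_app[("app-1", 100)])
```

Because the GPU duration is computed from app-level sums rather than carried on each sqlID row, this kind of aggregation does not depend on CPU and GPU sqlIDs lining up one to one.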

Test

The following commands have been tested:

Internal Usage:

python qualx_main.py predict
python qualx_main.py evaluate