NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

[FEA] Use environment variables to set thresholds in static yaml configurations #1387

Closed amahussein closed 1 week ago

amahussein commented 2 weeks ago

Is your feature request related to a problem? Please describe.

Currently, some thresholds are hardcoded in the YAML configurations of the qualification/profiling tools. Recently, the daily QA tests have been failing because the totalCoreSecThreshold defined in qualification-conf.yaml is too strict, which causes all the eventlogs to be filtered out. In general, we should be able:

Describe the solution you'd like

Modify the tools to parse the YAML file after resolving the environment variables. Each setting falls back to its default value when the corresponding environment variable is not defined. This will be done for both spillThresholdBytes and totalCoreSecThreshold. For example:

local:
  output:
    topCandidates:
      spillBased:
        spillThresholdBytes: !ENV ${RAPIDS_USER_TOOLS_SPILL_BYTES_THRESHOLD:10737418240}
      additionalHeuristics:
        totalCoreSecThreshold: !ENV ${RAPIDS_USER_TOOLS_CORE_SECONDS_THRESHOLD:691200}
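
To illustrate the intended fallback behavior, here is a minimal Python sketch that resolves `${NAME:default}` placeholders in the raw YAML text before it is parsed. It is not the tools' actual implementation. The function name `resolve_env_placeholders` and the regular expression are assumptions made for this example, and a real fix may instead rely on a YAML library that handles the `!ENV` tag directly.

```python
import os
import re

# Matches an optional "!ENV " tag followed by ${NAME} or ${NAME:default}.
# Hypothetical pattern for illustration; not taken from the tools' code.
_ENV_PATTERN = re.compile(
    r"(?:!ENV\s+)?\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::(?P<default>[^}]*))?\}"
)


def resolve_env_placeholders(text, env=None):
    """Replace each placeholder with the env value, or its default if unset."""
    env = os.environ if env is None else env

    def _substitute(match):
        name = match.group("name")
        default = match.group("default")
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"{name} is not set and has no default")

    return _ENV_PATTERN.sub(_substitute, text)


line = "totalCoreSecThreshold: !ENV ${RAPIDS_USER_TOOLS_CORE_SECONDS_THRESHOLD:691200}"

# Variable not defined: the default after the colon is used.
print(resolve_env_placeholders(line, env={}))
# totalCoreSecThreshold: 691200

# Variable defined: its value overrides the default.
print(resolve_env_placeholders(line, env={"RAPIDS_USER_TOOLS_CORE_SECONDS_THRESHOLD": "0"}))
# totalCoreSecThreshold: 0
```

The resolved text can then be handed to any YAML parser, so the rest of the configuration handling does not need to know about environment variables.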

The environment variable names follow the commonly used pattern "RAPIDS_USER_TOOLS_*". If we want a shorter prefix, I would like to change all the other environment variables as well.

CC: @mattahrens @tgravescs

amahussein commented 2 weeks ago

@parthosa I have a fix for this. We will need to add those environment variables to the CI/CD, QA, and unit tests if we want to specify different environment settings.