NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

Use environment variables to set thresholds in static yaml configurations #1389

Closed amahussein closed 1 month ago

amahussein commented 1 month ago

Signed-off-by: Ahmed Hussein <ahussein@nvidia.com>

Fixes #1387

This change parses the YAML configuration file and resolves the environment variables it references.

This makes configuration management in the user_tools package more flexible and easier to maintain: thresholds defined in the static YAML configurations can be set per environment without editing the files.

Usage:

export RAPIDS_USER_TOOLS_CORE_SECONDS_THRESHOLD=1024
spark_rapids <args>

Code changes

This pull request introduces several changes to the user_tools package, mainly focusing on dependency updates and configuration management enhancements. The key changes include updating dependencies in pyproject.toml, integrating the pyaml_env library for environment variable parsing in YAML files, and modifying configuration files to support environment variable substitution.
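To illustrate the idea of environment variable substitution in a configuration value, here is a minimal standard-library sketch. It is not the pyaml_env implementation and not the code from this pull request. The `${NAME:default}` placeholder form is modeled on the pyaml_env convention, while the helper `resolve_env`, the key name `coreSecondsThreshold`, and the default `0` are hypothetical and chosen only for the example.

```python
import os
import re

# Matches ${NAME} or ${NAME:default}. The placeholder syntax is modeled on the
# pyaml_env convention; this regex itself is an illustrative assumption.
_PLACEHOLDER = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::(?P<default>[^}]*))?\}"
)


def resolve_env(text, environ=None):
    """Replace ${NAME:default} placeholders in text with environment values."""
    env = os.environ if environ is None else environ

    def _substitute(match):
        name = match.group("name")
        default = match.group("default")
        if name in env:
            return env[name]      # the environment value takes precedence
        if default is not None:
            return default        # otherwise fall back to the inline default
        raise KeyError(f"environment variable {name!r} is not set and has no default")

    return _PLACEHOLDER.sub(_substitute, text)


# Hypothetical configuration line: the key name and the default are made up.
config_line = "coreSecondsThreshold: ${RAPIDS_USER_TOOLS_CORE_SECONDS_THRESHOLD:0}"

# An explicit mapping stands in for os.environ so the example is deterministic.
overridden = resolve_env(config_line, {"RAPIDS_USER_TOOLS_CORE_SECONDS_THRESHOLD": "1024"})
fallback = resolve_env(config_line, {})

print(overridden)
print(fallback)
```

Passing the environment as an argument keeps the sketch easy to test. Calling `resolve_env(config_line)` with no second argument reads from the real process environment, which corresponds to exporting the variable before running the tool.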

Documentation Update

An internal issue has been filed to add the new environment variables to the documentation.

Dependency updates and integration:

Code changes for environment variable parsing:

Configuration file enhancements:

amahussein commented 1 month ago

CC: @viadea FYI, this can be used to control spillThreshold or coreSeconds in a specific environment.

amahussein commented 1 month ago

Overall looks good. Are we documenting these for users, or are they meant for more advanced configuration?

It is meant for advanced configuration. We have an internal issue open to document all the environment variables, so if users need specific tuning for their environment we can point them to those variables.