NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

Refactor Databricks-AWS Qual tool to cache and process pricing info from DB website #1141

Closed cindyyuanjiang closed 4 days ago

cindyyuanjiang commented 1 week ago

Fixes https://github.com/NVIDIA/spark-rapids-tools/issues/1139.

Changes

Testing

spark_rapids qualification --eventlogs <my-event-logs> --platform databricks-aws --cluster <my-cluster-props>

Run the command above and confirm that the pricing calculation is the same before and after this PR.

cindyyuanjiang commented 6 days ago

Have you tested the fat-mode build? We need to make sure that these changes work fine when the tools are running offline.

Thanks @amahussein! Tested the fat-mode build successfully.

cindyyuanjiang commented 5 days ago

@cindyyuanjiang General comment on styling: we are trying to enforce defining the return type of each function as much as possible. Skipping this is going to come back to haunt us with pylint moving forward, since it will cause the code to fail the pylint checks. For a function that returns nothing, it is recommended to define it as def foo() -> None:
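A minimal sketch of that convention, for illustration only. The function names and the pricing dictionary below are hypothetical and are not taken from this PR; they just show one function that returns a value and one that returns nothing.

```python
from typing import Dict, Optional


# Hypothetical helper: looks up an hourly price and declares its return type.
def get_instance_price(prices: Dict[str, float], instance: str) -> Optional[float]:
    """Return the price for the instance type, or None if it is not listed."""
    return prices.get(instance)


# Hypothetical helper: returns nothing, so the return type is declared as None.
def report_price(instance: str, price: Optional[float]) -> None:
    """Print a one-line summary of the price for the instance type."""
    shown = "n/a" if price is None else f"{price:.3f}"
    print(f"{instance}: {shown}")


if __name__ == "__main__":
    # Made-up pricing data used only to exercise the two functions.
    sample_prices = {"m5.xlarge": 0.192}
    report_price("m5.xlarge", get_instance_price(sample_prices, "m5.xlarge"))
    report_price("unknown", get_instance_price(sample_prices, "unknown"))
```

Declaring Optional[float] rather than float makes the "not found" case explicit to callers and to static checkers.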

Thanks @amahussein! Updated function return types.