Closed: kuhushukla closed this issue 2 months ago
Our approach to dealing with this is to:
Ultimately, the divide-by-zero error will only be the symptom.
We need @kuhushukla's help to reproduce it.
I have found a scenario that leads to a crash in the Qualification tool. However, it did not reproduce the divide-by-zero error as I expected.
Repro:
1) Remove instance type Standard_DS3_v2 from user_tools/src/spark_rapids_pytools/resources/premium-databricks-azure-catalog.json
2) Run cmd: spark_rapids_user_tools databricks-azure qualification -e <my-event-log> --cpu_cluster <my-cpu-cluster> --verbose, where <my-cpu-cluster> has worker node type Standard_DS3_v2
Stack trace:
2024-01-03 14:31:40,683 ERROR rapids.tools.price.Databricks-Azure: Could not find price for instance type 'Standard_DS3_v2': 'NoneType' object has no attribute 'get'
2024-01-03 14:31:40,683 ERROR root: Qualification. Raised an error in phase [Collecting-Results]
Traceback (most recent call last):
File "/home/cindyj/Desktop/spark-rapids-tools/user_tools/src/spark_rapids_pytools/rapids/rapids_tool.py", line 110, in wrapper
func_cb(self, *args, **kwargs) # pylint: disable=not-callable
File "/home/cindyj/Desktop/spark-rapids-tools/user_tools/src/spark_rapids_pytools/rapids/rapids_tool.py", line 242, in _collect_result
self._process_output()
File "/home/cindyj/Desktop/spark-rapids-tools/user_tools/src/spark_rapids_pytools/rapids/qualification.py", line 760, in _process_output
report_gen = self.__build_global_report_summary(df, csv_summary_file)
File "/home/cindyj/Desktop/spark-rapids-tools/user_tools/src/spark_rapids_pytools/rapids/qualification.py", line 662, in __build_global_report_summary
apps_working_set = self.__calc_apps_cost(apps_reshaped_df,
File "/home/cindyj/Desktop/spark-rapids-tools/user_tools/src/spark_rapids_pytools/rapids/qualification.py", line 616, in __calc_apps_cost
savings_estimator = self.ctxt.platform.create_saving_estimator(self.ctxt.get_ctxt('cpuClusterProxy'),
File "/home/cindyj/Desktop/spark-rapids-tools/user_tools/src/spark_rapids_pytools/cloud_api/databricks_azure.py", line 82, in create_saving_estimator
saving_estimator = DBAzureSavingsEstimator(price_provider=db_azure_price_provider,
File "<string>", line 9, in __init__
File "/home/cindyj/Desktop/spark-rapids-tools/user_tools/src/spark_rapids_pytools/pricing/price_provider.py", line 148, in __post_init__
self._setup_costs()
File "/home/cindyj/Desktop/spark-rapids-tools/user_tools/src/spark_rapids_pytools/pricing/price_provider.py", line 143, in _setup_costs
self.source_cost = self._get_cost_per_cluster(self.source_cluster)
File "/home/cindyj/Desktop/spark-rapids-tools/user_tools/src/spark_rapids_pytools/cloud_api/databricks_azure.py", line 410, in _get_cost_per_cluster
cost = self.price_provider.get_instance_price(instance=instance_type)
File "/home/cindyj/Desktop/spark-rapids-tools/user_tools/src/spark_rapids_pytools/pricing/databricks_azure_pricing.py", line 83, in get_instance_price
raise ex
File "/home/cindyj/Desktop/spark-rapids-tools/user_tools/src/spark_rapids_pytools/pricing/databricks_azure_pricing.py", line 79, in get_instance_price
rate_per_hour = instance_conf.get('TotalPricePerHour')
AttributeError: 'NoneType' object has no attribute 'get'
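The traceback shows `get_instance_price` calling `.get()` on the result of a catalog lookup that returned `None` once the instance type was removed from the pricing JSON. A minimal sketch of a defensive lookup (the function and parameter names here are illustrative, not the tool's actual API in databricks_azure_pricing.py):

```python
def get_instance_price(catalog: dict, instance_type: str) -> float:
    """Look up the hourly price for an instance type, tolerating
    entries missing from the pricing catalog."""
    instance_conf = catalog.get(instance_type)  # None if the type is absent
    if instance_conf is None:
        # Without this guard, None.get('TotalPricePerHour') raises
        # AttributeError, which is the crash in the traceback above.
        return 0.0
    return float(instance_conf.get('TotalPricePerHour', 0.0))


# Catalog missing Standard_DS3_v2, mirroring the repro steps:
catalog = {'Standard_DS4_v2': {'TotalPricePerHour': 0.598}}
print(get_instance_price(catalog, 'Standard_DS3_v2'))  # 0.0, no crash
print(get_instance_price(catalog, 'Standard_DS4_v2'))  # 0.598
```

Note that returning 0.0 for a missing instance type then feeds a zero cost into the savings estimator, which is exactly the path that could surface the divide-by-zero error described below.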
Could not reproduce it.
Describe the bug
The CPU cost division in the savings estimate can cause a divide-by-zero error.

Steps/Code to reproduce bug
Use an event log where the costs are forced to be zero.

Expected behavior
Default to 0 instead of raising a divide-by-zero error.
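The expected behavior can be sketched as a guarded savings computation. This is a hypothetical illustration (the helper name and formula are assumptions, not the estimator's actual code):

```python
def calc_savings_pct(cpu_cost: float, gpu_cost: float) -> float:
    """Percent cost savings of the GPU cluster vs. the CPU cluster.

    Returns 0.0 when cpu_cost is zero, so an event log whose costs
    come out as zero cannot trigger ZeroDivisionError.
    """
    if cpu_cost <= 0.0:
        # Degenerate case: no meaningful baseline cost, default to 0
        return 0.0
    return 100.0 * (cpu_cost - gpu_cost) / cpu_cost


print(calc_savings_pct(0.0, 1.5))   # zero CPU cost -> 0.0, no crash
print(calc_savings_pct(10.0, 4.0))  # 60.0
```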