aws / sagemaker-experiments

Experiment tracking and metric logging for Amazon SageMaker notebooks and model training.
Apache License 2.0

Fix: fix throttling while calling disassociate trial component in aut… #89

Closed yzhu0 closed 4 years ago

yzhu0 commented 4 years ago

Calling experiment.delete_all() on a SageMaker Autopilot experiment makes the service raise a throttling exception. This change adds a sleep between calls to keep the DisassociateTrialComponent operation from being throttled.
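A minimal sketch of the idea, not the actual patch: pause briefly after each disassociate call so the request rate stays under the service limit. The helper name `disassociate_all`, the `pause_seconds` default, and the `RecordingClient` stand-in are assumptions made for illustration. Only the `disassociate_trial_component` call with its `TrialName` and `TrialComponentName` arguments is taken from the traceback in this report.

```python
import time


def disassociate_all(client, trial_name, component_names,
                     pause_seconds=0.5, sleep=time.sleep):
    """Disassociate each trial component from a trial, pausing between calls.

    `client` is any object exposing
    disassociate_trial_component(TrialName=..., TrialComponentName=...),
    for example a boto3 SageMaker client. `pause_seconds` is an assumed
    value, not one taken from the pull request.
    """
    for name in component_names:
        client.disassociate_trial_component(
            TrialName=trial_name, TrialComponentName=name
        )
        # Spread the calls out so the request rate stays under the limit.
        sleep(pause_seconds)


class RecordingClient:
    """Hypothetical stand-in for the SageMaker client, for local testing only."""

    def __init__(self):
        self.calls = []

    def disassociate_trial_component(self, TrialName, TrialComponentName):
        self.calls.append((TrialName, TrialComponentName))
```

A fixed pause is the simplest option. It slows every deletion, including ones that would not have been throttled, so the value is a trade-off between speed and reliability.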

Throttle Error:

ClientError                               Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/smexperiments/experiment.py in delete_all(self, action)
    260 )
--> 261 tc.delete(force_disassociate=True)
    262 t.remove_trial_component(tc)

/opt/conda/lib/python3.7/site-packages/smexperiments/trial_component.py in delete(self, force_disassociate)
    130 self.sagemaker_boto_client.disassociate_trial_component(
--> 131 TrialName=trial["TrialName"], TrialComponentName=self.trial_component_name
    132 )

/opt/conda/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    315 # The "self" in this scope is referring to the BaseClient.
--> 316 return self._make_api_call(operation_name, kwargs)
    317

/opt/conda/lib/python3.7/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    634 error_class = self.exceptions.from_code(error_code)
--> 635 raise error_class(parsed_response, operation_name)
    636 else:

ClientError: An error occurred (ThrottlingException) when calling the DisassociateTrialComponent operation (reached max retries: 4): Rate exceeded

The above exception was the direct cause of the following exception:

Exception                                 Traceback (most recent call last)
in
----> 1 my_experiment.delete_all(action="--force")

/opt/conda/lib/python3.7/site-packages/smexperiments/experiment.py in delete_all(self, action)
    248 while True:
    249 if delete_count == 3:
--> 250 raise Exception("Fail to delete, please try again.") from last_exception
    251 try:
    252 for trial_summary in self.list_trials():

Exception: Fail to delete, please try again.
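The traceback shows delete_all() giving up after three attempts once throttling persists. As an alternative to a fixed sleep, a throttled call can be retried with growing delays. The sketch below is not the library's code: `ThrottledError` is a hypothetical stand-in for a botocore `ClientError` carrying the `ThrottlingException` code, and `call_with_backoff`, `max_attempts`, and `base_delay` are names and values assumed for illustration.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a ClientError with code ThrottlingException."""


def call_with_backoff(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `operation`, retrying on ThrottledError with exponential delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the throttling error
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

With a real client, `operation` would wrap a single call such as `tc.delete(force_disassociate=True)`, and the `except` clause would need to inspect the error code of the caught `ClientError` instead of catching a dedicated exception type.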

How to create an Autopilot experiment: https://aws.amazon.com/getting-started/hands-on/create-machine-learning-model-automatically-sagemaker-autopilot/