Fix DisassociateTrialComponent throttling.
tc.delete(force_disassociate=True) already disassociates and deletes the trial component, so the separate t.remove_trial_component(tc) call is not needed.
/opt/conda/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
315 # The "self" in this scope is referring to the BaseClient.
--> 316 return self._make_api_call(operation_name, kwargs)
317
ClientError: An error occurred (ThrottlingException) when calling the DisassociateTrialComponent operation (reached max retries: 4): Rate exceeded
The above exception was the direct cause of the following exception:
Exception Traceback (most recent call last)
in
----> 1 my_experiment.delete_all(action="--force")
/opt/conda/lib/python3.7/site-packages/smexperiments/experiment.py in delete_all(self, action)
248 while True:
249 if delete_count == 3:
--> 250 raise Exception("Fail to delete, please try again.") from last_exception
251 try:
252 for trial_summary in self.list_trials():
Exception: Fail to delete, please try again.
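In the traceback above, the ThrottlingException is only visible as the chained cause of the generic "Fail to delete" exception. The following is a minimal sketch of how a caller might walk the `__cause__` chain to find the underlying AWS error code. `FakeClientError`, `root_error_code`, and `failing_delete_all` are hypothetical names used for illustration only; the stand-in class mimics the `response["Error"]["Code"]` layout of botocore's `ClientError` so the sketch runs without AWS access.

```python
class FakeClientError(Exception):
    """Hypothetical stand-in for botocore.exceptions.ClientError."""

    def __init__(self, code, message):
        super().__init__(message)
        # Same nested layout that botocore exposes on ClientError.response.
        self.response = {"Error": {"Code": code, "Message": message}}


def root_error_code(exc):
    """Walk the __cause__ chain and return the first AWS-style error code, or None."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against cyclic chains
        response = getattr(exc, "response", None)
        if isinstance(response, dict):
            code = response.get("Error", {}).get("Code")
            if code:
                return code
        exc = exc.__cause__
    return None


def failing_delete_all():
    """Reproduce the chaining pattern from the traceback (illustrative only)."""
    try:
        raise FakeClientError("ThrottlingException", "Rate exceeded")
    except FakeClientError as ex:
        raise Exception("Fail to delete, please try again.") from ex


try:
    failing_delete_all()
except Exception as err:
    print(root_error_code(err))
```

A check like this lets a caller distinguish throttling from other failures before deciding whether to retry.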
Test
Tested delete_all() with Autopilot in SageMaker Studio; the experiment and its related trials and trial components were deleted successfully.
input:
from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
from smexperiments.trial_component import TrialComponent
from smexperiments.tracker import Tracker
import time

def cleanup(experiment):
    delete_count = 0
    last_exception = None
    while True:
        if delete_count == 3:
            raise Exception("Fail to delete, please try again.") from last_exception
        try:
            for trial_summary in experiment.list_trials():
                t = Trial.load(
                    sagemaker_boto_client=experiment.sagemaker_boto_client, trial_name=trial_summary.trial_name
                )
                for trial_component_summary in t.list_trial_components():
                    tc = TrialComponent.load(
                        sagemaker_boto_client=experiment.sagemaker_boto_client,
                        trial_component_name=trial_component_summary.trial_component_name,
                    )
                    tc.delete(force_disassociate=True)
                    # to prevent throttling
                    time.sleep(1.2)
                t.delete()
            experiment.delete()
            break
        except Exception as ex:
            last_exception = ex
        finally:
            delete_count = delete_count + 1
SIM: https://sim.amazon.com/issues/AML-78535
Add a sleep for trials as well, since trial and experiment deletion also have a 1s throttle time: add a time.sleep between trial and experiment deletion. Based on https://tiny.amazon.com/ftd9gdrx/codeamazpackIronbloba686conf
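The pacing described above can be sketched as a small helper that pauses after every delete call. This is an illustration under assumptions, not the code in this change: `delete_paced` and its parameters are hypothetical names, and the 1.2s default simply mirrors the sleep used in the test script. The `sleep` argument is injectable so the pacing can be checked without actually waiting.

```python
import time


def delete_paced(items, delete_fn, pause_seconds=1.2, sleep=time.sleep):
    """Call delete_fn on each item, pausing after each call to stay under a rate limit.

    Returns the number of items deleted. Any exception from delete_fn propagates,
    so an outer retry loop can still catch it.
    """
    deleted = 0
    for item in items:
        delete_fn(item)
        sleep(pause_seconds)  # pause after every delete, including the last one
        deleted += 1
    return deleted


# Illustrative run with fake callables instead of real SageMaker objects.
deleted_names = []
recorded_pauses = []
count = delete_paced(
    ["tc-1", "tc-2", "tc-3"],
    deleted_names.append,
    sleep=recorded_pauses.append,
)
print(count, deleted_names, recorded_pauses)
```

Pausing after the last item as well means the next deletion at the outer level (trial, then experiment) also starts after a full pause.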