aws-samples / amazon-braket-experiment-tracking-with-sagemaker

MIT No Attribution

Insufficient permissions to clean up experiment in notebook #4

Open licedric opened 2 months ago

licedric commented 2 months ago

Running the last cell of the notebook, which cleans up the experiment, fails with this error:

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
File ~/anaconda3/envs/Braket/lib/python3.10/site-packages/smexperiments/experiment.py:274, in Experiment.delete_all(self, action)
    270 tc = trial_component.TrialComponent.load(
    271     sagemaker_boto_client=self.sagemaker_boto_client,
    272     trial_component_name=trial_component_summary.trial_component_name,
    273 )
--> 274 tc.delete(force_disassociate=True)
    275 # to prevent throttling

File ~/anaconda3/envs/Braket/lib/python3.10/site-packages/smexperiments/trial_component.py:128, in TrialComponent.delete(self, force_disassociate)
    127 else:
--> 128     list_trials_response = self.sagemaker_boto_client.list_trials(
    129         TrialComponentName=self.trial_component_name
    130     )
    132 # Disassociate the trials and trial components

File ~/anaconda3/envs/Braket/lib/python3.10/site-packages/botocore/client.py:569, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
    568 # The "self" in this scope is referring to the BaseClient.
--> 569 return self._make_api_call(operation_name, kwargs)

File ~/anaconda3/envs/Braket/lib/python3.10/site-packages/botocore/client.py:1023, in BaseClient._make_api_call(self, operation_name, api_params)
   1022     error_class = self.exceptions.from_code(error_code)
-> 1023     raise error_class(parsed_response, operation_name)
   1024 else:

ClientError: An error occurred (AccessDeniedException) when calling the ListTrials operation: User: arn:aws:sts::<ACCOUNT_ID>:assumed-role/AmazonBraketServiceSageMakerNotebookRole/SageMaker is not authorized to perform: sagemaker:ListTrials on resource: arn:aws:sagemaker:us-east-1:<ACCOUNT_ID>:experiment-trial-component/TrialComponent-2024-09-18-192731-cikn because no identity-based policy allows the sagemaker:ListTrials action

The above exception was the direct cause of the following exception:

Exception                                 Traceback (most recent call last)
Cell In[31], line 1
----> 1 vqc_experiment.delete_all(action="--force")

File ~/anaconda3/envs/Braket/lib/python3.10/site-packages/smexperiments/experiment.py:263, in Experiment.delete_all(self, action)
    261 while True:
    262     if delete_attempt_count == self.MAX_DELETE_ALL_ATTEMPTS:
--> 263         raise Exception("Failed to delete, please try again.") from last_exception
    264     try:
    265         for trial_summary in self.list_trials():

Exception: Failed to delete, please try again.

It seems the cleanup calls the sagemaker:ListTrials action on the trial component's ARN, while the provided inline policy only allows resources whose ARNs contain amazon-braket.

peterkomar-aws commented 1 month ago

Confirmed. Changing the inline policy's resource from "arn:aws:sagemaker:*:*:*amazon-braket-*" to "arn:aws:sagemaker:*:*:*experiment-*" on lines 22 and 34 of the policy lets the cleanup step complete successfully.
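
The mismatch can be checked locally with a rough sketch. It uses Python's fnmatch as a stand-in for IAM's wildcard matching, which is an approximation and not the actual IAM evaluation logic. The account ID is a placeholder, and the helper name resource_matches is made up for illustration.

```python
from fnmatch import fnmatchcase

# ARN taken from the AccessDeniedException in this issue,
# with a placeholder account ID in place of <ACCOUNT_ID>.
denied_arn = (
    "arn:aws:sagemaker:us-east-1:123456789012:"
    "experiment-trial-component/TrialComponent-2024-09-18-192731-cikn"
)

original_pattern = "arn:aws:sagemaker:*:*:*amazon-braket-*"
broader_pattern = "arn:aws:sagemaker:*:*:*experiment-*"


def resource_matches(pattern: str, arn: str) -> bool:
    """Approximate an IAM Resource wildcard check (hypothetical helper).

    fnmatchcase treats '*' as "any run of characters", which is close to
    how IAM handles '*' in a Resource element, but it is not identical.
    """
    return fnmatchcase(arn, pattern)


# The trial component ARN has no "amazon-braket-" in it, so the original
# pattern does not cover it; the "experiment-" pattern does.
original_ok = resource_matches(original_pattern, denied_arn)
broader_ok = resource_matches(broader_pattern, denied_arn)
print("original pattern matches:", original_ok)
print("broader pattern matches:", broader_ok)
```

Under this approximation the original pattern does not cover the auto-generated trial component name, which is consistent with the AccessDeniedException on ListTrials.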

Here is the full policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "searchpermissions",
            "Effect": "Allow",
            "Action": [
                "sagemaker:Search"
            ],
            "Resource": "*"
        },
        {
            "Sid": "experimentpermissions",
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreateExperiment",
                "sagemaker:DeleteExperiment",
                "sagemaker:DescribeExperiment",
                "sagemaker:ListExperiments",
                "sagemaker:UpdateExperiment"
            ],
            "Resource": "arn:aws:sagemaker:*:*:*experiment-*"
        },
        {
            "Sid": "trialpermissions",
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreateTrial",
                "sagemaker:DeleteTrial",
                "sagemaker:DescribeTrial",
                "sagemaker:ListTrials",
                "sagemaker:UpdateTrial"
            ],
            "Resource": "arn:aws:sagemaker:*:*:*experiment-*"
        },
        {
            "Sid": "trialcomponentpermissions",
            "Effect": "Allow",
            "Action": [
                "sagemaker:AssociateTrialComponent",
                "sagemaker:CreateTrialComponent",
                "sagemaker:DeleteTrialComponent",
                "sagemaker:DescribeTrialComponent",
                "sagemaker:DisassociateTrialComponent",
                "sagemaker:ListTrialComponents",
                "sagemaker:UpdateTrialComponent"
            ],
            "Resource": "arn:aws:sagemaker:*:*:experiment-*"
        }
    ]
}
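
For anyone who keeps the policy as a JSON document, the resource change could be scripted along these lines. This is only a sketch: the helper broaden_resources and the trimmed original_policy excerpt are hypothetical, and the real inline policy on the notebook role may differ from this stand-in.

```python
import copy
import json

OLD_RESOURCE = "arn:aws:sagemaker:*:*:*amazon-braket-*"
NEW_RESOURCE = "arn:aws:sagemaker:*:*:*experiment-*"


def broaden_resources(policy: dict, old: str = OLD_RESOURCE, new: str = NEW_RESOURCE) -> dict:
    """Return a copy of the policy with Resource values equal to `old` replaced by `new`."""
    patched = copy.deepcopy(policy)
    for statement in patched.get("Statement", []):
        resource = statement.get("Resource")
        if isinstance(resource, list):
            # Resource may be a list of ARN patterns.
            statement["Resource"] = [new if r == old else r for r in resource]
        elif resource == old:
            statement["Resource"] = new
    return patched


# Trimmed, hypothetical excerpt standing in for the original inline policy.
original_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "searchpermissions",
            "Effect": "Allow",
            "Action": ["sagemaker:Search"],
            "Resource": "*",
        },
        {
            "Sid": "trialpermissions",
            "Effect": "Allow",
            "Action": ["sagemaker:ListTrials"],
            "Resource": OLD_RESOURCE,
        },
    ],
}

patched_policy = broaden_resources(original_policy)
print(json.dumps(patched_policy, indent=4))
```

The printed JSON could then be attached to the notebook role, for example through the IAM console or boto3's put_role_policy call. The policy name is not given in this thread, so it would have to be looked up on the role first.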