aws-controllers-k8s / community

AWS Controllers for Kubernetes (ACK) is a project enabling you to manage AWS services from Kubernetes
https://aws-controllers-k8s.github.io/community/
Apache License 2.0

Sagemaker ACK Fails to update endpoint #1889

Open mwm5945 opened 1 year ago

mwm5945 commented 1 year ago

Describe the bug

Related to this issue in the CDK: https://github.com/aws/aws-cdk/issues/11594, it appears that updating an existing endpoint to use a new EndpointConfig may require contradictory IAM permissions. Updating the endpointConfigName field of an existing endpoint yields this error for me:

  - message: "AccessDeniedException: User: arn:aws:sts::<acct omitted>:assumed-role/sagemaker-provisioner/kiam-kiam
      is not authorized to perform: sagemaker:UpdateEndpoint on resource: arn:aws:sagemaker:us-east-1:<account omitted>:endpoint-config/endpoint-config-name because no identity-based policy allows the sagemaker:UpdateEndpoint action\n\tstatus
      code: 400, request id: <omitted> "
    status: "True"
    type: ACK.Recoverable

According to this doc, UpdateEndpoint only needs to be granted on the endpoint resource (identified by endpoint name), and our internal corporate policies require us to follow that documented scoping. For the same reason, we are not able to add any EndpointConfig resources to the UpdateEndpoint statement.

Steps to reproduce

IAM policy scoped as much as possible:

        {
            "Sid": "endpoint",
            "Effect": "Allow",
            "Action": [
                "sagemaker:AddTags",
                "sagemaker:DeleteTags",
                "sagemaker:CreateEndpoint",
                "sagemaker:DeleteEndpoint",
                "sagemaker:DescribeEndpoint",
                "sagemaker:UpdateEndpoint",
                "sagemaker:UpdateEndpointWeightsAndCapacities"
            ],
            "Resource": [
                "arn:aws:sagemaker:us-east-1:ACCOUNT_NUM:endpoint/test-model"
            ]
        },
        {
            "Sid": "endpointCfg",
            "Effect": "Allow",
            "Action": [
                "sagemaker:AddTags",
                "sagemaker:DeleteTags",
                "sagemaker:CreateEndpointConfig",
                "sagemaker:CreateEndpoint",
                "sagemaker:DescribeEndpointConfig",
                "sagemaker:DeleteEndpointConfig"
            ],
            "Resource": [
                "arn:aws:sagemaker:us-east-1:ACCOUNT_NUM:endpoint-config/cfg1",
                "arn:aws:sagemaker:us-east-1:ACCOUNT_NUM:endpoint-config/cfg2"   
            ]
        },

Create the above resources with the endpoint using cfg1, then try switching to cfg2 by updating endpointConfigName in the existing endpoint YAML.
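To illustrate why the policy above produces the AccessDeniedException, here is a minimal sketch in Python. It assumes, based on the error message, that sagemaker:UpdateEndpoint is authorized against the endpoint-config ARN as well as the endpoint ARN. The `is_allowed` helper is hypothetical and deliberately simplified (Allow statements only, wildcard matching via `fnmatch`); it is not how IAM evaluates policies in full.

```python
from fnmatch import fnmatchcase

# Placeholder account ID, as in the policy above.
PREFIX = "arn:aws:sagemaker:us-east-1:ACCOUNT_NUM"

# Only the parts of the two statements relevant to UpdateEndpoint.
STATEMENTS = [
    {
        "Sid": "endpoint",
        "Effect": "Allow",
        "Action": ["sagemaker:DescribeEndpoint", "sagemaker:UpdateEndpoint"],
        "Resource": [f"{PREFIX}:endpoint/test-model"],
    },
    {
        "Sid": "endpointCfg",
        "Effect": "Allow",
        "Action": ["sagemaker:CreateEndpointConfig", "sagemaker:DescribeEndpointConfig"],
        "Resource": [f"{PREFIX}:endpoint-config/cfg1", f"{PREFIX}:endpoint-config/cfg2"],
    },
]


def is_allowed(statements, action, resource):
    """Hypothetical, simplified check: does any Allow statement match both?"""
    for statement in statements:
        if statement["Effect"] != "Allow":
            continue
        action_ok = any(fnmatchcase(action, a) for a in statement["Action"])
        resource_ok = any(fnmatchcase(resource, r) for r in statement["Resource"])
        if action_ok and resource_ok:
            return True
    return False


# Assumption: the update is checked against both ARNs.
checks = {
    "endpoint": is_allowed(
        STATEMENTS, "sagemaker:UpdateEndpoint", f"{PREFIX}:endpoint/test-model"
    ),
    "endpoint-config": is_allowed(
        STATEMENTS, "sagemaker:UpdateEndpoint", f"{PREFIX}:endpoint-config/cfg2"
    ),
}
print(checks)  # {'endpoint': True, 'endpoint-config': False}
```

The second check fails because no statement pairs sagemaker:UpdateEndpoint with an endpoint-config resource, which matches the resource named in the error.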

Expected outcome

A concise description of what you expected to happen.

Environment

a-hilaly commented 1 year ago

/cc @aws-controllers-k8s/sagemaker-maintainer

ananth102 commented 1 year ago

Hi mwm5945, will attempt to replicate but have a couple questions:

  1. Which controller version are you using?
  2. Is arn:aws:sts::<acct omitted>:assumed-role/sagemaker-provisioner/kiam-kiam the ACK controller role or the execution role?
  3. Do you create/remove tags in the update?
  4. Does the error go away if you have sagemaker:UpdateEndpoint in the endpointCfg statement?

mwm5945 commented 1 year ago

  1. 1.2.2
  2. It's the KIAM role that the ACK role has a trust relationship with (we're not on AKS, nor do we have the newer auth method set up yet)
  3. Nope!
  4. We're not able to do so--our internal corporate policies restrict adding this action to endpoint-config resources, as it's not listed as an option here. I know doing this would work, as it worked previously; however, there was a bug in the platform that handles policy validations, which is ultimately what caused this to be discovered.

Thanks!

surajkota commented 1 year ago

Hi Micheal, we are checking with the service team on this issue.

surajkota commented 1 year ago

Hi Micheal, I can confirm this is a documentation issue: the sagemaker:UpdateEndpoint permission needs to be granted on the endpoint config resource as well as on the endpoint. We will work with the documentation team to update the docs.
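For readers who are able to change their policy, the following is a minimal sketch in Python of what the confirmation above implies: adding sagemaker:UpdateEndpoint to the statement that covers the endpoint-config ARNs. The statement content is copied from the reproduction policy in this issue; the `with_update_endpoint` helper is hypothetical, and this is not an official recommended policy.

```python
import json

# The endpointCfg statement from the reproduction policy (ACCOUNT_NUM is a placeholder).
endpoint_cfg_statement = {
    "Sid": "endpointCfg",
    "Effect": "Allow",
    "Action": [
        "sagemaker:AddTags",
        "sagemaker:DeleteTags",
        "sagemaker:CreateEndpointConfig",
        "sagemaker:CreateEndpoint",
        "sagemaker:DescribeEndpointConfig",
        "sagemaker:DeleteEndpointConfig",
    ],
    "Resource": [
        "arn:aws:sagemaker:us-east-1:ACCOUNT_NUM:endpoint-config/cfg1",
        "arn:aws:sagemaker:us-east-1:ACCOUNT_NUM:endpoint-config/cfg2",
    ],
}


def with_update_endpoint(statement):
    """Hypothetical helper: return a copy of the statement that also allows UpdateEndpoint."""
    amended = dict(statement)
    actions = list(statement["Action"])
    if "sagemaker:UpdateEndpoint" not in actions:
        actions.append("sagemaker:UpdateEndpoint")
    amended["Action"] = actions
    return amended


amended = with_update_endpoint(endpoint_cfg_statement)
print(json.dumps(amended, indent=4))
```

The helper copies the statement instead of mutating it, so the original and amended versions can be compared side by side before applying anything.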

ack-bot commented 8 months ago

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

gecube commented 8 months ago

/remove-lifecycle stale

ack-bot commented 2 months ago

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

gecube commented 2 months ago

/remove-lifecycle stale