aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

SageMaker always tries to create the default bucket, even when the custom default_bucket already exists #1990

Closed Basanti-xian closed 2 years ago

Basanti-xian commented 4 years ago

Describe the bug
The SageMaker session always tries to create the default bucket, even though I have already given an existing, customised bucket when initialising the SageMaker session: "errorMessage": "Your previous request to create the named bucket succeeded and you already own it." This also happens continuously, every few seconds.

To reproduce

import boto3
import sagemaker
from sagemaker.estimator import Estimator

bucket = "bucket-that-exists"
boto_session = boto3.session.Session(region_name="ap-southeast-1")

# container, role, and config are defined earlier in the user's setup
ml_estimator = Estimator(
    image_uri=container,
    role=role,
    sagemaker_session=sagemaker.session.Session(boto_session, default_bucket=bucket),
    **config["train_model"]["estimator_config"],
)

Expected behavior

SageMaker should check whether the provided bucket already exists and stop creating a new one.

(Screenshot: the SDK's default-bucket code path, which checks bucket.creation_date before calling s3.create_bucket)

However, bucket.creation_date returns None even when the bucket exists, so the SDK tries to create it again.

Screenshots or logs
Events from CloudTrail:

"eventSource": "s3.amazonaws.com",
"eventName": "CreateBucket",
"awsRegion": "ap-southeast-1",
"sourceIPAddress": "3.0.220.101",
"userAgent": "[Boto3/1.13.9 Python/3.8.5 Linux/4.14.198-152.320.amzn2.x86_64 exec-env/AWS_ECS_FARGATE Botocore/1.17.44 Resource]",
"errorCode": "BucketAlreadyOwnedByYou",
"errorMessage": "Your previous request to create the named bucket succeeded and you already own it.",
"requestParameters": {
    "CreateBucketConfiguration": {
        "LocationConstraint": "ap-southeast-1",
        "xmlns": "http://s3.amazonaws.com/doc/2006-03-01/"
    },
    "bucketName": "bucket-that-exists",
    "Host": xxx
},

System information

chuyang-deng commented 4 years ago

Hi @Basanti-xian, are you running your SageMaker job in us-east-1?

If so, this is expected: create_bucket() returns a BucketAlreadyOwnedByYou error in all AWS regions EXCEPT us-east-1, while in us-east-1 you get 200 OK, so the Python SDK code will keep trying to create a default bucket.
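
A minimal sketch of that region-dependent behavior, reusing the bucket and region from the report above:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3", region_name="ap-southeast-1")
try:
    s3.create_bucket(
        Bucket="bucket-that-exists",
        CreateBucketConfiguration={"LocationConstraint": "ap-southeast-1"},
    )
except ClientError as e:
    # Outside us-east-1, re-creating a bucket you already own raises
    # BucketAlreadyOwnedByYou; in us-east-1 the same call returns 200 OK,
    # so the SDK never sees an error and keeps issuing CreateBucket.
    print(e.response["Error"]["Code"])  # BucketAlreadyOwnedByYou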

jimmy-pelago commented 4 years ago

@ChuyangDeng why is the if condition's bucket.creation_date always None? I would expect the code not to reach the s3.create_bucket line, regardless of the region. That error event in the CloudTrail log is very noisy.

Basanti-xian commented 4 years ago

Hi @ChuyangDeng, the SageMaker job is running in ap-southeast-1.

jcmcken commented 3 years ago

IMO, there should be an option (e.g. via an argument to Session) to disable this auto-bucket creation (including disabling looking up whether the bucket exists). It seems a little like conflating responsibilities. I don't necessarily want my SageMaker workload to have permissions to create buckets (or even to list all buckets in the account). If there's some vulnerability in SageMaker or in my code, I would rather limit the permissions as much as possible.
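
Purely to illustrate the request, a hypothetical opt-out might look like the following; the create_default_bucket flag does not exist in the SDK today and is shown commented out:

import boto3
import sagemaker

boto_session = boto3.session.Session(region_name="ap-southeast-1")
session = sagemaker.session.Session(
    boto_session=boto_session,
    default_bucket="bucket-that-exists",
    # create_default_bucket=False,  # hypothetical flag: trust the given
    #                               # bucket and never call CreateBucket
)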

Geekiac commented 2 years ago

@Basanti-xian @chuyang-deng I am hitting the same issue. The bucket exists, but creation_date returns None even though I can see the creation date in the console, and the SDK then tries to create the bucket. I am in eu-west-1. I agree with @jcmcken that bucket creation should be optional rather than automatic by default.

We don't have create-bucket permissions; our buckets are created for us. So when the SDK tries to create a bucket on our behalf, we get access denied.

Geekiac commented 2 years ago

Another option is to use the S3 head_bucket API call.

Signature: s3.head_bucket(*args, **kwargs)
Docstring: This action is useful to determine if a bucket exists and you have permission to access it. The action returns a 200 OK if the bucket exists and you have permission to access it.

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.head_bucket
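
A minimal sketch of such a check (the helper name is ours):

import boto3
from botocore.exceptions import ClientError

def bucket_exists(bucket_name):
    """Return True if the bucket exists and this caller can access it."""
    s3 = boto3.client("s3")
    try:
        s3.head_bucket(Bucket=bucket_name)  # 200 OK on success
        return True
    except ClientError:
        # boto3 surfaces 403 (forbidden) and 404 (missing) from
        # HeadBucket as ClientError
        return False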

TremaMiguel commented 2 years ago

I'm facing the same issue and have considered three workarounds.

1. Role has the s3:ListAllMyBuckets permission

The S3 Bucket resource has a creation_date attribute that is set by S3.Bucket.load():

They may be set at creation time from the response of an action on another resource, or they may be set when accessed or via an explicit call to the load or reload action.
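
For illustration, checking creation_date directly; load() is backed by a ListBuckets call, which is why the role needs s3:ListAllMyBuckets:

import boto3

bucket = boto3.resource("s3").Bucket("bucket-that-exists")
bucket.load()  # populates creation_date via ListBuckets
print(bucket.creation_date)  # None when the role cannot list buckets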

2. Code already exists in S3

If you provide an S3 location code_location where the code lives, the attribute self.bucket is not None. This avoids calling the session's default_bucket method; an example is here.
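
A sketch of this workaround with a framework estimator that accepts code_location (role and the other values are placeholders):

from sagemaker.sklearn.estimator import SKLearn

estimator = SKLearn(
    entry_point="train.py",
    role=role,  # placeholder: an existing execution role
    instance_count=1,
    instance_type="ml.m5.large",
    framework_version="0.23-1",
    # Pointing code_location at an existing bucket sets the bucket up
    # front, so the session's default_bucket() is not consulted.
    code_location="s3://bucket-that-exists/code",
)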

3. Hardcode the value self.bucket

# Example with SKLearnModel
from sagemaker.sklearn.model import SKLearnModel
from sagemaker.session import Session

smk_session = Session(default_bucket='your-bucket')
model = SKLearnModel(
    ...,
    sagemaker_session=smk_session,
)

# Hardcode the bucket attribute
model.bucket = 'your-bucket'

TremaMiguel commented 2 years ago

One feasible solution is to check access to the _default_bucket_name_override here using the S3 client's head_bucket().

That way you would need neither the s3:ListAllMyBuckets permission nor permission to create an S3 bucket.

If you accept PRs, I'm open to developing this.

import boto3
from botocore.exceptions import ClientError

def _check_default_bucket_access(self):
    default_bucket = self._default_bucket_name_override
    if not default_bucket:
        return False

    client = self.boto_session.client('s3')
    try:
        # head_bucket returns 200 OK when the bucket exists and is
        # accessible; otherwise boto3 raises a ClientError (403/404).
        client.head_bucket(Bucket=default_bucket)
        return True
    except ClientError:
        return False

def default_bucket(self):
    """Return the name of the default bucket to use in relevant Amazon SageMaker interactions.

    This function will create the s3 bucket if it does not exist.

    Returns:
        str: The name of the default bucket, which is of the form:
            ``sagemaker-{region}-{AWS account ID}``.
    """
    if self._default_bucket:
        return self._default_bucket

    if self._check_default_bucket_access():
        self._default_bucket = self._default_bucket_name_override
        return self._default_bucket
    ...

    return self._default_bucket

TremaMiguel commented 2 years ago

@ahsan-z-khan @staubhp could you provide insight into the discussion, or do you know anyone from AWS who can?

mufaddal-rohawala commented 2 years ago

The issue should be fixed now. Please use a recent sagemaker-python-sdk >= v2.96.0

george-chenvibes commented 2 years ago

> The issue should be fixed now. Please use a recent sagemaker-python-sdk >= v2.96.0

I still get the same error despite upgrading the sagemaker version to >= 2.96.0, @mufaddal-rohawala. @TremaMiguel Did you ever find a workaround?

clausagerskov commented 1 year ago

@mufaddal-rohawala It should be fixed, but it is not.

clausagerskov commented 1 year ago

The bug fix listed in the release changed a required "s3:ListAllMyBuckets" permission to a required "s3:ListBucket" permission. That does not address the issue here at all, which occurs in fully local development.

MichaelWalker-git commented 1 year ago

> Hi @Basanti-xian, are you running your SageMaker job in us-east-1?
>
> If so, this is expected: create_bucket() returns a BucketAlreadyOwnedByYou error in all AWS regions EXCEPT us-east-1, while in us-east-1 you get 200 OK, so the Python SDK code will keep trying to create a default bucket.

Looking at the source code:

https://github.com/aws/sagemaker-python-sdk/blob/300cd17bc5ac2b1a7d900ebaca110eae4659cfdc/src/sagemaker/session.py#L6151

Can you override it with a session and declare the region there? https://docs.aws.amazon.com/braket/latest/developerguide/braket-using-boto3-profiles-step-2.html
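
A sketch of that suggestion, reusing the reproduction values from this thread:

import boto3
import sagemaker

# Pin the region on the boto3 session and pass it explicitly, so the
# SDK resolves the default bucket in the intended region.
boto_session = boto3.session.Session(region_name="ap-southeast-1")
sm_session = sagemaker.session.Session(
    boto_session=boto_session,
    default_bucket="bucket-that-exists",
)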