aws-samples / amazon-sagemaker-local-mode

Amazon SageMaker Local Mode Examples
MIT No Attribution

Error in doing Training Job says S3 Object Forbidden #9

Closed: orriduck closed this issue 2 years ago

orriduck commented 3 years ago

Hi,

@eitansela

I am attempting to create a training job in local mode that uses data in S3 for training. I didn't change the local session config, but I encountered an S3 Forbidden error and I don't know why.

Here is my code snippet for the training job:

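import sagemaker
from sagemaker.inputs import TrainingInput
from sagemaker.local import LocalSession
from sagemaker.tensorflow import TensorFlow

sagemaker_session = LocalSession()

# Training Job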
salary_estimator = TensorFlow(
    entry_point='train.py',
    source_dir="../ds_pipeline/src", 
    role=sagemaker.get_execution_role(), # current notebook role
    image_uri="763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.3.1-gpu-py37-cu110-ubuntu18.04",
    instance_count=1,
    instance_type="local",
    output_path="s3://sagemaker-project-p-zfuf9hgaujxu/experiment_packs/poc_exp/model",
    sagemaker_session=sagemaker_session, # LocalSession()
    container_log_level=10, # 10 debug 20 info 30 warning 40 error
    volume_size=80,
    model_dir=False,
    hyperparameters={
        "default": {
            "train_epochs": 3,
            "train_batch_size": 1024,
            "early_stop_tolerance": 2
        },
        "CA": {
            "train_epochs": 5,
            "train_batch_size": 2048,
            "early_stop_tolerance": 2
        }
    }
)

salary_estimator.fit(
    inputs={
        "train": TrainingInput(
            s3_data="s3://sagemaker-project-p-zfuf9hgaujxu/experiment_packs/poc_exp/feature_engineering/encoded_train",
            content_type=None,
        ),
        "validation": TrainingInput(
            s3_data="s3://sagemaker-project-p-zfuf9hgaujxu/experiment_packs/poc_exp/feature_engineering/encoded_validation",
            content_type=None,
        ),
        "encoders": TrainingInput(
            s3_data="s3://sagemaker-project-p-zfuf9hgaujxu/experiment_packs/poc_exp/feature_engineering/encoders",
            content_type=None,
        ),
    }
)
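The comment on container_log_level in the snippet lists numeric levels (10 debug, 20 info, 30 warning, 40 error). These are the same values as the constants in Python's standard logging module, so a small sketch like the one below can pass a named level instead of a bare number. The helper name container_log_level and the LOG_LEVELS table are illustrative only, not part of the original code or the SDK.

```python
import logging

# Numeric levels from the comment in the snippet: 10 debug, 20 info, 30 warning, 40 error.
# They match the standard library's logging constants.
LOG_LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def container_log_level(name: str) -> int:
    """Return the numeric level for a name such as 'debug' (case-insensitive)."""
    return LOG_LEVELS[name.lower()]


# e.g. TensorFlow(..., container_log_level=container_log_level("debug"), ...)
print(container_log_level("DEBUG"))  # prints 10
```

Debug level (10) is what makes the container print the botocore request and credential lines that appear in the log.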

And here is the execution log. The error seems to come from something other than downloading the data; I have double-checked that the data is in the S3 bucket. Can somebody help me here?

Creating xtwk0cmckh-algo-1-p7p3n ... 
Creating xtwk0cmckh-algo-1-p7p3n ... done
Attaching to xtwk0cmckh-algo-1-p7p3n
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:00.639920: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:460] Initializing the SageMaker Profiler.
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:00.640071: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:105] SageMaker Profiler is not enabled. The timeline writer thread will not be started, future recorded events will be dropped.
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:00.646929: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:00.678365: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:460] Initializing the SageMaker Profiler.
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,185 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,192 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,193 botocore.hooks DEBUG    Changing event name from creating-client-class.iot-data to creating-client-class.iot-data-plane
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,195 botocore.hooks DEBUG    Changing event name from before-call.apigateway to before-call.api-gateway
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,196 botocore.hooks DEBUG    Changing event name from request-created.machinelearning.Predict to request-created.machine-learning.Predict
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,198 botocore.hooks DEBUG    Changing event name from before-parameter-build.autoscaling.CreateLaunchConfiguration to before-parameter-build.auto-scaling.CreateLaunchConfiguration
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,198 botocore.hooks DEBUG    Changing event name from before-parameter-build.route53 to before-parameter-build.route-53
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,198 botocore.hooks DEBUG    Changing event name from request-created.cloudsearchdomain.Search to request-created.cloudsearch-domain.Search
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,199 botocore.hooks DEBUG    Changing event name from docs.*.autoscaling.CreateLaunchConfiguration.complete-section to docs.*.auto-scaling.CreateLaunchConfiguration.complete-section
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,202 botocore.hooks DEBUG    Changing event name from before-parameter-build.logs.CreateExportTask to before-parameter-build.cloudwatch-logs.CreateExportTask
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,202 botocore.hooks DEBUG    Changing event name from docs.*.logs.CreateExportTask.complete-section to docs.*.cloudwatch-logs.CreateExportTask.complete-section
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,203 botocore.hooks DEBUG    Changing event name from before-parameter-build.cloudsearchdomain.Search to before-parameter-build.cloudsearch-domain.Search
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,203 botocore.hooks DEBUG    Changing event name from docs.*.cloudsearchdomain.Search.complete-section to docs.*.cloudsearch-domain.Search.complete-section
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,208 botocore.hooks DEBUG    Changing event name from creating-client-class.iot-data to creating-client-class.iot-data-plane
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,209 botocore.hooks DEBUG    Changing event name from before-call.apigateway to before-call.api-gateway
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,210 botocore.hooks DEBUG    Changing event name from request-created.machinelearning.Predict to request-created.machine-learning.Predict
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,212 botocore.hooks DEBUG    Changing event name from before-parameter-build.autoscaling.CreateLaunchConfiguration to before-parameter-build.auto-scaling.CreateLaunchConfiguration
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,212 botocore.hooks DEBUG    Changing event name from before-parameter-build.route53 to before-parameter-build.route-53
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,213 botocore.hooks DEBUG    Changing event name from request-created.cloudsearchdomain.Search to request-created.cloudsearch-domain.Search
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,213 botocore.hooks DEBUG    Changing event name from docs.*.autoscaling.CreateLaunchConfiguration.complete-section to docs.*.auto-scaling.CreateLaunchConfiguration.complete-section
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,216 botocore.hooks DEBUG    Changing event name from before-parameter-build.logs.CreateExportTask to before-parameter-build.cloudwatch-logs.CreateExportTask
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,216 botocore.hooks DEBUG    Changing event name from docs.*.logs.CreateExportTask.complete-section to docs.*.cloudwatch-logs.CreateExportTask.complete-section
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,216 botocore.hooks DEBUG    Changing event name from before-parameter-build.cloudsearchdomain.Search to before-parameter-build.cloudsearch-domain.Search
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,217 botocore.hooks DEBUG    Changing event name from docs.*.cloudsearchdomain.Search.complete-section to docs.*.cloudsearch-domain.Search.complete-section
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,227 botocore.loaders DEBUG    Loading JSON file: /usr/local/lib/python3.7/site-packages/boto3/data/s3/2006-03-01/resources-1.json
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,230 botocore.utils DEBUG    IMDS ENDPOINT: http://169.254.169.254/
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,232 botocore.credentials DEBUG    Looking for credentials via: env
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,232 botocore.credentials DEBUG    Looking for credentials via: assume-role
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,232 botocore.credentials DEBUG    Looking for credentials via: assume-role-with-web-identity
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,232 botocore.credentials DEBUG    Looking for credentials via: sso
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,232 botocore.credentials DEBUG    Looking for credentials via: shared-credentials-file
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,233 botocore.credentials DEBUG    Looking for credentials via: custom-process
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,233 botocore.credentials DEBUG    Looking for credentials via: config-file
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,233 botocore.credentials DEBUG    Looking for credentials via: ec2-credentials-file
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,233 botocore.credentials DEBUG    Looking for credentials via: boto-config
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,233 botocore.credentials DEBUG    Looking for credentials via: container-role
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,233 botocore.credentials DEBUG    Looking for credentials via: iam-role
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:02,234 urllib3.connectionpool DEBUG    Starting new HTTP connection (1): 169.254.169.254:80
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,237 urllib3.connectionpool DEBUG    Starting new HTTP connection (2): 169.254.169.254:80
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,238 urllib3.connectionpool DEBUG    http://169.254.169.254:80 "GET /latest/meta-data/iam/security-credentials/ HTTP/1.1" 200 35
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,239 urllib3.connectionpool DEBUG    Resetting dropped connection: 169.254.169.254
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,240 urllib3.connectionpool DEBUG    http://169.254.169.254:80 "GET /latest/meta-data/iam/security-credentials/BaseNotebookInstanceEc2InstanceRole HTTP/1.1" 200 1298
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,240 botocore.credentials DEBUG    Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,241 botocore.loaders DEBUG    Loading JSON file: /usr/local/lib/python3.7/site-packages/botocore/data/endpoints.json
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,247 botocore.hooks DEBUG    Event choose-service-name: calling handler <function handle_service_name_alias at 0x7f097cadd560>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,256 botocore.loaders DEBUG    Loading JSON file: /usr/local/lib/python3.7/site-packages/botocore/data/s3/2006-03-01/service-2.json
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,266 botocore.hooks DEBUG    Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7f097cb103b0>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,266 botocore.hooks DEBUG    Event creating-client-class.s3: calling handler <function lazy_call.<locals>._handler at 0x7f097abc1ef0>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,267 botocore.hooks DEBUG    Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7f097cb10170>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,269 botocore.endpoint DEBUG    Setting s3 timeout as (60, 60)
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,270 botocore.loaders DEBUG    Loading JSON file: /usr/local/lib/python3.7/site-packages/botocore/data/_retry.json
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,270 botocore.client DEBUG    Registering retry handlers for service: s3
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,271 boto3.resources.factory DEBUG    Loading s3:s3
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,273 boto3.resources.factory DEBUG    Loading s3:Bucket
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,273 boto3.resources.model DEBUG    Renaming Bucket attribute name
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,274 botocore.hooks DEBUG    Event creating-resource-class.s3.Bucket: calling handler <function lazy_call.<locals>._handler at 0x7f0910603f80>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,275 s3transfer.utils DEBUG    Acquiring 0
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,275 s3transfer.tasks DEBUG    DownloadSubmissionTask(transfer_id=0, {'transfer_future': <s3transfer.futures.TransferFuture object at 0x7f096fb0fd90>}) about to wait for the following futures []
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,276 s3transfer.tasks DEBUG    DownloadSubmissionTask(transfer_id=0, {'transfer_future': <s3transfer.futures.TransferFuture object at 0x7f096fb0fd90>}) done waiting for dependent futures
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,276 s3transfer.tasks DEBUG    Executing task DownloadSubmissionTask(transfer_id=0, {'transfer_future': <s3transfer.futures.TransferFuture object at 0x7f096fb0fd90>}) with kwargs {'client': <botocore.client.S3 object at 0x7f096fb26a90>, 'config': <boto3.s3.transfer.TransferConfig object at 0x7f096faf9e10>, 'osutil': <s3transfer.utils.OSUtils object at 0x7f096faf9c90>, 'request_executor': <s3transfer.futures.BoundedExecutor object at 0x7f096fb0f890>, 'transfer_future': <s3transfer.futures.TransferFuture object at 0x7f096fb0fd90>, 'io_executor': <s3transfer.futures.BoundedExecutor object at 0x7f096fb0fad0>}
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,276 botocore.hooks DEBUG    Event before-parameter-build.s3.HeadObject: calling handler <function sse_md5 at 0x7f097ca90b00>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,276 botocore.hooks DEBUG    Event before-parameter-build.s3.HeadObject: calling handler <function validate_bucket_name at 0x7f097ca90a70>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,276 botocore.hooks DEBUG    Event before-parameter-build.s3.HeadObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7f096fadfe10>>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,276 botocore.hooks DEBUG    Event before-parameter-build.s3.HeadObject: calling handler <bound method S3ArnParamHandler.handle_arn of <botocore.utils.S3ArnParamHandler object at 0x7f096fb2e690>>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,276 botocore.hooks DEBUG    Event before-parameter-build.s3.HeadObject: calling handler <function generate_idempotent_uuid at 0x7f097ca908c0>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,277 botocore.hooks DEBUG    Event before-call.s3.HeadObject: calling handler <function add_expect_header at 0x7f097ca90dd0>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,277 botocore.hooks DEBUG    Event before-call.s3.HeadObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7f096fadfe10>>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,277 botocore.hooks DEBUG    Event before-call.s3.HeadObject: calling handler <function inject_api_version_header_if_needed at 0x7f097ca9a170>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,277 botocore.endpoint DEBUG    Making request for OperationModel(name=HeadObject) with params: {'url_path': '/sagemaker-project-p-zfuf9hgaujxu/tensorflow-training-2021-04-19-23-05-49-157/source/sourcedir.tar.gz', 'query_string': {}, 'method': 'HEAD', 'headers': {'User-Agent': 'Boto3/1.17.11 Python/3.7.10 Linux/4.14.225-121.357.amzn1.x86_64 Botocore/1.20.11 Resource'}, 'body': b'', 'url': 'https://s3.amazonaws.com/sagemaker-project-p-zfuf9hgaujxu/tensorflow-training-2021-04-19-23-05-49-157/source/sourcedir.tar.gz', 'context': {'client_region': 'us-east-1', 'client_config': <botocore.config.Config object at 0x7f096fb2e390>, 'has_streaming_input': False, 'auth_type': None, 'signing': {'bucket': 'sagemaker-project-p-zfuf9hgaujxu'}}}
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,278 botocore.hooks DEBUG    Event request-created.s3.HeadObject: calling handler <function signal_not_transferring at 0x7f097ac9ac20>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,278 botocore.hooks DEBUG    Event request-created.s3.HeadObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7f096fb2e350>>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,278 botocore.hooks DEBUG    Event choose-signer.s3.HeadObject: calling handler <bound method ClientCreator._default_s3_presign_to_sigv2 of <botocore.client.ClientCreator object at 0x7f097ac27610>>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,279 botocore.hooks DEBUG    Event choose-signer.s3.HeadObject: calling handler <function set_operation_specific_signer at 0x7f097ca907a0>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,279 botocore.hooks DEBUG    Event before-sign.s3.HeadObject: calling handler <bound method S3EndpointSetter.set_endpoint of <botocore.utils.S3EndpointSetter object at 0x7f096fae3750>>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,279 botocore.utils DEBUG    Defaulting to S3 virtual host style addressing with path style addressing fallback.
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,279 botocore.utils DEBUG    Checking for DNS compatible bucket for: https://s3.amazonaws.com/sagemaker-project-p-zfuf9hgaujxu/tensorflow-training-2021-04-19-23-05-49-157/source/sourcedir.tar.gz
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,279 botocore.utils DEBUG    URI updated to: https://sagemaker-project-p-zfuf9hgaujxu.s3.amazonaws.com/tensorflow-training-2021-04-19-23-05-49-157/source/sourcedir.tar.gz
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,280 botocore.auth DEBUG    Calculating signature using v4 auth.
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,280 botocore.auth DEBUG    CanonicalRequest:
xtwk0cmckh-algo-1-p7p3n | HEAD
xtwk0cmckh-algo-1-p7p3n | /tensorflow-training-2021-04-19-23-05-49-157/source/sourcedir.tar.gz
xtwk0cmckh-algo-1-p7p3n | 
xtwk0cmckh-algo-1-p7p3n | host:sagemaker-project-p-zfuf9hgaujxu.s3.amazonaws.com
xtwk0cmckh-algo-1-p7p3n | x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
xtwk0cmckh-algo-1-p7p3n | x-amz-date:20210419T230703Z
xtwk0cmckh-algo-1-p7p3n | x-amz-security-token:IQoJb3JpZ2luX2VjEBcaCXVzLWVhc3QtMSJHMEUCIQChpLqiTKZ/X9vtF5ivLL3kTZ/N9Cvo5cxJJqjvbGEEWAIgNkkix5K7Kp4/t4iph3c+Wa/Xn+yEte1vlOZfIEJknQgqtAMIfxACGgw3NjQ3MDc5MjQ0MTUiDKfb16hiXol9Xwz5qSqRA6iUy73MDy2i8r1mLSIc7hwxEaN2gh3CNHdWOUpDqF4mJQiuEdqgwB+dyULqgObrNwSJHj5v2GZlzoMxiiDGw+Drhe6Eh9WiLK666hoBTdOyMWjk5jI5k4HNfNQi9OT6eL+5I1xPZOSzscK7yBVZ0nHiIG/IQi0s4dqTqEybghURN73H1pV/ilEeoIaOMT01SUaZ8o45g6hhdwiOFFfOEF64G+smUV2SgY5bYGhd/lJwegZolz3PsC6otG37ySqtJ09YLGqlGf9cGCkZWdtCMfzWzsf/YpKePS+V8li4LquOUSm0rpsLyjZbDqYSmZP/Swny1pZ7C4pXQS0/gO4XDxeRNXgZ4N1dPVX51tn9HcxDXVqn+DlSLVNmLG8mQr5sC20NSlaRUdv9W4dr5PcbzW2+J+QfV/1wGd/G+DM7nrNTaAyIJZOJS6i0prkCc84v3AXRzEUoRUMqk9TIw7cHtqGINhoXGbQovokM65jlOcuOihuIBOC9HiBE6M6Z810yf3E/oDHu885zMdnRaMQg7hsFMOqX+IMGOusBcdY4qXVQpFO1nIUEWeSMf7+FEfX2dhOGX+OGvIH6kKJTf+nl/iy65z9gzOlFsT4jq2EcZaIvzSm7TC/La4nemDM7VsSzWyJL/u5Dv1R5felIvvSh07/Tu/6OxeMZ3uCruUQQJLaszLPYJNWngva0GqSumdlRcINJCyIi/HH2AAfP0lajB9FfJglcSs60UPuFqJ8mPRQYEB8wxl9+VoWF5vcwnKSHbNJ2To0wgWsNXZEyQraKQASmzSKlHW5Ji1Wkhpd9MWoEKmD0UMYtSbLiOKyLXxwu3xDMyXW0gZG3m0D8+gzo2FlbvCvvzg==
xtwk0cmckh-algo-1-p7p3n | 
xtwk0cmckh-algo-1-p7p3n | host;x-amz-content-sha256;x-amz-date;x-amz-security-token
xtwk0cmckh-algo-1-p7p3n | e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,280 botocore.auth DEBUG    StringToSign:
xtwk0cmckh-algo-1-p7p3n | AWS4-HMAC-SHA256
xtwk0cmckh-algo-1-p7p3n | 20210419T230703Z
xtwk0cmckh-algo-1-p7p3n | 20210419/us-east-1/s3/aws4_request
xtwk0cmckh-algo-1-p7p3n | 3c5b8f418f49ceb997bd97a31db7533de585ac50c6aa411842af35bc5b553aae
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,280 botocore.auth DEBUG    Signature:
xtwk0cmckh-algo-1-p7p3n | b67362f3ecfa2a620da5dd3088cd997a202e729985abdc574534c5643bddcdcd
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,280 botocore.hooks DEBUG    Event request-created.s3.HeadObject: calling handler <function signal_transferring at 0x7f097aca8f80>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,281 botocore.endpoint DEBUG    Sending http request: <AWSPreparedRequest stream_output=False, method=HEAD, url=https://sagemaker-project-p-zfuf9hgaujxu.s3.amazonaws.com/tensorflow-training-2021-04-19-23-05-49-157/source/sourcedir.tar.gz, headers={'User-Agent': b'Boto3/1.17.11 Python/3.7.10 Linux/4.14.225-121.357.amzn1.x86_64 Botocore/1.20.11 Resource', 'X-Amz-Date': b'20210419T230703Z', 'X-Amz-Security-Token': b'IQoJb3JpZ2luX2VjEBcaCXVzLWVhc3QtMSJHMEUCIQChpLqiTKZ/X9vtF5ivLL3kTZ/N9Cvo5cxJJqjvbGEEWAIgNkkix5K7Kp4/t4iph3c+Wa/Xn+yEte1vlOZfIEJknQgqtAMIfxACGgw3NjQ3MDc5MjQ0MTUiDKfb16hiXol9Xwz5qSqRA6iUy73MDy2i8r1mLSIc7hwxEaN2gh3CNHdWOUpDqF4mJQiuEdqgwB+dyULqgObrNwSJHj5v2GZlzoMxiiDGw+Drhe6Eh9WiLK666hoBTdOyMWjk5jI5k4HNfNQi9OT6eL+5I1xPZOSzscK7yBVZ0nHiIG/IQi0s4dqTqEybghURN73H1pV/ilEeoIaOMT01SUaZ8o45g6hhdwiOFFfOEF64G+smUV2SgY5bYGhd/lJwegZolz3PsC6otG37ySqtJ09YLGqlGf9cGCkZWdtCMfzWzsf/YpKePS+V8li4LquOUSm0rpsLyjZbDqYSmZP/Swny1pZ7C4pXQS0/gO4XDxeRNXgZ4N1dPVX51tn9HcxDXVqn+DlSLVNmLG8mQr5sC20NSlaRUdv9W4dr5PcbzW2+J+QfV/1wGd/G+DM7nrNTaAyIJZOJS6i0prkCc84v3AXRzEUoRUMqk9TIw7cHtqGINhoXGbQovokM65jlOcuOihuIBOC9HiBE6M6Z810yf3E/oDHu885zMdnRaMQg7hsFMOqX+IMGOusBcdY4qXVQpFO1nIUEWeSMf7+FEfX2dhOGX+OGvIH6kKJTf+nl/iy65z9gzOlFsT4jq2EcZaIvzSm7TC/La4nemDM7VsSzWyJL/u5Dv1R5felIvvSh07/Tu/6OxeMZ3uCruUQQJLaszLPYJNWngva0GqSumdlRcINJCyIi/HH2AAfP0lajB9FfJglcSs60UPuFqJ8mPRQYEB8wxl9+VoWF5vcwnKSHbNJ2To0wgWsNXZEyQraKQASmzSKlHW5Ji1Wkhpd9MWoEKmD0UMYtSbLiOKyLXxwu3xDMyXW0gZG3m0D8+gzo2FlbvCvvzg==', 'X-Amz-Content-SHA256': b'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', 'Authorization': b'AWS4-HMAC-SHA256 Credential=ASIA3EDBE5G7T46NPRPT/20210419/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=b67362f3ecfa2a620da5dd3088cd997a202e729985abdc574534c5643bddcdcd'}>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,281 botocore.httpsession DEBUG    Certificate path: /usr/local/lib/python3.7/site-packages/certifi/cacert.pem
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,282 urllib3.connectionpool DEBUG    Starting new HTTPS connection (1): sagemaker-project-p-zfuf9hgaujxu.s3.amazonaws.com:443
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,318 urllib3.connectionpool DEBUG    https://sagemaker-project-p-zfuf9hgaujxu.s3.amazonaws.com:443 "HEAD /tensorflow-training-2021-04-19-23-05-49-157/source/sourcedir.tar.gz HTTP/1.1" 403 0
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,319 botocore.parsers DEBUG    Response headers: {'x-amz-request-id': 'QYZH5WHTRGRRT2S3', 'x-amz-id-2': 'eCmVBmTrL/nGujzuR7ZguCjnDe2r0cXXaCH1iQPLGdnJRAMJfE5g+ZmT/a7Z6eGCB3fR0e4XZnw=', 'Content-Type': 'application/xml', 'Date': 'Mon, 19 Apr 2021 23:07:02 GMT', 'Server': 'AmazonS3'}
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,319 botocore.parsers DEBUG    Response body:
xtwk0cmckh-algo-1-p7p3n | b''
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,321 botocore.hooks DEBUG    Event needs-retry.s3.HeadObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7f096fb2e7d0>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,321 botocore.retryhandler DEBUG    No retry needed.
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,321 botocore.hooks DEBUG    Event needs-retry.s3.HeadObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f096fadfe10>>
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,322 s3transfer.tasks DEBUG    Exception raised.
xtwk0cmckh-algo-1-p7p3n | Traceback (most recent call last):
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/s3transfer/tasks.py", line 255, in _main
xtwk0cmckh-algo-1-p7p3n |     self._submit(transfer_future=transfer_future, **kwargs)
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/s3transfer/download.py", line 343, in _submit
xtwk0cmckh-algo-1-p7p3n |     **transfer_future.meta.call_args.extra_args
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
xtwk0cmckh-algo-1-p7p3n |     return self._make_api_call(operation_name, kwargs)
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 676, in _make_api_call
xtwk0cmckh-algo-1-p7p3n |     raise error_class(parsed_response, operation_name)
xtwk0cmckh-algo-1-p7p3n | botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,323 s3transfer.utils DEBUG    Releasing acquire 0/None
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,324 sagemaker-training-toolkit ERROR    Reporting training FAILURE
xtwk0cmckh-algo-1-p7p3n | 2021-04-19 23:07:03,324 sagemaker-training-toolkit ERROR    framework error: 
xtwk0cmckh-algo-1-p7p3n | Traceback (most recent call last):
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/sagemaker_training/trainer.py", line 85, in train
xtwk0cmckh-algo-1-p7p3n |     entrypoint()
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/sagemaker_tensorflow_container/training.py", line 235, in main
xtwk0cmckh-algo-1-p7p3n |     train(env, mapping.to_cmd_args(user_hyperparameters))
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/sagemaker_tensorflow_container/training.py", line 173, in train
xtwk0cmckh-algo-1-p7p3n |     runner_type=runner_type,
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/sagemaker_training/entry_point.py", line 92, in run
xtwk0cmckh-algo-1-p7p3n |     files.download_and_extract(uri=uri, path=environment.code_dir)
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/sagemaker_training/files.py", line 131, in download_and_extract
xtwk0cmckh-algo-1-p7p3n |     s3_download(uri, dst)
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/sagemaker_training/files.py", line 167, in s3_download
xtwk0cmckh-algo-1-p7p3n |     s3.Bucket(bucket).download_file(key, dst)
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/boto3/s3/inject.py", line 246, in bucket_download_file
xtwk0cmckh-algo-1-p7p3n |     ExtraArgs=ExtraArgs, Callback=Callback, Config=Config)
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/boto3/s3/inject.py", line 172, in download_file
xtwk0cmckh-algo-1-p7p3n |     extra_args=ExtraArgs, callback=Callback)
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/boto3/s3/transfer.py", line 307, in download_file
xtwk0cmckh-algo-1-p7p3n |     future.result()
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/s3transfer/futures.py", line 106, in result
xtwk0cmckh-algo-1-p7p3n |     return self._coordinator.result()
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/s3transfer/futures.py", line 265, in result
xtwk0cmckh-algo-1-p7p3n |     raise self._exception
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/s3transfer/tasks.py", line 255, in _main
xtwk0cmckh-algo-1-p7p3n |     self._submit(transfer_future=transfer_future, **kwargs)
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/s3transfer/download.py", line 343, in _submit
xtwk0cmckh-algo-1-p7p3n |     **transfer_future.meta.call_args.extra_args
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
xtwk0cmckh-algo-1-p7p3n |     return self._make_api_call(operation_name, kwargs)
xtwk0cmckh-algo-1-p7p3n |   File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 676, in _make_api_call
xtwk0cmckh-algo-1-p7p3n |     raise error_class(parsed_response, operation_name)
xtwk0cmckh-algo-1-p7p3n | botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
xtwk0cmckh-algo-1-p7p3n | 
xtwk0cmckh-algo-1-p7p3n | An error occurred (403) when calling the HeadObject operation: Forbidden
xtwk0cmckh-algo-1-p7p3n exited with code 1
1
Aborting on container exit...
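In the log, the request that returns 403 is a HeadObject on tensorflow-training-2021-04-19-23-05-49-157/source/sourcedir.tar.gz, raised from files.download_and_extract(uri=uri, path=environment.code_dir), i.e. the container fetching the packaged source_dir rather than one of the fit() channels. A minimal pure-Python sketch for checking that kind of thing: it splits S3 URIs and reports which input channels (if any) contain the failing object. The helper names split_s3_uri and matching_channels are hypothetical, and the simple prefix match is an assumption that is good enough for a quick diagnosis.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def matching_channels(failing_uri: str, channels: Dict[str, str]) -> List[str]:
    """Return the names of channels whose S3 prefix contains failing_uri."""
    bucket, key = split_s3_uri(failing_uri)
    hits = []
    for name, channel_uri in channels.items():
        ch_bucket, ch_prefix = split_s3_uri(channel_uri)
        if bucket == ch_bucket and key.startswith(ch_prefix):
            hits.append(name)
    return hits


# Channel URIs from the fit() call and the object from the 403 HeadObject in the log.
prefix = "s3://sagemaker-project-p-zfuf9hgaujxu/experiment_packs/poc_exp/feature_engineering/"
channels = {
    "train": prefix + "encoded_train",
    "validation": prefix + "encoded_validation",
    "encoders": prefix + "encoders",
}
failing = ("s3://sagemaker-project-p-zfuf9hgaujxu/"
           "tensorflow-training-2021-04-19-23-05-49-157/source/sourcedir.tar.gz")
print(matching_channels(failing, channels))  # prints []
```

An empty list means the forbidden object is not part of the training data, which fits the traceback pointing at the code download.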

I encountered the same error with sagemaker[local] versions 2.31, 2.23.5, and 2.35.

eitansela commented 3 years ago

Hello @ruyyi0323, you need an IAM role with access to the S3 bucket. Do you have the proper credentials configured on your PC?
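According to the log, the container resolved its credentials from the instance metadata service as the IAM role BaseNotebookInstanceEc2InstanceRole, so that appears to be the role that needs read access to the bucket. A sketch of a minimal identity policy for that purpose follows. The choice of actions is an assumption about what this job needs: s3:GetObject authorizes GetObject and HeadObject, and s3:ListBucket makes S3 return 404 instead of 403 for a missing key, which helps tell "no permission" apart from "object not there". The helper name s3_read_policy is hypothetical, and the resulting document would still have to be attached to the role through IAM.

```python
import json
from typing import Any, Dict


def s3_read_policy(bucket: str) -> Dict[str, Any]:
    """Build an IAM identity policy allowing reads from one S3 bucket.

    Assumption: read-only access (ListBucket + GetObject) is what the
    container needs here; adjust the actions for your own setup.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is granted on the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # GetObject (which also covers HeadObject) is granted on the objects.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


print(json.dumps(s3_read_policy("sagemaker-project-p-zfuf9hgaujxu"), indent=2))
```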

orriduck commented 3 years ago

Hi @eitansela,

Thanks for the reply. I am using a SageMaker Notebook Instance for this. I guess it uses the notebook's role, since the role argument I pass in isn't used? Is there any workaround?
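One possible workaround, offered as an untested assumption rather than a fix confirmed in this thread: the SageMaker Python SDK's local mode has a local_code option (set through the session's config as {"local": {"local_code": True}}) that makes the estimator use source_dir from the local filesystem instead of staging sourcedir.tar.gz in S3. That would skip the container-side download that fails in the log. The sketch below builds that config; make_local_session is a hypothetical helper, and sagemaker is imported inside it so the config part runs without the SDK installed.

```python
from typing import Any, Dict


def local_mode_config(local_code: bool = True) -> Dict[str, Any]:
    """Config dict for a SageMaker LocalSession.

    With local_code=True, local mode is expected to use source_dir from the
    local filesystem rather than uploading it to S3.
    """
    return {"local": {"local_code": local_code}}


def make_local_session():
    """Hypothetical helper: create a LocalSession with local_code enabled."""
    from sagemaker.local import LocalSession  # requires the sagemaker[local] package

    session = LocalSession()
    session.config = local_mode_config(local_code=True)
    return session


# Usage on the notebook instance (not executed here):
# sagemaker_session = make_local_session()
# salary_estimator = TensorFlow(..., sagemaker_session=sagemaker_session)
```

The alternative is to leave the code upload as it is and grant the instance role read access to the bucket instead.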