awslabs / s3-connector-for-pytorch

The Amazon S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access and store data in Amazon S3.
BSD 3-Clause "New" or "Revised" License

role_arn with web_identity_token_file compatibility #210

Open harsh-agarwal-93 opened 1 month ago

harsh-agarwal-93 commented 1 month ago

Tell us more about this new feature.

I'm working in a Kubernetes/Kubeflow environment deployed on EC2, with S3 credentials set up using role_arn with web_identity_token_file, and I keep receiving the error "S3Exception: Client error: No signing credentials found". I wanted to know whether the connector works with these credentials.

I reviewed this closed issue, which led me here. Using this information, I set up my ~/.aws/config to match and set the environment variable AWS_PROFILE to "web-identity" so that the profile is used by default. This did not work for the connector (v1.2.3), but boto3 had no issues with the same setup. The example code used for testing is listed below.

# In ~/.aws/config

[profile web-identity]
role_arn=arn:aws:iam::123456789012:role/RoleNameToAssume
web_identity_token_file=/path/to/a/token

import s3torchconnector

import os
import boto3
import botocore
import io
import itertools
from PIL import Image
import torch
import torchdata
import torchvision
import webdataset

# Testing in a Jupyter notebook, using notebook magic to set the environment variable
%set_env AWS_PROFILE=web-identity
%env AWS_PROFILE
!aws sts get-caller-identity
!aws s3 ls
# Prints environment variable
# Lists identity
# Lists Buckets

IMAGES_URI = "<DATASET URI>"  # placeholder
REGION = "<REGION>"  # placeholder
dataset = s3torchconnector.S3MapDataset.from_prefix(IMAGES_URI, region=REGION)

obj = dataset[42]  # renamed from `object` to avoid shadowing the built-in
# S3Exception: Client error: No signing credentials found

s3 = boto3.resource('s3')
for bucket in s3.buckets.all():
    print(bucket.name)
# Lists Buckets

IsaevIlya commented 2 weeks ago

Hello @harsh-agarwal-93, thank you for your interest in the Amazon S3 Connector for PyTorch. The issue you're facing appears to be related to the one reported at https://github.com/awslabs/mountpoint-s3/issues/675. In that case, setting the AWS_DEFAULT_REGION environment variable resolved the problem. Could you try the same approach and set AWS_DEFAULT_REGION to the appropriate AWS region for your use case? If this doesn't work for you, please let us know and we'll explore further alternatives.
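
For reference, the suggestion above could be tried along the following lines. This is only a minimal sketch: the helper names `set_default_region` and `build_dataset` and the region value `us-west-2` are hypothetical, and the thread does not confirm that this resolves the credentials error in this setup. The only connector call used is `S3MapDataset.from_prefix`, as in the original report.

```python
import os


def set_default_region(region: str) -> None:
    """Export AWS_DEFAULT_REGION for the current process.

    Hypothetical helper: it only sets the environment variable suggested
    in the comment above, before any S3 client is created.
    """
    os.environ["AWS_DEFAULT_REGION"] = region


def build_dataset(images_uri: str, region: str):
    """Set the region variable, then create the dataset (hypothetical helper)."""
    # Imported lazily so set_default_region() can be used on its own,
    # even where the connector is not installed.
    from s3torchconnector import S3MapDataset

    set_default_region(region)
    return S3MapDataset.from_prefix(images_uri, region=region)


# Example usage (assumed values; replace with your own dataset URI and region):
# dataset = build_dataset("<DATASET URI>", "us-west-2")
# obj = dataset[42]
```

In a Jupyter notebook, the same variable can be set with `%set_env AWS_DEFAULT_REGION=<REGION>`, mirroring how AWS_PROFILE is set in the report. Either way, it should be set before the dataset is first accessed.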