MAAP-Project / maap-documentation

Clarification on S3 Bucket Access options #383

Open wildintellect opened 4 months ago

wildintellect commented 4 months ago

When accessing a protected bucket that requires authentication (EDL) to get a short-term S3 session, as all NASA DAAC buckets do, make sure not to set the environment option os.environ['AWS_NO_SIGN_REQUEST'] = 'YES'. This option is only for some public buckets that specifically do not accept account information. If you accidentally include it, tools like rasterio will skip your AWS session.
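A minimal sketch of a guard against this, assuming the option is set through the process environment as described above. The helper name `clear_no_sign_request` is hypothetical and not part of MAAP or rasterio. It removes the variable and returns its previous value so a notebook can warn when it had been set.

```python
import os


def clear_no_sign_request(environ=None):
    """Remove AWS_NO_SIGN_REQUEST and return its previous value (or None).

    Hypothetical helper for illustration. Pass a plain dict to try it
    without touching the real process environment.
    """
    if environ is None:
        environ = os.environ
    return environ.pop("AWS_NO_SIGN_REQUEST", None)


# Call this before opening a protected (EDL-authenticated) bucket.
previous = clear_no_sign_request()
if previous is not None:
    print(f"Removed AWS_NO_SIGN_REQUEST={previous!r} before opening protected data")
```

Removing the variable entirely, rather than overwriting it, keeps the environment in the same state as a notebook that never set it.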

We need to add a note to pages that use rio.env https://docs.maap-project.org/en/latest/search.html?q=rio.env&check_keywords=yes&area=default

We should possibly also make a clearer general page. I noticed that https://docs.maap-project.org/en/latest/technical_tutorials/access/lpdaac_gedi_access.html only covers GEDI S3 access, and we don't have a page about Earthdata Cloud data in general.

@pahbs can provide some code examples.

pahbs commented 4 months ago

Here is a reproducible example of how to access an ORNL DAAC dataset:



import os

import boto3
import rasterio as rio
from rasterio.session import AWSSession
from maap.maap import MAAP

maap = MAAP(maap_host='api.maap-project.org')

def get_aws_session_DAAC(creds):
    """Create a Rasterio AWS Session with Credentials"""
    # creds come from a DAAC endpoint, e.g.
    # maap.aws.earthdata_s3_credentials('https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials')
    boto3_session = boto3.Session(
        aws_access_key_id=creds['accessKeyId'],
        aws_secret_access_key=creds['secretAccessKey'],
        aws_session_token=creds['sessionToken'],
        region_name='us-west-2'
    )
    return AWSSession(boto3_session)

# URL of DAAC s3 file
s3_url = 's3://ornl-cumulus-prod-protected/above/DeciduousFractionl_CanopyCover/data/deciduousfraction_2015_prediction.tif'

# Must be 'NO' (or not set at all) for protected buckets
os.environ['AWS_NO_SIGN_REQUEST'] = 'NO'

# Request temporary S3 credentials from the ORNL DAAC endpoint
creds = maap.aws.earthdata_s3_credentials('https://data.ornldaac.earthdata.nasa.gov/s3credentials')
rio_env_session = rio.Env(get_aws_session_DAAC(creds))

with rio_env_session:
    with rio.open(s3_url, mode='r') as dataset:
        print(dataset.profile)

smk0033 commented 3 months ago

@wildintellect In those notebooks, I'm adding the environment variable set to 'NO', as above, plus a note that it must either be set to 'NO' or left out entirely, otherwise the data cannot be accessed. Is that fine for users?

wildintellect commented 3 months ago

We probably need a new page specifically about S3 access, where we discuss when to use this option and when not to. Then any relevant page can reference it?
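One way such a page could illustrate the "when to use it" rule is a small context manager that sets the option only for the duration of a read and then restores the previous state. This is a sketch, assuming the option is controlled through os.environ as in the example earlier in this thread. The name `no_sign_request` is hypothetical, and whether a given tool picks up the change depends on when it reads the variable.

```python
import os
from contextlib import contextmanager


@contextmanager
def no_sign_request(enabled):
    """Temporarily set AWS_NO_SIGN_REQUEST, then restore the old state."""
    key = "AWS_NO_SIGN_REQUEST"
    previous = os.environ.get(key)
    os.environ[key] = "YES" if enabled else "NO"
    try:
        yield
    finally:
        # Put the environment back exactly as it was found.
        if previous is None:
            os.environ.pop(key, None)
        else:
            os.environ[key] = previous


# Public bucket that does not accept account information: unsigned requests.
with no_sign_request(True):
    print(os.environ["AWS_NO_SIGN_REQUEST"])

# Protected bucket (for example a NASA DAAC behind EDL): signed requests.
with no_sign_request(False):
    print(os.environ["AWS_NO_SIGN_REQUEST"])
```

Scoping the setting this way means a notebook that reads one public bucket does not leave 'YES' in place for a later protected read.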