MAAP-Project / Community

Issue for MAAP (Zenhub)

Add Assumed Role to DPS and ADE for s3 DAAC access #996

Status: Closed (bsatoriu closed 3 months ago)

bsatoriu commented 4 months ago

Update the AWS config files in MAAP to support the "maap-data-reader" assumed role. This will allow DPS jobs and workspaces to access certain DAAC buckets without requiring credentials or manual token refreshing.

This document describes the current method in use: https://docs.maap-project.org/en/latest/technical_tutorials/access/direct_access.html. With the new assumed role configuration in place, we will be able to do the following:

    import fsspec
    import rasterio
    import rioxarray
    from rasterio.session import AWSSession

    s3_fsspec = fsspec.filesystem("s3", profile="maap-data-reader")
    s3_rasterio = rasterio.Env(AWSSession(profile_name="maap-data-reader"))
    ges_disc_object = "s3://gesdisc-cumulus-prod-protected/Landslide/Global_Landslide_Nowcast.1.1/2020/Global_Landslide_Nowcast_v1.1_20201231.tif"

    with s3_fsspec.open(ges_disc_object) as obj:
        data_array = rioxarray.open_rasterio(obj)


bsatoriu commented 4 months ago

I investigated using environment variables for adding assumed role support to the ADE. Eclipse Che's recommended approach for adding cluster-wide env vars to all workspaces is to mount configmaps (see: https://eclipse.dev/che/docs/stable/end-user-guide/mounting-configmaps/).

However, AWS does not allow assumed role configuration to be set purely from env vars, so I created a separate ticket to address the ADE scenario: https://github.com/MAAP-Project/Community/issues/1001

In the meantime, the assumed role can be added manually to each of a user's workspaces by appending the necessary AWS config entry:

mkdir -p ~/.aws && cat >> ~/.aws/config <<CONFIG
[profile maap-data-reader]
region = us-west-2
role_arn = {arn}
credential_source = Ec2InstanceMetadata
CONFIG
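Note that the heredoc above appends another copy of the entry each time it is run. As a rough sketch of an idempotent alternative in Python, the following checks for the profile before appending it. The names `ensure_profile` and `PROFILE_SECTION` are hypothetical (they are not part of any MAAP tooling), and the role ARN is left as a parameter because it is not given in this thread.

```python
import configparser
from pathlib import Path

# Section name as it appears in the AWS shared config file.
PROFILE_SECTION = "profile maap-data-reader"


def ensure_profile(config_path: Path, role_arn: str, region: str = "us-west-2") -> bool:
    """Append the maap-data-reader profile unless it is already present.

    Returns True if the file was modified, False if the profile already existed.
    """
    parser = configparser.ConfigParser()
    parser.read(config_path)  # a missing file is treated as empty

    if parser.has_section(PROFILE_SECTION):
        return False

    # Append rather than rewrite, so existing content is left untouched.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    entry = (
        f"\n[{PROFILE_SECTION}]\n"
        f"region = {region}\n"
        f"role_arn = {role_arn}\n"
        "credential_source = Ec2InstanceMetadata\n"
    )
    with config_path.open("a") as f:
        f.write(entry)
    return True
```

It could be called as `ensure_profile(Path.home() / ".aws" / "config", role_arn)`, with `role_arn` set to the real ARN of the role.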

chuckwondo commented 4 months ago

Actually, the profile is unnecessary. The role is detected automatically from the EC2 instance metadata, so the profile isn't needed on the DPS images either.

I was able to generate credentials automatically in the ADE (boto3 auto-detects the EC2 role metadata) without configuring that profile.

bsatoriu commented 4 months ago

@chuckwondo can you provide sample code using boto3? We should add that to the docs if it's not already there.

chuckwondo commented 4 months ago

@bsatoriu, sorry, I was misled by the GEDI data. The profile appears to be unnecessary for the GEDI collections, I think because special permissions were put in place specifically to let us access them directly. I see this is not the case for other collections.

Here's an example for other collections, assuming the maap-data-reader profile is put in place as described above:

import rioxarray
import s3fs

fs = s3fs.S3FileSystem(profile="maap-data-reader")
url = "s3://gesdisc-cumulus-prod-protected/Landslide/Global_Landslide_Nowcast.1.1/2020/Global_Landslide_Nowcast_v1.1_20201231.tif"

with (fs.open(url) as tif, rioxarray.open_rasterio(tif) as da):
    print(da)

# <xarray.DataArray (band: 1, y: 14400, x: 43200)> Size: 622MB
# [622080000 values with dtype=uint8]
# Coordinates:
#   * band         (band) int64 8B 1
#   * x            (x) float64 346kB -180.0 -180.0 -180.0 ... 180.0 180.0 180.0
#   * y            (y) float64 115kB 60.0 59.99 59.98 ... -59.98 -59.99 -60.0
#     spatial_ref  int64 8B 0
# Attributes:
#     AREA_OR_POINT:       Area
#     STATISTICS_MAXIMUM:  2
#     STATISTICS_MEAN:     nan
#     STATISTICS_MINIMUM:  0
#     STATISTICS_STDDEV:   nan
#     _FillValue:          255
#     scale_factor:        1.0
#     add_offset:          0.0
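The sizes reported in the output above can be checked by hand from the array shape and dtypes. This is only a back-of-the-envelope sketch; the variable names are illustrative, and the numbers are copied from the printed repr.

```python
import math

# (band, y, x) shape taken from the printed DataArray repr.
shape = (1, 14400, 43200)
UINT8_BYTES = 1    # dtype of the data values
FLOAT64_BYTES = 8  # dtype of the x and y coordinates

data_bytes = math.prod(shape) * UINT8_BYTES  # 622_080_000 bytes, shown as 622MB
x_bytes = shape[2] * FLOAT64_BYTES           # 345_600 bytes, shown as 346kB
y_bytes = shape[1] * FLOAT64_BYTES           # 115_200 bytes, shown as 115kB

print(data_bytes, x_bytes, y_bytes)
```

The bracketed line `[622080000 values with dtype=uint8]` in the repr typically indicates that the values have not been loaded into memory yet, so printing the array should not by itself read the full 622MB from S3.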

chuckwondo commented 4 months ago

FYI, if logging is configured before the code that opens the file, as shown here, we also see some interesting log messages:

import logging

import rioxarray
import s3fs

logging.basicConfig(level=logging.INFO)

fs = s3fs.S3FileSystem(profile="maap-data-reader")
url = "s3://gesdisc-cumulus-prod-protected/Landslide/Global_Landslide_Nowcast.1.1/2020/Global_Landslide_Nowcast_v1.1_20201231.tif"

with (fs.open(url) as tif, rioxarray.open_rasterio(tif) as da):
    print(da)

This will print the following log messages:

INFO:aiobotocore.credentials:Found credentials from IAM Role: MAAP-ADE-K8S
INFO:rasterio._filepath:Object not found in virtual filesystem: filename=b'e0270ba2-e13b-4812-ae91-39d990dbe3e1/e0270ba2-e13b-4812-ae91-39d990dbe3e1.aux'
INFO:rasterio._filepath:Object not found in virtual filesystem: filename=b'e0270ba2-e13b-4812-ae91-39d990dbe3e1/e0270ba2-e13b-4812-ae91-39d990dbe3e1.AUX'
INFO:rasterio._filepath:Object not found in virtual filesystem: filename=b'e0270ba2-e13b-4812-ae91-39d990dbe3e1/e0270ba2-e13b-4812-ae91-39d990dbe3e1.aux'
INFO:rasterio._filepath:Object not found in virtual filesystem: filename=b'e0270ba2-e13b-4812-ae91-39d990dbe3e1/e0270ba2-e13b-4812-ae91-39d990dbe3e1.AUX'
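If the repeated `rasterio._filepath` lines are noise, one option is to raise the level of just that logger while leaving the credentials message visible. This is a minimal sketch using only the standard `logging` module; the logger names are taken from the output above, and `configure_logging` is a hypothetical helper name.

```python
import logging


def configure_logging() -> None:
    """Log at INFO overall, but silence rasterio's virtual-filesystem chatter."""
    # force=True replaces any handlers already attached to the root logger.
    logging.basicConfig(level=logging.INFO, force=True)
    # Only WARNING and above from this logger will be emitted.
    logging.getLogger("rasterio._filepath").setLevel(logging.WARNING)
```

Calling `configure_logging()` in place of the `logging.basicConfig(level=logging.INFO)` line should keep the `aiobotocore.credentials` message, which shows the credential source, and drop the "Object not found in virtual filesystem" lines.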