Closed by bsatoriu 3 months ago
I investigated using environment variables for adding assumed role support to the ADE. Eclipse Che's recommended approach for adding cluster-wide env vars to all workspaces is to mount configmaps (see: https://eclipse.dev/che/docs/stable/end-user-guide/mounting-configmaps/).
However, AWS does not allow assumed role configuration to be set purely from env vars, so I created a separate ticket to address the ADE scenario: https://github.com/MAAP-Project/Community/issues/1001
In the meantime, the assumed role can be added manually to all of a user's workspaces by appending the necessary AWS config entry:
mkdir -p ~/.aws && cat >> ~/.aws/config <<CONFIG
[profile maap-data-reader]
region = us-west-2
role_arn = {arn}
credential_source = Ec2InstanceMetadata
CONFIG
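One caveat with the heredoc above: because it appends with `>>`, running it more than once leaves duplicate profile sections in `~/.aws/config`. A minimal Python sketch of an idempotent alternative follows. The function name `ensure_profile` is hypothetical, and the role ARN has to be supplied by the caller because it is not given in this thread.

```python
import configparser
from pathlib import Path

PROFILE_SECTION = "profile maap-data-reader"


def ensure_profile(config_path: Path, role_arn: str, region: str = "us-west-2") -> bool:
    """Append the maap-data-reader profile if it is missing.

    Returns True if the file was changed, False if the profile already existed.
    """
    # strict=False tolerates files that already contain duplicate sections,
    # e.g. from running the heredoc twice. A missing file is read as empty.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(config_path)
    if parser.has_section(PROFILE_SECTION):
        return False

    config_path.parent.mkdir(parents=True, exist_ok=True)
    # Append rather than rewrite, so existing entries and comments are preserved.
    with config_path.open("a") as f:
        f.write(
            f"\n[{PROFILE_SECTION}]\n"
            f"region = {region}\n"
            f"role_arn = {role_arn}\n"
            "credential_source = Ec2InstanceMetadata\n"
        )
    return True
```

Calling `ensure_profile(Path.home() / ".aws" / "config", role_arn)` a second time should leave the file untouched and return `False`.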
Actually, it's unnecessary. The info is automatically detected from the EC2 metadata, so it's not even necessary on the DPS images either.
I was able to generate creds automatically in the ADE without configuring that profile (boto3 auto-detects the EC2 role metadata).
@chuckwondo can you provide sample code using boto3? We should add that to the docs if it's not already there.
@bsatoriu, sorry, I was tricked by the GEDI data. It appears that it's unnecessary for the GEDI collections because I think some special permissions were put in place specifically for us to access them directly. I see this is not the case otherwise.
Here's an example for other things though, assuming the maap-data-reader profile is put in place as described above:
import rioxarray
import s3fs
fs = s3fs.S3FileSystem(profile="maap-data-reader")
url = "s3://gesdisc-cumulus-prod-protected/Landslide/Global_Landslide_Nowcast.1.1/2020/Global_Landslide_Nowcast_v1.1_20201231.tif"
with (fs.open(url) as tif, rioxarray.open_rasterio(tif) as da):
    print(da)
# <xarray.DataArray (band: 1, y: 14400, x: 43200)> Size: 622MB
# [622080000 values with dtype=uint8]
# Coordinates:
# * band (band) int64 8B 1
# * x (x) float64 346kB -180.0 -180.0 -180.0 ... 180.0 180.0 180.0
# * y (y) float64 115kB 60.0 59.99 59.98 ... -59.98 -59.99 -60.0
# spatial_ref int64 8B 0
# Attributes:
# AREA_OR_POINT: Area
# STATISTICS_MAXIMUM: 2
# STATISTICS_MEAN: nan
# STATISTICS_MINIMUM: 0
# STATISTICS_STDDEV: nan
# _FillValue: 255
# scale_factor: 1.0
# add_offset: 0.0
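Since boto3 sample code was requested, here is a minimal sketch for checking which identity a profile resolves to before opening any data. It assumes boto3 is installed and that the maap-data-reader profile described above exists. The helper names `caller_arn` and `assumed_role_name` are hypothetical, not part of any MAAP library.

```python
def caller_arn(profile_name=None):
    """Return the ARN that STS reports for a profile (or the default credential chain)."""
    # Imported lazily so the pure helper below can be used without boto3 installed.
    import boto3

    session = boto3.Session(profile_name=profile_name)
    return session.client("sts").get_caller_identity()["Arn"]


def assumed_role_name(arn):
    """Return the role name from an assumed-role ARN, or None for any other ARN."""
    # Assumed-role ARNs look like:
    #   arn:aws:sts::<account-id>:assumed-role/<role-name>/<session-name>
    resource = arn.split(":", 5)[-1]
    parts = resource.split("/")
    if len(parts) >= 3 and parts[0] == "assumed-role":
        return parts[1]
    return None
```

In a workspace, `assumed_role_name(caller_arn("maap-data-reader"))` should return the name of the role in `role_arn` if the profile is picked up. Calling `caller_arn()` with no profile would instead report the instance role.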
FYI, when logging is configured like so, preceding the code to open the file, we can also see some interesting log messages:
import logging
import rioxarray
import s3fs
logging.basicConfig(level=logging.INFO)
fs = s3fs.S3FileSystem(profile="maap-data-reader")
url = "s3://gesdisc-cumulus-prod-protected/Landslide/Global_Landslide_Nowcast.1.1/2020/Global_Landslide_Nowcast_v1.1_20201231.tif"
with (fs.open(url) as tif, rioxarray.open_rasterio(tif) as da):
    print(da)
This will print the following log messages:
INFO:aiobotocore.credentials:Found credentials from IAM Role: MAAP-ADE-K8S
INFO:rasterio._filepath:Object not found in virtual filesystem: filename=b'e0270ba2-e13b-4812-ae91-39d990dbe3e1/e0270ba2-e13b-4812-ae91-39d990dbe3e1.aux'
INFO:rasterio._filepath:Object not found in virtual filesystem: filename=b'e0270ba2-e13b-4812-ae91-39d990dbe3e1/e0270ba2-e13b-4812-ae91-39d990dbe3e1.AUX'
INFO:rasterio._filepath:Object not found in virtual filesystem: filename=b'e0270ba2-e13b-4812-ae91-39d990dbe3e1/e0270ba2-e13b-4812-ae91-39d990dbe3e1.aux'
INFO:rasterio._filepath:Object not found in virtual filesystem: filename=b'e0270ba2-e13b-4812-ae91-39d990dbe3e1/e0270ba2-e13b-4812-ae91-39d990dbe3e1.AUX'
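The `rasterio._filepath` lines appear to be lookups for `.aux` sidecar files that do not exist for this object, so they are probably noise here. A minimal sketch, using only the logger names visible in the output above, that keeps the credential message and hides the sidecar lookups:

```python
import logging

logging.basicConfig(level=logging.INFO)

# Keep the credential-resolution message ("Found credentials from IAM Role: ...").
logging.getLogger("aiobotocore.credentials").setLevel(logging.INFO)

# Hide the INFO-level "Object not found in virtual filesystem" messages.
logging.getLogger("rasterio._filepath").setLevel(logging.WARNING)
```

This needs to run before the code that opens the file, the same as the `basicConfig` call in the example above.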
Update the AWS config files in MAAP to support the "maap-data-reader" assumed role. This will allow DPS jobs and workspaces to access certain DAAC buckets without requiring credentials or manual token refreshing.
This document describes the current method in use: https://docs.maap-project.org/en/latest/technical_tutorials/access/direct_access.html. With the new assumed role configuration in place, we will be able to do the following:
Tasks
Add assumed role support to ADE (new ticket: https://github.com/MAAP-Project/Community/issues/1001)