earth-mover / icechunk

Open-source, cloud-native transactional tensor storage engine
https://icechunk.io
Apache License 2.0

Create a store using current AWS role attached to process #292

Open abarciauskas-bgse opened 2 hours ago

abarciauskas-bgse commented 2 hours ago

👋🏽 I am trying to create an Icechunk store using the current role of the EC2 instance that runs my notebook (a 2i2c JupyterHub for VEDA). It looks like s3_from_env may not work with IAM-role-based access: I get an expired token error, even though I am not passing a token, so the role itself should be used for access rather than temporary tokens. But I could be doing something wrong.

See https://gist.github.com/abarciauskas-bgse/e4d27a8d41dd887657d04a544122c64a for an MRP. Note: this assumes you are using a role that has access to nasa-veda-scratch. Also note that it does work when you use S3 credentials generated with boto3 from the current session.
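For readers without the gist open, the boto3 workaround mentioned above can be sketched roughly as follows. This is a minimal sketch, not the gist's exact code: the helper name `snapshot_session_credentials` and the dictionary keys are hypothetical, and how the resulting values are handed to Icechunk is left out because it depends on the Icechunk version in use. Only `boto3.Session`, `get_credentials()` and `get_frozen_credentials()` are real boto3 calls.

```python
def snapshot_session_credentials(session=None):
    """Return the credentials boto3 resolves for a session as a plain dict.

    `session` is expected to behave like a boto3.Session. When omitted, a
    default boto3.Session() is created, which on EC2 typically resolves the
    instance role through the usual boto3 credential chain.
    """
    if session is None:
        import boto3  # imported lazily so the helper can be exercised without boto3

        session = boto3.Session()

    credentials = session.get_credentials()
    if credentials is None:
        raise RuntimeError("boto3 could not resolve any credentials")

    # get_frozen_credentials() returns an immutable snapshot with the
    # attributes access_key, secret_key and token.
    frozen = credentials.get_frozen_credentials()

    # The key names below are hypothetical, chosen only for this sketch.
    return {
        "access_key_id": frozen.access_key,
        "secret_access_key": frozen.secret_key,
        "session_token": frozen.token,
    }
```

Note that a snapshot like this is static: if the underlying role credentials rotate, the values have to be fetched again.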

Any ideas @paraseba @sharkinsspatial @maxrjones

rabernat commented 2 hours ago

Thanks for sharing, Aimee! I don't have a solution for you right away, but I can say that THIS:

using the current role of the ec2 instance I am running the following notebook in (which is a 2i2c JupyterHub for VEDA) ... using s3_from_env

is exactly how we use Icechunk internally at Earthmover (except it's our own JupyterHub rather than a 2i2c one). It just works and I don't have to use the .get_frozen_credentials() workaround from your gist.

@mpiannucci may have ideas...

maxrjones commented 1 hour ago

@abarciauskas-bgse the following works for me on the VEDA Hub. I wonder if this is actually an issue with configuring the bucket through both the storage and config parameters rather than directly related to auth.

import icechunk
import zarr

storage_config = icechunk.StorageConfig.s3_from_env(
    bucket="nasa-veda-scratch",
    prefix="icechunk/test-mursst",
    region="us-west-2"
)
store = icechunk.IcechunkStore.create(storage_config)
group = zarr.group(store)
array = group.create("my_array", shape=10, dtype=int)
array[:] = 1
store.commit("first commit")

paraseba commented 1 hour ago

I wonder if this instance has the AWS_SESSION_TOKEN environment variable set, or some other similar one, and somehow the aws cli manages to ignore it.
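One quick way to check this from inside the notebook is to list which credential-related environment variables are set, without printing their values. The sketch below is an illustration only: the helper name `credential_env_vars_set` is hypothetical, and the list covers common AWS SDK variables rather than everything a given client might read.

```python
import os

# Environment variables commonly consulted by AWS SDKs and the AWS CLI when
# resolving credentials. This list is illustrative, not exhaustive.
CREDENTIAL_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_PROFILE",
    "AWS_SHARED_CREDENTIALS_FILE",
    "AWS_CONFIG_FILE",
    "AWS_WEB_IDENTITY_TOKEN_FILE",
    "AWS_ROLE_ARN",
    "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI",
    "AWS_CONTAINER_CREDENTIALS_FULL_URI",
)


def credential_env_vars_set(environ=None):
    """Return the names (never the values) of credential variables that are set."""
    environ = os.environ if environ is None else environ
    return [name for name in CREDENTIAL_ENV_VARS if environ.get(name)]


if __name__ == "__main__":
    # Print only the names so that no secret ends up in notebook output.
    for name in credential_env_vars_set():
        print(f"{name} is set")
```

If AWS_SESSION_TOKEN shows up here alongside static keys, a stale token in the environment would be one plausible explanation for the expired token error.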