NASA-IMPACT / veda-jupyterhub

VEDA JupyterHub technical planning and documentation

Allow hub access to production veda data store #53

Closed: anayeaye closed this issue 4 days ago

anayeaye commented 1 month ago

What

Allow hubs to read production objects in veda-data-store. We now have a stable production catalog and S3 data store, and we need to update our notebook examples to refer to the same data that users see in the dashboard.

Notes

In MCP I have updated the veda-data-store bucket policy to allow GetObject and ListBucket for these roles: "arn:aws:iam::444055461661:role/nasa-veda-prod", "arn:aws:iam::444055461661:role/nasa-veda-staging".
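
For reference, a minimal sketch of the kind of bucket-policy statement that grant implies (the Sid and exact layout here are my own illustration, not the deployed policy):

{
    "Sid": "AllowVedaHubRead",
    "Effect": "Allow",
    "Principal": {
        "AWS": [
            "arn:aws:iam::444055461661:role/nasa-veda-prod",
            "arn:aws:iam::444055461661:role/nasa-veda-staging"
        ]
    },
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": [
        "arn:aws:s3:::veda-data-store",
        "arn:aws:s3:::veda-data-store/*"
    ]
}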

I think the hub already has full Get, List, and Put set up for staging, so the update might belong here, even though we do not want hub users to be able to Put in production (the bucket policy will not allow that operation anyway): https://github.com/2i2c-org/infrastructure/blob/main/terraform/aws/projects/nasa-veda.tfvars#L47
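
Both sides have to allow the access: the bucket policy in MCP and the IAM policy attached to the hub role in the 2i2c config. As a sketch only (deliberately omitting PutObject; this is an illustration, not the actual tfvars content), the production grant on the hub role could look like:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VedaDataStoreReadOnly",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::veda-data-store",
                "arn:aws:s3:::veda-data-store/*"
            ]
        }
    ]
}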

AC

Testable with

The Download STAC Assets notebook should work using production STAC_API_URL = "https://openveda.cloud/api/stac" when run in the hub.
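
A minimal sketch of that check, assuming pystac-client and rasterio are available in the hub image (the collection name is just one that appears later in this thread):

from pystac_client import Client
import rasterio

STAC_API_URL = "https://openveda.cloud/api/stac"

# Grab one item from a production collection and try to open an asset
client = Client.open(STAC_API_URL)
search = client.search(collections=["caldor-fire-burn-severity"], max_items=1)
item = next(search.items())
href = next(iter(item.assets.values())).href

# This open should succeed once the hub role can read veda-data-store
with rasterio.open(href) as src:
    print(src.profile)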

wildintellect commented 1 month ago

Minor: I don't think that notebook is the best test.

An easier test on the hub:

$ rio cogeo info s3://veda-data-store/barc-thomasfire/thomas_fire_barc_201712.cog.tiff
WARNING:rasterio._env:CPLE_AppDefined in HTTP response code on https://veda-data-store.s3.amazonaws.com/barc-thomasfire/thomas_fire_barc_201712.cog.tiff: 403
s2n_init() failed: 402653198 (error opening urandom)
Fatal error condition occurred in /home/conda/feedstock_root/build_artifacts/tiledb_1708024446644/work/build/externals/src/ep_awssdk/crt/aws-crt-cpp/crt/aws-c-io/source/s2n/s2n_tls_channel_handler.c:203: 0 && "s2n_init() failed"
Exiting Application
################################################################################
Stack trace:
################################################################################

Another random item from a different collection

$ rio cogeo info s3://veda-data-store/bangladesh-landcover-2001-2020/MODIS_LC_2020_BD.cog.tif
WARNING:rasterio._env:CPLE_AppDefined in HTTP response code on https://veda-data-store.s3.amazonaws.com/bangladesh-landcover-2001-2020/MODIS_LC_2020_BD.cog.tif: 403
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 310, in rasterio._base.DatasetBase.__init__
  File "rasterio/_base.pyx", line 221, in rasterio._base.open_dataset
  File "rasterio/_err.pyx", line 221, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_AWSAccessDeniedError: Access Denied

@anayeaye it appears the bucket policy is not correct. Can you please share the policy internally for review (not on this ticket)?

wildintellect commented 1 month ago

I've started a branch to add this to the 2i2c config. Question: should the ESDIS and GHG instances get the same bucket access, @anayeaye? They currently only have staging access.

wildintellect commented 1 month ago

Correction: the branch is https://github.com/NASA-IMPACT/veda-hub-infrastructure/tree/veda-data-store. https://github.com/2i2c-org/infrastructure/pull/4533 just adds the bucket to "staging". I think it's worth verifying that all the permissions blocking writes/deletes are correct on the VEDA side before deploying more widely.

anayeaye commented 1 month ago

Question: should the ESDIS and GHG instances get the same bucket access? They currently only have staging access.

All hubs in the VEDA universe should have GetObject and ListBucket perms for veda-data-store. It is slowish, but we are still trying to encourage sharing rather than duplicating data to every environment. EDIT: we also need to add/confirm that those instances are covered by the bucket policy.
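
A quick way to confirm coverage from each instance would be to check which role the hub actually assumes and whether listing works, e.g.:

# Confirm which role this hub is running as
aws sts get-caller-identity
# ListBucket check; expect a few keys back rather than AccessDenied
aws s3api list-objects-v2 --bucket veda-data-store --max-keys 3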

it appears the bucket policy is not correct. Can you please share the policy internally (not on this ticket for review)

I will share it with you internally. I would be surprised if it is not correct, because I have granted the same permissions as the hubs currently have for the staging bucket, which can be accessed via the hub.

The rio cogeo info routine is easier to run in the hub than the notebook example; thanks for the snippet!

## veda-data-store-staging accessible
(notebook) jovyan@jupyter-anayeaye:~$ rio cogeo info s3://veda-data-store-staging/EIS/COG/Fire-Hydro/bs_to_save.tif
Driver: GTiff
File: s3://veda-data-store-staging/EIS/COG/Fire-Hydro/bs_to_save.tif
COG: True
Compression: DEFLATE

## veda-data-store equivalent object is not accessible
(notebook) jovyan@jupyter-anayeaye:~$ rio cogeo info s3://veda-data-store/caldor-fire-burn-severity/bs_to_save.tif
WARNING:rasterio._env:CPLE_AppDefined in HTTP response code on https://veda-data-store.s3.amazonaws.com/caldor-fire-burn-severity/bs_to_save.tif: 403
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 310, in rasterio._base.DatasetBase.__init__
  File "rasterio/_base.pyx", line 221, in rasterio._base.open_dataset
  File "rasterio/_err.pyx", line 221, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_AWSAccessDeniedError: Access Denied

wildintellect commented 1 month ago

@anayeaye yes, I spoke too soon; the blocker is actually on the hub side right now. Once you approve, 2i2c will deploy to the staging hub, we can test, and then do a second PR pushing that bucket to all the VEDA-related hubs.

wildintellect commented 1 month ago

@anayeaye I've tested on staging that read access works. How would you like to test that other actions are blocked? Do you want to try writing a file to the bucket? Is there a safe object to test removing? (A sketch of one approach is below.)

Then, when you're happy, I can open another PR to apply the fix to all the hubs/production.

(notebook) jovyan@jupyter-wildintellect:~$ rio cogeo info s3://veda-data-store/barc-thomasfire/thomas_fire_barc_201712.cog.tiff
Driver: GTiff
File: s3://veda-data-store/barc-thomasfire/thomas_fire_barc_201712.cog.tiff
COG: True
Compression: DEFLATE
ColorSpace: None
...
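
For the write-blocking question above, one safe approach is to attempt a put and a delete against a throwaway key and confirm both come back AccessDenied (the key below is illustrative, not a real object):

# Expect AccessDenied if writes are blocked
echo "test" > /tmp/allow-test.txt
aws s3api put-object --bucket veda-data-store --key _hub-access-test/allow-test.txt --body /tmp/allow-test.txt
# Expect AccessDenied here too; with permission this would succeed even for a nonexistent key
aws s3api delete-object --bucket veda-data-store --key _hub-access-test/allow-test.txt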
anayeaye commented 1 month ago

@wildintellect I'm comfortable with the MCP bucket policy doing the blocking. It would be nice for the hub role to be more specific, but it doesn't need to be. So I say we are ready for the PR to apply the fix to production. Thanks!

wildintellect commented 3 weeks ago

PR completed: https://github.com/2i2c-org/infrastructure/pull/4609#issuecomment-2286582446. TODO: verify with a quick test.

smohiudd commented 3 weeks ago

I ran a few of the veda-docs quickstart notebooks and am no longer getting access denied errors.

wildintellect commented 3 weeks ago

If it all looks good please comment on https://github.com/2i2c-org/infrastructure/issues/4535#issuecomment-2286971462 and then we can close this.

anayeaye commented 3 weeks ago

I'm currently running into a pydantic v2 version conflict in the hub, so I used a different test :(

BUT I can read prod from hub.openveda.cloud ✅

aws s3api head-object --bucket veda-data-store --key caldor-fire-burn-severity/bs_to_save.tif
{
    "AcceptRanges": "bytes",
    "LastModified": "2024-03-15T21:13:17+00:00",
    "ContentLength": 324771,
    "ETag": "\"e3a43004c765f8e69794228258c0c579\"",
    "ContentType": "image/tiff",
    "ServerSideEncryption": "AES256",
wildintellect commented 4 days ago

@anayeaye can we close this ticket?