Closed: smohiudd closed this issue 3 months ago
Merged a PR to fix the transfer DAG: https://github.com/NASA-IMPACT/veda-data-airflow/pull/121
Tested the following transfer in dev MWAA:
{
"origin_bucket": "veda-data-store-staging",
"origin_prefix": "geoglam/",
"filename_regex": "^(.*).tif$",
"target_bucket": "veda-data-store",
"collection": "geoglam",
"cogify": "false",
"dry_run": "false"
}
I didn't get any errors in Airflow. @anayeaye or @botanical, when you get a chance, can you check if this worked in MCP?
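For reference, a config like the one above can be submitted to the dev MWAA environment using a CLI token. This is only a sketch: the environment name and DAG id below are placeholders, not the actual names used in veda-data-airflow.

import json
import boto3
import requests

conf = {
    "origin_bucket": "veda-data-store-staging",
    "origin_prefix": "geoglam/",
    "filename_regex": "^(.*).tif$",
    "target_bucket": "veda-data-store",
    "collection": "geoglam",
    "cogify": "false",
    "dry_run": "false",
}

# Placeholder names: replace with the real MWAA environment and transfer DAG id.
token = boto3.client("mwaa").create_cli_token(Name="veda-dev-mwaa")
resp = requests.post(
    f"https://{token['WebServerHostname']}/aws_mwaa/cli",
    headers={"Authorization": f"Bearer {token['CliToken']}",
             "Content-Type": "text/plain"},
    data=f"dags trigger veda_transfer --conf '{json.dumps(conf)}'",
)
print(resp.status_code)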
I've been doing some testing, and the Airflow DAG can't work without appropriate PUT permissions on veda-data-store. I know that vedaDataAccessRole has PUT permissions to:
"Resource": [
"arn:aws:s3:::veda-data-store-staging",
"arn:aws:s3:::veda-data-store-staging/*"
]
},
But do we know if there's a similar policy in MCP for veda-data-store?
@smohiudd how would I check the dev MWAA transfer in MCP?
I see a role in MCP called veda-data-store-access that has:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BucketPermissions",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::veda-data-store"
      ]
    },
    {
      "Sid": "ObjectPermissions",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:PutObjectVersionTagging",
        "s3:PutObjectTagging"
      ],
      "Resource": [
        "arn:aws:s3:::veda-data-store/*"
      ]
    }
  ]
}
and another role veda-data-store-access-staging that has:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BucketPermissions",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::veda-data-store-staging"
      ]
    },
    {
      "Sid": "ObjectPermissions",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:PutObjectVersionTagging",
        "s3:PutObjectTagging",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::veda-data-store-staging/*"
      ]
    }
  ]
}
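For what it's worth, one way to sanity-check whether either of these roles can actually PUT into the production bucket is IAM's policy simulator. A minimal sketch, assuming credentials in the MCP account with iam:SimulatePrincipalPolicy; the account id and object key are placeholders.

import boto3

iam = boto3.client("iam")

# Placeholder account id and object key; the role ARN must belong to the
# account where this simulation is run.
resp = iam.simulate_principal_policy(
    PolicySourceArn="arn:aws:iam::<MCP_ACCOUNT_ID>:role/veda-data-store-access",
    ActionNames=["s3:PutObject"],
    ResourceArns=["arn:aws:s3:::veda-data-store/geoglam/example.tif"],
)
for result in resp["EvaluationResults"]:
    print(result["EvalActionName"], result["EvalDecision"])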
@botanical the transfer I ran last night didn't work (you can check by seeing if there are files in the bucket). The DAG failed without an error; the handler needs some rework.
I ran another test today locally using a fixed handler and it did work for s3://veda-data-store/geoglam/
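For the bucket check mentioned above, a quick sketch that counts objects under the target prefix; it assumes credentials with s3:ListBucket on veda-data-store, such as the MCP role described earlier.

import boto3

s3 = boto3.client("s3")

# Count objects under the target prefix to confirm the transfer landed.
count = 0
for page in s3.get_paginator("list_objects_v2").paginate(
    Bucket="veda-data-store", Prefix="geoglam/"
):
    count += len(page.get("Contents", []))
print(f"{count} objects under s3://veda-data-store/geoglam/")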
@anayeaye created a new role for us to use in the Airflow transfer handler that should allow PUT operations to the veda-data-store bucket. The new role is arn:aws:iam::114506680961:role/veda-data-manager
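Roughly, the transfer handler can assume this role and copy objects cross-account as in the sketch below; the session name and object key are illustrative, not the handler's actual values.

import boto3

# Assume the data-manager role, then copy one asset the way the transfer
# handler would. The object key below is only an example.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::114506680961:role/veda-data-manager",
    RoleSessionName="veda-transfer",  # session name is arbitrary
)["Credentials"]

s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

s3.copy_object(
    CopySource={"Bucket": "veda-data-store-staging", "Key": "geoglam/example.tif"},
    Bucket="veda-data-store",
    Key="geoglam/example.tif",
)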
I see 45 objects in veda-data-store/geoglam/ in MCP which were created around March 13, 2024, 10:34:43 (UTC-07:00)
@smohiudd
Another PR to fix the transfer util: https://github.com/NASA-IMPACT/veda-data-airflow/pull/122
The transfer DAG is working in dev Airflow and is ready to start moving assets. To do this programmatically, the next step could be to create a script or notebook that runs the transfer DAG on each collection. The configs would be similar to the discovery items configs with a couple of slight modifications.
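A rough sketch of what that script could look like, building one config per collection and handing it to a trigger helper like the MWAA CLI sketch earlier in the thread; trigger_dag below is a hypothetical stand-in, and the collection list is a placeholder.

import json

COLLECTIONS = ["geoglam", "example-collection"]  # placeholder list

def build_conf(collection):
    # Same shape as the config used for the geoglam test above.
    return {
        "origin_bucket": "veda-data-store-staging",
        "origin_prefix": f"{collection}/",
        "filename_regex": "^(.*).tif$",
        "target_bucket": "veda-data-store",
        "collection": collection,
        "cogify": "false",
        "dry_run": "false",
    }

for collection in COLLECTIONS:
    conf = build_conf(collection)
    print(json.dumps(conf, indent=2))
    # trigger_dag(conf)  # hypothetical helper; see the MWAA CLI sketch above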
I ran a transfer on Friday and it went OK. There are a few collections I need to rerun but I would say we're most of the way there.
These collections failed because of errors or incorrect config files and need to be run again:
Also, below are special case collections which weren't part of the batch and will require manual transfers.
These datasets will be transferred at a later time:
What
To support a production instance, STAC assets that are currently in veda-data-store-staging must be copied to veda-data-store-production. A DAG in Airflow copies the assets; confirm whether it is operational in dev or staging.
PI Objective
Objective 4: Publish production data
Acceptance Criteria
STAC assets in veda-data-store-staging are available in veda-data-store