NASA-IMPACT / veda-data

Mirror staged landsat demo collections in prod #138

Closed anayeaye closed 3 weeks ago

anayeaye commented 1 month ago

What

This PR adds the collection metadata for publishing the hand-curated Landsat sample collections, as well as a notebook for selecting, correcting, and publishing both collection and item metadata to the production account.

How tested

I ran the notebook included in this PR to:

  1. Correct and publish the collection metadata to the test catalog.
  2. Iterate over the items in the source catalog, fix metadata where possible, and publish only the valid items to the test catalog.
  3. Repeat both steps for production and verify that the item count in the production catalog is not significantly smaller than in the staging catalog. A lower count is expected because the staging catalog had invalid records with hrefs to files that do not exist in S3.

Caveats

  1. Many of the items in these collections had invalid classification metadata, so I removed classification from the declared stac_extensions.
  2. Many items had hrefs to non-existent files. When I hit this validation error, I decided to stop incrementally fixing the item metadata and publish only the valid STAC records to production. The production counts in the audit are therefore lower than the staging counts, but only because invalid items had been published to the staging catalog. It is unlikely that anyone will miss records that link to TIFs that do not exist.
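The item cleanup described above can be sketched roughly as follows. This is not the notebook's code: the function names, the substring match on "/classification/", and the injected asset_exists callable are all assumptions made for illustration, and items are treated as plain STAC dictionaries.

```python
from typing import Callable, Iterable


def drop_classification_extension(item: dict) -> dict:
    """Return a shallow copy of a STAC item without the classification extension declared.

    Assumption: the extension is identified by "/classification/" in its schema URI.
    """
    cleaned = dict(item)
    cleaned["stac_extensions"] = [
        ext for ext in item.get("stac_extensions", [])
        if "/classification/" not in ext
    ]
    return cleaned


def select_publishable(
    items: Iterable[dict],
    asset_exists: Callable[[str], bool],
) -> tuple[list[dict], list[str]]:
    """Split items into publishable records and the ids of skipped records.

    asset_exists is a hypothetical callable that reports whether an href
    points at a real file; how that check is done is left to the caller.
    """
    valid, skipped = [], []
    for item in items:
        item = drop_classification_extension(item)
        hrefs = [asset["href"] for asset in item.get("assets", {}).values()]
        if all(asset_exists(href) for href in hrefs):
            valid.append(item)
        else:
            # At least one asset is missing, so the record is not published.
            skipped.append(item["id"])
    return valid, skipped
```

Returning the skipped ids alongside the valid items makes it straightforward to explain the difference between staging and production counts in an audit.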

Formerly Blocked

No longer blocked, but I am leaving these notes here in case we see a similar validation error in the future.

At the time, items could not be published because validation failed with an Asset not accessible: Forbidden error on the asset hrefs:

"[{'loc': ('body', 'assets', 'red', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'blue', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'green', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'nir08', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'swir16', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'swir22', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'ANG.txt', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'MTL.txt', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'MTL.xml', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'coastal', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'MTL.json', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'qa_pixel', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'qa_radsat', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'thumbnail', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'qa_aerosol', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, {'loc': ('body', 'assets', 'reduced_resolution_browse', 'href'), 'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}]"
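If a similar error shows up again, a small helper can reduce it to the list of affected asset keys. The sketch below assumes the error arrives as the Python-literal text shown above (a list of dicts with loc, msg, and type), optionally wrapped in double quotes; blocked_assets is a hypothetical name, not part of any VEDA tooling.

```python
import ast


def blocked_assets(error_text: str) -> list[str]:
    """Return the asset keys reported as 'Asset not accessible: Forbidden'."""
    # Strip whitespace and the surrounding double quotes, then parse the
    # Python literal safely (no code is evaluated).
    errors = ast.literal_eval(error_text.strip().strip('"'))
    return [
        err["loc"][2]
        for err in errors
        if err["msg"] == "Asset not accessible: Forbidden"
        and len(err["loc"]) >= 3
        and err["loc"][1] == "assets"
    ]


# Shortened sample built from the first two entries of the error above.
sample = (
    "\"[{'loc': ('body', 'assets', 'red', 'href'), "
    "'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}, "
    "{'loc': ('body', 'assets', 'blue', 'href'), "
    "'msg': 'Asset not accessible: Forbidden', 'type': 'value_error'}]\""
)
print(blocked_assets(sample))  # ['red', 'blue']
```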

However, the role used by titiler, which should also be used by the ingest API, is able to access these assets. For example, for s3://usgs-landsat/collection02/level-2/standard/oli-tirs/2023/001/113/LC08_L2SR_001113_20230125_20230208_02_T2/LC08_L2SR_001113_20230125_20230208_02_T2_SR_B4.TIF:

curl -X 'GET' \
  'https://test.openveda.cloud/api/raster/cog/info?url=s3://usgs-landsat/collection02/level-2/standard/oli-tirs/2023/001/113/LC08_L2SR_001113_20230125_20230208_02_T2/LC08_L2SR_001113_20230125_20230208_02_T2_SR_B4.TIF' \
  -H 'accept: application/json' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   369  100   369    0     0    405      0 --:--:-- --:--:-- --:--:--   405
{
  "bounds": [
    -106.8322594139634,
    -76.02196986271962,
    -96.60693094236582,
    -73.34367090509093
  ],
  "minzoom": 4,
  "maxzoom": 10,
  "band_metadata": [
    [
      "b1",
      {}
    ]
  ],
  "band_descriptions": [
    [
      "b1",
      ""
    ]
  ],
  "dtype": "uint16",
  "nodata_type": "Nodata",
  "colorinterp": [
    "gray"
  ],
  "scales": [
    1.0
  ],
  "offsets": [
    0.0
  ],
  "driver": "GTiff",
  "count": 1,
  "width": 8381,
  "height": 8441,
  "overviews": [
    2,
    4,
    8,
    16,
    32,
    64
  ],
  "nodata_value": 0.0
}

anayeaye commented 3 weeks ago

Unblocked by adding requester pays configuration to the ingest-api: https://github.com/NASA-IMPACT/veda-backend/pull/388
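For context, here is a minimal sketch of what a requester pays accessibility check can look like with boto3. It does not reproduce the change in the linked PR; the helper names are hypothetical, and the S3 client is passed in so the caller controls credentials and role. The only S3-specific detail is the RequestPayer="requester" argument to head_object.

```python
from urllib.parse import urlparse


def requester_pays_head_kwargs(s3_url: str) -> dict:
    """Build head_object keyword arguments for an s3:// URL, opting in to requester pays."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an s3:// URL: {s3_url}")
    return {
        "Bucket": parsed.netloc,
        "Key": parsed.path.lstrip("/"),
        # Without this argument, requests to a requester pays bucket are refused.
        "RequestPayer": "requester",
    }


def asset_accessible(s3_client, s3_url: str) -> bool:
    """Return True if a HEAD request for the object succeeds.

    s3_client is expected to behave like boto3.client("s3"). Any exception
    is treated as "not accessible" to keep the sketch free of extra imports.
    """
    try:
        s3_client.head_object(**requester_pays_head_kwargs(s3_url))
    except Exception:
        return False
    return True


# Example call (requires boto3 and AWS credentials; the requester is billed):
#   import boto3
#   asset_accessible(boto3.client("s3"), "s3://usgs-landsat/collection02/...")
```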