NASA-IMPACT / veda-data

Transform annual tmaxXF to COG and publish STAC metadata for a single NEX-GDDP model #90

Closed anayeaye closed 6 months ago

anayeaye commented 8 months ago

What

The annual number of days with a maximum temperature greater than 90°F has been selected as the pilot Climdex NEX-GDDP dataset for VEDA. This metric is one of the 5 thresholds included in the tmaxXF netCDFs. We have a version of this index for each of the 35 NEX-GDDP CMIP6 models, with multiple SSPs each. This pilot transforms and ingests tmaxXF for a single model only, not yet for all 35.

Details

Transformation notes

STAC notes

AC

anayeaye commented 8 months ago

UPDATE: Streamlined Plan

4 collections

One collection for each SSP

Items within these collections:

Ingest plan

  1. Publish transformed COGs to `veda-data-store-staging//
  2. Publish the 4 collections
  3. If we set things up this way, we should be able to use Airflow pipelines to generate the items and insert metadata. We still need to confirm that start/end datetime, and any other common properties we need, work as expected.
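
One way to check the start/end datetime handling from step 3 is to build the temporal properties for an annual item by hand. This is a minimal sketch, assuming each item covers a full calendar year in UTC. The function name `annual_datetime_properties` is hypothetical; `datetime`, `start_datetime`, and `end_datetime` are the STAC common metadata fields.

```python
from datetime import datetime, timezone


def annual_datetime_properties(year: int) -> dict:
    """Return STAC temporal properties spanning one calendar year (UTC)."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        # STAC allows a null "datetime" when both range fields are provided
        "datetime": None,
        "start_datetime": start.strftime(fmt),
        "end_datetime": end.strftime(fmt),
    }


props = annual_datetime_properties(2015)
print(props["start_datetime"], props["end_datetime"])
```

Whether the pipeline should use a null `datetime` with a range, or a single nominal `datetime`, is exactly the kind of thing to confirm against the staging catalog.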

anayeaye commented 7 months ago

@SwordSaintLancelot I had a look at the first outputs in s3://climatedashboard-data/climdex/tmaxXF/ACCESS-CM2/ and they look good. I have a couple requests for the files before we publish the objects in veda-data-store-staging

Suggested changes

  1. Use DEFLATE instead of LZW compression, as in: da.rio.to_raster("<outname>.tif", driver="COG", compress="DEFLATE")
  2. Filename adjustment: instead of tmaxXF-ACCESS-CM2-ssp126_tmax_above_90_2015.tif, use tmaxXF-ACCESS-CM2-ssp126_2015_tmax_above_90.tif. That is, put the year before the netCDF variable name: <netcdf-basename>_<YYYY>_<VARIABLE_NAME>.tif. I think this will make it easier to generate multi-asset STAC items: for the 86 years in the source file tmaxXF-ACCESS-CM2-ssp126_compressed.nc we will want to generate STAC items with ids 'tmaxXF-ACCESS-CM2-ssp126_<YYYY>_
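
The requested rename can be sketched with a regular expression. This is only an illustration: the helper name `rename_to_new_pattern` is hypothetical, and the `tmax_above_<N>` variable naming is assumed from the single example in this thread.

```python
import re

# Old pattern: <netcdf-basename>_<VARIABLE_NAME>_<YYYY>.tif
# Assumption: variable names look like "tmax_above_<N>", as in "tmax_above_90".
OLD_PATTERN = re.compile(
    r"^(?P<basename>.+?)_(?P<variable>tmax_above_\d+)_(?P<year>\d{4})\.tif$"
)


def rename_to_new_pattern(old_name: str) -> str:
    """Move the year in front of the variable name:
    <netcdf-basename>_<YYYY>_<VARIABLE_NAME>.tif
    """
    match = OLD_PATTERN.match(old_name)
    if match is None:
        raise ValueError(f"Unexpected filename: {old_name}")
    return f"{match['basename']}_{match['year']}_{match['variable']}.tif"


new_name = rename_to_new_pattern("tmaxXF-ACCESS-CM2-ssp126_tmax_above_90_2015.tif")
print(new_name)
```

With the year first, all files for one year share the prefix `<netcdf-basename>_<YYYY>`, which is what makes grouping them into one multi-asset item straightforward.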

Object publication

After those adjustments, I think we are good to publish the objects for the 4 collections to veda-data-store-staging as s3://veda-data-store-staging/<collection-id>/<filename.tif>. For this pilot work I think we should use a simple collection-id/files path instead of copying the complex storage structure from the original request, to keep Airflow ingests easy (does that sound right @ividito?). As in:

s3://veda-data-store-staging/climdex-tmaxxf-access-cm2-ssp126/
     tmaxXF-ACCESS-CM2-ssp126_2015_tmax_above_90.tif
     tmaxXF-ACCESS-CM2-ssp126_2016_tmax_above_90.tif
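
The flat collection-id/files layout can be expressed as a small path builder. This is a sketch under stated assumptions: the helper names are hypothetical, and the rule "collection id = `climdex-` plus the lowercased netCDF basename" is inferred from the single example above rather than confirmed for the other collections.

```python
BUCKET = "veda-data-store-staging"


def collection_id_for(basename: str) -> str:
    # Assumption: inferred from "climdex-tmaxxf-access-cm2-ssp126" in the example above
    return f"climdex-{basename.lower()}"


def destination_uri(filename: str) -> str:
    """Build s3://<bucket>/<collection-id>/<filename> for a transformed COG."""
    # The netCDF basename is everything before the first underscore
    basename = filename.split("_", 1)[0]
    return f"s3://{BUCKET}/{collection_id_for(basename)}/{filename}"


uri = destination_uri("tmaxXF-ACCESS-CM2-ssp126_2015_tmax_above_90.tif")
print(uri)
```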

Sample nc2cog transformation code

import rioxarray  # noqa: F401 (registers the .rio accessor on xarray objects)
import s3fs
import xarray as xr

# Open NetCDF with s3fs and read to xarray using h5netcdf engine
fs = s3fs.S3FileSystem()

VARIABLE_NAME = "tmax_above_90"
aws_url = "s3://cmip6-staging/climdex/tmaxXF/ACCESS-CM2/tmaxXF-ACCESS-CM2-ssp126_compressed.nc"

file_obj = fs.open(aws_url)
ds = xr.open_dataset(file_obj, engine="h5netcdf")
# Select the first time step (one year) of the chosen variable
da = ds[VARIABLE_NAME].isel(time=0)

# Add crs if needed
if not da.rio.crs:
    da.rio.write_crs("epsg:4326", inplace=True)

# Flip latitude so rows run north to south, then set spatial dimensions
da = da.reindex(lat=da.lat.values[::-1])
da.rio.set_spatial_dims(x_dim="lon", y_dim="lat", inplace=True)

# Cloud optimize and generate raster
driver = "COG"
compress = "DEFLATE"
da.rio.to_raster("test_compressed.tif", driver=driver, compress=compress)

slesaad commented 7 months ago

The four collections have been published to staging stac catalog.

Each item has 5 assets for above 86, above 90, above 100, above 110, and above 150.
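
To illustrate the one-item-per-year, five-assets-per-item layout, here is a minimal sketch that groups COG filenames by year. It is not the code used to publish the items: the function name `group_assets_by_year` is hypothetical, the `tmax_above_<N>` asset keys are assumed from the `tmax_above_90` variable name, and filenames are assumed to follow `<netcdf-basename>_<YYYY>_<VARIABLE_NAME>.tif`.

```python
from collections import defaultdict

# Thresholds as listed in the comment above
THRESHOLDS = [86, 90, 100, 110, 150]


def group_assets_by_year(filenames):
    """Map year -> {variable name -> filename}, one entry per prospective item."""
    items = defaultdict(dict)
    for name in filenames:
        stem = name.rsplit(".", 1)[0]
        # Split into basename, year, and the remaining variable name
        _basename, year, variable = stem.split("_", 2)
        items[int(year)][variable] = name
    return dict(items)


filenames = [
    f"tmaxXF-ACCESS-CM2-ssp126_{year}_tmax_above_{threshold}.tif"
    for year in (2015, 2016)
    for threshold in THRESHOLDS
]
grouped = group_assets_by_year(filenames)
print(sorted(grouped), len(grouped[2015]))
```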

anayeaye commented 7 months ago

Config notes (wip)

anayeaye commented 7 months ago

~https://github.com/NASA-IMPACT/veda-config-eic/pull/21~ https://github.com/NASA-IMPACT/veda-config-eic/pull/32

j08lue commented 7 months ago

This is complete, right? 🎉

slesaad commented 6 months ago

PR for the collection configs - https://github.com/NASA-IMPACT/veda-data/pull/97 Should now be complete!