[Closed] anayeaye closed this issue 6 months ago
One collection for each SSP:

- climdex-tmaxxf-access-cm2-ssp126
- climdex-tmaxxf-access-cm2-ssp245
- climdex-tmaxxf-access-cm2-ssp370
- climdex-tmaxxf-access-cm2-ssp585

One asset per item for now: `tmax_above_90` (later we may add other thresholds as assets, but not for Nov 17).

Expected object keys: `s3://veda-data-store-staging/climdex-tmaxxf-access-cm2-ssp126/tmaxXF-ACCESS-CM2-ssp126_tmax_above_90_<year>.tif` (`_compressed.nc` is replaced with `_tmax_above_90_<year>.tif`).

@SwordSaintLancelot I had a look at the first outputs in `s3://climatedashboard-data/climdex/tmaxXF/ACCESS-CM2/` and they look good. I have a couple of requests for the files before we publish the objects in `veda-data-store-staging`:
1. Use `DEFLATE` instead of `LZW` compression (as in: `da.rio.to_raster("<outname>.tif", driver="COG", compress="DEFLATE")`).
2. Instead of `tmaxXF-ACCESS-CM2-ssp126_tmax_above_90_2015.tif`, use `tmaxXF-ACCESS-CM2-ssp126_2015_tmax_above_90.tif`. That is, put the year before the netCDF variable name: `<netcdf-basename>_<YYYY>_<VARIABLE_NAME>.tif`. I think this will make it easier to generate multi-asset STAC items: for the 86 years in the source file with basename `tmaxXF-ACCESS-CM2-ssp126_compressed.nc`, we will want to generate STAC items with ids `tmaxXF-ACCESS-CM2-ssp126_<YYYY>`.
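A minimal sketch of that naming convention, assuming the `<netcdf-basename>_<YYYY>_<VARIABLE_NAME>.tif` pattern above; the helper names `cog_filename` and `item_id` are hypothetical, not pipeline code:

```python
# Hypothetical helpers deriving per-year COG filenames and STAC item ids
# from the source netCDF key, per <netcdf-basename>_<YYYY>_<VARIABLE_NAME>.tif.
from pathlib import Path

VARIABLE_NAME = "tmax_above_90"


def cog_filename(nc_key: str, year: int, variable: str = VARIABLE_NAME) -> str:
    """Output COG filename for one year of one threshold variable."""
    basename = Path(nc_key).name.replace("_compressed.nc", "")
    return f"{basename}_{year}_{variable}.tif"


def item_id(nc_key: str, year: int) -> str:
    """One STAC item per year, shared by all variable assets."""
    basename = Path(nc_key).name.replace("_compressed.nc", "")
    return f"{basename}_{year}"


nc = "s3://cmip6-staging/climdex/tmaxXF/ACCESS-CM2/tmaxXF-ACCESS-CM2-ssp126_compressed.nc"
print(cog_filename(nc, 2015))  # tmaxXF-ACCESS-CM2-ssp126_2015_tmax_above_90.tif
print(item_id(nc, 2015))       # tmaxXF-ACCESS-CM2-ssp126_2015
```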
After those adjustments I think we are good to publish the objects for the 4 collections to `veda-data-store-staging` as `s3://veda-data-store-staging/<collection-id>/<filename.tif>`. For this pilot work I think we should just use a simple `collection-id/files` path instead of copying the complex storage structure that was in the original request (for the sake of making Airflow ingests easy--does that sound right @ividito?). As in:

```
s3://veda-data-store-staging/climdex-tmaxxf-access-cm2-ssp126/
    tmaxXF-ACCESS-CM2-ssp126_2015_tmax_above_90.tif
    tmaxXF-ACCESS-CM2-ssp126_2016_tmax_above_90.tif
```
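The collection id and destination key could be composed with a couple of helpers; `collection_id` and `staging_key` are hypothetical names, with the bucket and id pattern taken from the examples above:

```python
# Hypothetical helpers composing the climdex-tmaxxf-<model>-<ssp> collection id
# and the s3://veda-data-store-staging/<collection-id>/<filename.tif> key.
def collection_id(model: str, ssp: str) -> str:
    return f"climdex-tmaxxf-{model.lower()}-{ssp}"


def staging_key(model: str, ssp: str, filename: str,
                bucket: str = "veda-data-store-staging") -> str:
    return f"s3://{bucket}/{collection_id(model, ssp)}/{filename}"


print(staging_key("ACCESS-CM2", "ssp126",
                  "tmaxXF-ACCESS-CM2-ssp126_2015_tmax_above_90.tif"))
# s3://veda-data-store-staging/climdex-tmaxxf-access-cm2-ssp126/tmaxXF-ACCESS-CM2-ssp126_2015_tmax_above_90.tif
```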
```python
import rioxarray  # noqa: F401 -- registers the .rio accessor on xarray objects
import s3fs
import xarray as xr

# Open NetCDF with s3fs and read to xarray using the h5netcdf engine
fs = s3fs.S3FileSystem()
VARIABLE_NAME = "tmax_above_90"
aws_url = "s3://cmip6-staging/climdex/tmaxXF/ACCESS-CM2/tmaxXF-ACCESS-CM2-ssp126_compressed.nc"
fileObj = fs.open(aws_url)
ds = xr.open_dataset(fileObj, engine="h5netcdf")
da = ds[VARIABLE_NAME].isel(time=0)

# Add crs if needed
if not da.rio.crs:
    da.rio.write_crs("epsg:4326", inplace=True)

# Flip latitude to north-up and set spatial dimensions
# (set_spatial_dims returns a new object, so reassign)
da = da.reindex(lat=list(reversed(da.lat)))
da = da.rio.set_spatial_dims(x_dim="lon", y_dim="lat")

# Cloud optimize and generate raster
driver = "COG"
compress = "DEFLATE"
da.rio.to_raster("test_compressed.tif", driver=driver, compress=compress)
```
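The single-timestep example above could be extended to every year and every threshold variable roughly as follows; `write_cogs` and `source_basename` are hypothetical helpers, the variable list is from later in this thread, and `rioxarray` must be installed for the `.rio` accessor:

```python
# Sketch, not pipeline code: write one DEFLATE-compressed COG per year per
# threshold variable, named <netcdf-basename>_<YYYY>_<VARIABLE_NAME>.tif.
VARIABLES = ["tmax_above_86", "tmax_above_90", "tmax_above_100",
             "tmax_above_110", "tmax_above_115"]


def source_basename(aws_url: str) -> str:
    """'.../tmaxXF-ACCESS-CM2-ssp126_compressed.nc' -> 'tmaxXF-ACCESS-CM2-ssp126'."""
    return aws_url.rsplit("/", 1)[-1].replace("_compressed.nc", "")


def write_cogs(aws_url: str, variables=VARIABLES) -> None:
    # Imports kept local so the pure helpers above work without the GDAL stack.
    import s3fs
    import xarray as xr
    import rioxarray  # noqa: F401 -- registers the .rio accessor

    fs = s3fs.S3FileSystem()
    basename = source_basename(aws_url)
    with fs.open(aws_url) as f:
        ds = xr.open_dataset(f, engine="h5netcdf")
        for t in range(ds.sizes["time"]):
            year = int(ds["time"].isel(time=t).dt.year)
            for var in variables:
                da = ds[var].isel(time=t)
                if not da.rio.crs:
                    da = da.rio.write_crs("epsg:4326")
                da = da.reindex(lat=list(reversed(da.lat)))  # north-up
                da = da.rio.set_spatial_dims(x_dim="lon", y_dim="lat")
                da.rio.to_raster(f"{basename}_{year}_{var}.tif",
                                 driver="COG", compress="DEFLATE")
```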
The four collections have been published to the staging STAC catalog. Each item has 5 assets, one for each threshold:

- `tmax_above_86`
- `tmax_above_90`
- `tmax_above_100`
- `tmax_above_110`
- `tmax_above_115`
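A minimal sketch of one such multi-asset item as a plain STAC JSON dict (no pystac dependency); the global bbox/geometry, the datetime range, and the `build_item_dict` helper are illustrative assumptions, not the published items:

```python
# Sketch of a per-year STAC item with one COG asset per threshold variable.
VARIABLES = ["tmax_above_86", "tmax_above_90", "tmax_above_100",
             "tmax_above_110", "tmax_above_115"]
COG_TYPE = "image/tiff; application=geotiff; profile=cloud-optimized"


def build_item_dict(basename: str, year: int, collection_id: str) -> dict:
    assets = {
        var: {
            "href": (f"s3://veda-data-store-staging/{collection_id}/"
                     f"{basename}_{year}_{var}.tif"),
            "type": COG_TYPE,
            "roles": ["data"],
        }
        for var in VARIABLES
    }
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": f"{basename}_{year}",
        "collection": collection_id,
        "properties": {
            "start_datetime": f"{year}-01-01T00:00:00Z",
            "end_datetime": f"{year}-12-31T23:59:59Z",
            "datetime": None,  # range-valued item, per the STAC spec
        },
        "geometry": {"type": "Polygon", "coordinates": [[
            [-180, -90], [180, -90], [180, 90], [-180, 90], [-180, -90]]]},
        "bbox": [-180, -90, 180, 90],
        "assets": assets,
        "links": [],
    }
```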
This is complete, right? 🎉
PR for the collection configs - https://github.com/NASA-IMPACT/veda-data/pull/97 Should now be complete!
What
The annual number of days with a maximum temperature greater than 90F has been selected as the pilot Climdex Nex-GDDP dataset for VEDA. This metric is one of the 5 thresholds included in tmaxXF netCDFs. We have a version of this index for each of the 35 NEX-GDDP CMIP6 models with multiple SSPs each. This pilot is to transform and ingest tmaxXF for a single model, not all 35 yet.
Details
- Source: `s3://cmip6-staging/climdex/tmaxXF/ACCESS-CM2/*.nc`
- ~~Destination: `s3://veda-data-store-staging/climdex/tmaxXF/ACCESS-CM2/*.tif` (if we do ingest all 35 models we will want this key structure to compare model usage and for browsability)~~ EDIT see update
- Model: ACCESS-CM2
Transformation notes
STAC notes
AC