Closed Thomas-Moore-Creative closed 2 years ago
xarray
docs on multi-dimensional coordinates > https://xarray.pydata.org/en/v0.19.0/examples/multidimensional-coords.html
Code workflow here is rough and interactive. A more automated workflow could be built from:
OOD-S2-regrid-export-workflow.ipynb
OOD-S2-regrid-export-workflow-part2.ipynb
OOD-S2-RA-export-ETBF-ready-files.ipynb
adding in conversion from native grid to "standard, rectilinear 0.25 degree grid" here OOD-S2-RA-standard025regrid-export
adding SSH to tasks above - reason: SSH is available in Forecasts and thus BoM likely has some confidence in it.
S2RA_write_ETBF_zarr: take raw NC files and generate zarr collections for U, V, and T variables on the native grid. (Challenges = inconsistent date stamps & a mix of finish dates across the variables)
S2RA_mask_ETBFcrop_zarr: take each grid domain object (U, V, T) and (1) mask out land in 3D, (2) crop out the desired region, and (3) export to intermediate zarr collections for each grid domain. (Challenges = multi-dimensional coordinates require a where method to slice regionally, which doesn't like the 180 boundary, so coordinates need to be shifted first.)
S2RA_regrid_zarr: tools to regrid to 0.25 and 1.0 rectilinear grids, exporting to one intermediate zarr collection per grid. (Notes on conservative regridding with "coastline masks" > xESMF docs // More challenges: U & V have depth-dependent, "3D" masks and this likely requires looping over all 75 depths?) (Need cell corners for conservative regridding - will default back to bilinear plus NN extrapolation. This requires a land mask for both input and output, which I'm manually generating.)
S2RA_ETBFcalc: calculate EKE and the mean and integrated U & V quantities required for ETBF, exporting to one intermediate zarr collection.
S2RA_ETBF_doc_test_export: add needed metadata, run some tests, and export to the needed netcdf file format.
It seems clear that BoM is only using bilinear for their internal regridding:
cdo -s -L remapbil,r1440x720 -selname,"temp" -setmisstonn tmp_1.nc tmp_2.nc && mv tmp_{2,1}.nc
And that conservative regridding, regardless of platform, requires all the cell corners.
So we will default to bilinear regridding for now.
The steps BoM uses are the same as my plan here:
ncatted -a coordinates,"temp",c,c,"nav_lon nav_lat" tmp_1.nc
cdo -s -L -sellonlatbox,100,200,-50,10 -selname,"temp" tmp_1.nc tmp_2.nc && mv tmp_{2,1}.nc
cdo -s -L remapbil,r1440x720 -selname,"temp" -setmisstonn tmp_1.nc tmp_2.nc && mv tmp_{2,1}.nc
cdo -s -L -sellonlatbox,110,190,-45,5 -selname,"temp" tmp_1.nc tmp_2.nc && mv tmp_{2,1}.nc
cdo -s -f nc4 -z zip copy tmp_1.nc latest_forecast_rg.nc
Update: Grant Smith (BoM) has clarified that -setmisstonn means set missing values by nearest-neighbour extrapolation. We can follow this approach with xESMF.
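To make the idea behind -setmisstonn concrete, here is a minimal numpy sketch of nearest-neighbour filling on a small 2D field. It is not the cdo or xESMF implementation: the function name fill_missing_nearest is hypothetical, distance is measured in index space rather than on the sphere, and the brute-force search is only sensible for small arrays. In practice xESMF's regridder has an extrap_method option that would be the tool to use here.

```python
import numpy as np


def fill_missing_nearest(field):
    """Fill NaNs in a 2D array with the value of the nearest valid cell.

    Assumption: "nearest" means nearest in index space (rows/columns),
    not great-circle distance. Ties go to the first valid cell in
    row-major order.
    """
    field = np.asarray(field, dtype=float)
    missing = np.isnan(field)
    if not missing.any() or missing.all():
        # Nothing to fill, or nothing to fill from.
        return field.copy()

    valid_idx = np.argwhere(~missing)    # shape (n_valid, 2)
    missing_idx = np.argwhere(missing)   # shape (n_missing, 2)

    # Squared distance from every missing cell to every valid cell.
    diff = missing_idx[:, None, :] - valid_idx[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    nearest = valid_idx[dist2.argmin(axis=1)]

    filled = field.copy()
    filled[missing] = field[nearest[:, 0], nearest[:, 1]]
    return filled


# Toy field: NaN stands in for land / missing ocean cells.
demo = np.array([[1.0, np.nan, np.nan, 9.0, np.nan]])
filled_demo = fill_missing_nearest(demo)
```

Filling before the bilinear step, as the BoM command does, means coastal target cells are interpolated from filled values rather than picking up missing values; the land mask is then reapplied on the output grid.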
Notes:
ssh_corrected
eke300 = eke300.where(eke300 != 0)
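For reference, a minimal numpy sketch of an EKE calculation followed by the zero-masking step noted above. The issue does not state how EKE is defined in this workflow, so the usual form 0.5 * (u'^2 + v'^2) with anomalies about the time mean is an assumption, as is the function name eddy_kinetic_energy. The final np.where is the numpy analogue of the eke300.where(eke300 != 0) line.

```python
import numpy as np


def eddy_kinetic_energy(u, v):
    """EKE = 0.5 * (u'^2 + v'^2), with time on axis 0.

    Assumption: anomalies are taken about the full time mean; a
    monthly climatology would be another reasonable choice.
    """
    u_anom = u - np.nanmean(u, axis=0, keepdims=True)
    v_anom = v - np.nanmean(v, axis=0, keepdims=True)
    eke = 0.5 * (u_anom ** 2 + v_anom ** 2)
    # Exact zeros become NaN, mirroring eke300.where(eke300 != 0).
    return np.where(eke != 0, eke, np.nan)


# Toy data, shape (time, y, x): one ocean cell and one cell stored as 0.
u_demo = np.array([[[1.0, 0.0]], [[3.0, 0.0]]])
v_demo = np.zeros_like(u_demo)
eke_demo = eddy_kinetic_energy(u_demo, v_demo)
```

Cells whose velocities are stored as constant zeros come out as exactly zero EKE, which is presumably why the zero-to-NaN step is needed before export.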
S2RA_write_ETBF_zarr: take raw NC files and generate zarr collections for U, V, and T variables on the native grid after some cleaning and fixes of inconsistent nc files. (Challenges = inconsistent date stamps & a mix of finish dates across the variables)
S2RA_mask_ETBFcrop_zarr: take each grid domain object (U, V, T) and (1) mask out land in 3D, (2) crop out the desired region, and (3) export to intermediate zarr collections for each grid domain. (Challenges = multi-dimensional coordinates require a where method to slice regionally, which doesn't like the 180 boundary, so coordinates need to be shifted first.)
S2RA_regrid_zarr: tools to regrid to 0.25, exporting to one intermediate zarr collection. (More challenges: U & V have depth-dependent, "3D" masks and this is a complication.) (Using bilinear plus NN extrapolation. This requires a land mask for both input and output, which I'm manually generating at the output end using https://github.com/toddkarin/global-land-mask, which is based on elevation data from https://www.ngdc.noaa.gov/mgg/topo/gltiles.html )
S2RA_ETBFcalc: calculate EKE and the mean and integrated U & V quantities required for ETBF, exporting to one intermediate zarr collection.
S2RA_ETBF_doc_test_export: add needed metadata, run some tests, and export to the needed netcdf file format.
accessS2.RA.ocean.masked.AUSWCPregion.ETBFvars.zarr written. ~10GB of data compressed to 6GB.
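The crop step in the list above (shift longitudes, then select with a mask on 2D coordinates) can be sketched in plain numpy as follows. This is an illustration under assumptions, not the notebook code: crop_curvilinear is a hypothetical name, the toy grid is invented, and the box limits are borrowed from the first -sellonlatbox call in the BoM commands. With xarray the same mask would be passed to ds.where(mask, drop=True).

```python
import numpy as np


def crop_curvilinear(field, nav_lon, nav_lat, lon_min, lon_max, lat_min, lat_max):
    """Crop a 2D field on a curvilinear grid to a lon/lat box.

    Longitudes are shifted to 0..360 first, so a box that spans the
    180 meridian (e.g. 100..200) becomes one contiguous interval.
    """
    lon360 = np.mod(nav_lon, 360.0)
    inside = (
        (lon360 >= lon_min) & (lon360 <= lon_max)
        & (nav_lat >= lat_min) & (nav_lat <= lat_max)
    )
    if not inside.any():
        raise ValueError("no grid cells fall inside the requested box")

    # Smallest index window containing every selected cell.
    rows = np.flatnonzero(inside.any(axis=1))
    cols = np.flatnonzero(inside.any(axis=0))
    window = (slice(rows[0], rows[-1] + 1), slice(cols[0], cols[-1] + 1))

    # Cells inside the window but outside the box are set to NaN.
    cropped = np.where(inside[window], field[window], np.nan)
    return cropped, lon360[window], nav_lat[window]


# Toy grid with longitudes stored as -180..180 (values are invented).
lons = np.array([80.0, 140.0, -160.0, -100.0])
lats = np.array([-60.0, -20.0, 20.0])
nav_lon, nav_lat = np.meshgrid(lons, lats)
field = np.arange(12, dtype=float).reshape(3, 4)

cropped, lon_c, lat_c = crop_curvilinear(field, nav_lon, nav_lat, 100, 200, -50, 10)
```

Without the shift, the cell at -160 would fail a 100..200 test even though it lies inside the region, which is the 180-boundary problem described in the list.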
Testing shows an issue / possible mistake in the workflow converting ssh to ssh_corrected.
CHECK STEP 1 for error
Unfortunately ssh_corrected only has 468 and not 492 timesteps, and this looks to be causing an issue.
Not clear if NCI datasets are meant to have end dates that are mixed up and different?
Further, the raw NC files for ssh_corrected are chunked differently than other variables?
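A quick way to see which timesteps are absent is to compare a variable's time axis against the expected monthly calendar. The sketch below assumes the 492 steps are the months January 1981 to December 2021 (41 x 12 = 492, consistent with the mo_..._1981 and mo_..._2021 file names but not confirmed here). The helper name missing_months and the truncated record are hypothetical.

```python
import numpy as np


def missing_months(times, start, stop):
    """Return the months in [start, stop) that are absent from `times`."""
    expected = np.arange(start, stop, dtype="datetime64[M]")
    present = np.asarray(times).astype("datetime64[M]")
    return np.setdiff1d(expected, present)


# Assumed full record: monthly, 1981-01 through 2021-12.
expected = np.arange("1981-01", "2022-01", dtype="datetime64[M]")

# Hypothetical short record with the final 24 months dropped.
short_record = expected[:-24]
gaps = missing_months(short_record, "1981-01", "2022-01")
```

Running the same check on each variable's real time coordinate would show whether the 24 missing steps are a contiguous block or scattered through the record.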
UPDATE: the problem seems to be that in the ssh_corrected DIR some NC files use ssh and some ssh_corrected for the variable name.
ncdump -h mo_ssh_corrected_1981.nc
ssh:_FillValue = 9.96921e+36f ;
ssh:units = "m" ;
ssh:standard_name = "sea_surface_height_above_geoid" ;
ssh:long_name = "Sea Surface Height" ;
ssh:online_operation = "ave(X)" ;
ssh:interval_operation = 1350.f ;
ssh:interval_write = 86400.f ;
ssh:coordinates = "nav_lat nav_lon" ;
ssh:cell_measures = "area: areat" ;
ssh:cell_methods = "time_counter: mean" ;
:correction = "Correction to the file so that the weighed average SSH is zero - See seasonal prediction little task 827" ;
ncdump -h mo_ssh_corrected_2021.nc
ssh_corrected:_FillValue = 9.96921e+36f ;
ssh_corrected:units = "m" ;
ssh_corrected:standard_name = "sea_surface_height_above_geoid" ;
ssh_corrected:long_name = "Sea Surface Height" ;
ssh_corrected:online_operation = "ave(X)" ;
ssh_corrected:interval_operation = 1350.f ;
ssh_corrected:interval_write = 86400.f ;
ssh_corrected:coordinates = "nav_lat nav_lon" ;
ssh_corrected:cell_measures = "area: areat" ;
ssh_corrected:cell_methods = "time_counter: mean" ;
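One way to handle the mixed variable names shown in the two headers above is to normalise the name as each file is opened. The sketch below is an assumption about how that could be done, not the fix actually applied in the notebooks; harmonise_ssh_name is a hypothetical helper. It only relies on `in` and `.rename`, both of which xarray.Dataset provides, so it could be passed as the preprocess argument of xarray.open_mfdataset.

```python
def harmonise_ssh_name(ds, old="ssh", new="ssh_corrected"):
    """Rename `old` to `new` when a file only carries the old name.

    Intended for an xarray.Dataset, but it only needs the object to
    support `in` and `.rename(mapping)`.
    """
    if old in ds and new not in ds:
        return ds.rename({old: new})
    return ds


# Hypothetical usage with xarray (the path pattern is a placeholder):
#   import xarray as xr
#   ds = xr.open_mfdataset("mo_ssh_corrected_*.nc",
#                          preprocess=harmonise_ssh_name)
```

Renaming per file before concatenation avoids ending up with two partially filled variables, which would be one explanation for a shorter ssh_corrected time axis.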
NaNs
finished step 5, exporting /g/data/v14/tm4888/data/ACCESS-S2/ETBF_export/AUS_region/accessS2.RA.ETBFvars.AUSregion.grid025deg.nc
This can be closed as data delivered.
[x] SST
[x] SSS
[x] mld1
[x] mld2
[x] td
[x] temp50
[x] temp100
[x] temp200
[x] temp500
[x] u100
[x] v100
[x] u100_300
[x] v100_300
[x] EKE_300
[x] EKE_2000
[x] D20
[x] OHC_300
[x] SSH
Individual netcdf files per variable
Two grids: 0.25 & 1deg as per past CAFE data, i.e.: