Thomas-Moore-Creative / NCI-ACCESS-S2-ARD

progress towards analysis ready data (ARD) for the ACCESS-S2 collection at NCI
GNU General Public License v3.0

Generate full set of ETBF variables from ACCESS-S2 RA #4

Closed Thomas-Moore-Creative closed 2 years ago

Thomas-Moore-Creative commented 3 years ago
(longitude: 100)
array([130.5, 131.5, 132.5, 133.5, 134.5, 135.5, 136.5, 137.5, 138.5, 139.5,
       140.5, 141.5, 142.5, 143.5, 144.5, 145.5, 146.5, 147.5, 148.5, 149.5,
       150.5, 151.5, 152.5, 153.5, 154.5, 155.5, 156.5, 157.5, 158.5, 159.5,
       160.5, 161.5, 162.5, 163.5, 164.5, 165.5, 166.5, 167.5, 168.5, 169.5,
       170.5, 171.5, 172.5, 173.5, 174.5, 175.5, 176.5, 177.5, 178.5, 179.5,
       180.5, 181.5, 182.5, 183.5, 184.5, 185.5, 186.5, 187.5, 188.5, 189.5,
       190.5, 191.5, 192.5, 193.5, 194.5, 195.5, 196.5, 197.5, 198.5, 199.5,
       200.5, 201.5, 202.5, 203.5, 204.5, 205.5, 206.5, 207.5, 208.5, 209.5,
       210.5, 211.5, 212.5, 213.5, 214.5, 215.5, 216.5, 217.5, 218.5, 219.5,
       220.5, 221.5, 222.5, 223.5, 224.5, 225.5, 226.5, 227.5, 228.5, 229.5])
(latitude: 90)
array([-69.5, -68.5, -67.5, -66.5, -65.5, -64.5, -63.5, -62.5, -61.5, -60.5,
       -59.5, -58.5, -57.5, -56.5, -55.5, -54.5, -53.5, -52.5, -51.5, -50.5,
       -49.5, -48.5, -47.5, -46.5, -45.5, -44.5, -43.5, -42.5, -41.5, -40.5,
       -39.5, -38.5, -37.5, -36.5, -35.5, -34.5, -33.5, -32.5, -31.5, -30.5,
       -29.5, -28.5, -27.5, -26.5, -25.5, -24.5, -23.5, -22.5, -21.5, -20.5,
       -19.5, -18.5, -17.5, -16.5, -15.5, -14.5, -13.5, -12.5, -11.5, -10.5,
        -9.5,  -8.5,  -7.5,  -6.5,  -5.5,  -4.5,  -3.5,  -2.5,  -1.5,  -0.5,
         0.5,   1.5,   2.5,   3.5,   4.5,   5.5,   6.5,   7.5,   8.5,   9.5,
        10.5,  11.5,  12.5,  13.5,  14.5,  15.5,  16.5,  17.5,  18.5,  19.5])
Thomas-Moore-Creative commented 3 years ago

xarray docs on multi-dimensional coordinates > https://xarray.pydata.org/en/v0.19.0/examples/multidimensional-coords.html

Thomas-Moore-Creative commented 2 years ago

Code workflow here is rough and interactive. A more automated workflow could be built from: OOD-S2-regrid-export-workflow.ipynb, OOD-S2-regrid-export-workflow-part2.ipynb, and OOD-S2-RA-export-ETBF-ready-files.ipynb

Thomas-Moore-Creative commented 2 years ago

adding in conversion from the native grid to a "standard, rectilinear 0.25 degree grid" here: OOD-S2-RA-standard025regrid-export

Thomas-Moore-Creative commented 2 years ago

adding SSH to tasks above - reason: SSH is available in Forecasts and thus BoM likely has some confidence in it.

Thomas-Moore-Creative commented 2 years ago

Thoughts on improved workflow for 2022:

  1. S2RA_write_ETBF_zarr : take raw NC files and generate zarr collections for U, V, and T variables on the native grid. (Challenges = inconsistent date stamps & a mix of finish dates across the variables)
  2. S2RA_mask_ETBFcrop_zarr : take each grid domain object (U, V, T) and (1) mask out land in 3D, (2) crop out the desired region, and (3) export to intermediate zarr collections for each grid domain. (Challenges = multi-dimensional coordinates require a where method to slice regionally, which doesn't like the 180 boundary, so coordinates need to be shifted first.)
  3. S2RA_regrid_zarr : tools to regrid to 0.25 and 1.0 rectilinear grids - exporting to one intermediate zarr collection per grid. (Notes on conservative regridding with "coastline masks" > xESMF docs // More Challenges: U & V have depth-dependent, "3D" masks and this likely requires looping over all 75 depths?) (Need cell corners for conservative regridding - will default back to bilinear plus NN extrapolation. This requires a land mask for both input and output, which I'm manually generating)
  4. S2RA_ETBFcalc : calculate EKE and the mean and integrated U & V quantities required for ETBF - exporting to one intermediate zarr collection.
  5. S2RA_ETBF_doc_test_export : Add needed metadata, run some tests, and export to needed netcdf file format
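
The regional crop in step 2 can be sketched as follows. This is a minimal illustration using plain NumPy rather than the actual notebook code: the function name `crop_2d_region` and the synthetic grid are hypothetical, and it assumes the native longitudes are stored in [-180, 180). Shifting them to [0, 360) first makes a box that spans the 180 boundary a single contiguous range, after which a boolean mask on the 2D coordinates does the slicing.

```python
import numpy as np

def crop_2d_region(field, nav_lon, nav_lat, lon_min, lon_max, lat_min, lat_max):
    """Crop a 2D field on a curvilinear grid to a lon/lat box given in 0-360 longitudes."""
    # Shift longitudes to [0, 360) so a box across the 180 boundary is contiguous.
    lon360 = np.mod(nav_lon, 360.0)
    inside = (
        (lon360 >= lon_min) & (lon360 <= lon_max)
        & (nav_lat >= lat_min) & (nav_lat <= lat_max)
    )
    if not inside.any():
        raise ValueError("no grid points fall inside the requested box")
    # Mask points outside the box, then trim rows/columns that are entirely outside.
    masked = np.where(inside, field, np.nan)
    rows = np.where(inside.any(axis=1))[0]
    cols = np.where(inside.any(axis=0))[0]
    return masked[rows.min():rows.max() + 1, cols.min():cols.max() + 1]

# Tiny synthetic grid whose longitudes wrap at the dateline (values are made up).
lon1d = np.array([170.0, 175.0, 180.0, -175.0, -170.0, -165.0])
lat1d = np.array([-10.0, -5.0, 0.0, 5.0])
nav_lon, nav_lat = np.meshgrid(lon1d, lat1d)
field = np.arange(24, dtype=float).reshape(4, 6)
cropped = crop_2d_region(field, nav_lon, nav_lat, 175, 190, -5, 0)
print(cropped.shape)  # (2, 4)
```

In xarray the same idea would be a `where(condition, drop=True)` call on the dataset after shifting the longitude coordinate.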
Thomas-Moore-Creative commented 2 years ago

It seems clear that BoM is only using bilinear for their internal regridding:

cdo -s -L remapbil,r1440x720 -selname,"temp" -setmisstonn tmp_1.nc tmp_2.nc && mv tmp_{2,1}.nc

Conservative regridding, regardless of platform, requires all the cell corners, so we will default to bilinear regridding for now.

Thomas-Moore-Creative commented 2 years ago

The steps BoM uses are the same as my plan here:

  1. crop with padding
  2. bilinear regridding
  3. crop off padding
ncatted -a coordinates,"temp",c,c,"nav_lon nav_lat" tmp_1.nc

cdo -s -L -sellonlatbox,100,200,-50,10 -selname,"temp" tmp_1.nc tmp_2.nc  && mv tmp_{2,1}.nc
cdo -s -L remapbil,r1440x720 -selname,"temp" -setmisstonn tmp_1.nc tmp_2.nc   && mv tmp_{2,1}.nc
cdo -s -L -sellonlatbox,110,190,-45,5 -selname,"temp" tmp_1.nc tmp_2.nc  && mv tmp_{2,1}.nc
cdo -s -f nc4 -z zip copy tmp_1.nc latest_forecast_rg.nc

update: Grant Smith (BoM) has clarified that -setmisstonn fills missing values by nearest neighbour extrapolation. We can follow this approach with xESMF.
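
To make the nearest neighbour fill concrete, here is a small NumPy sketch of the idea. It is not the cdo or xESMF implementation: the function name `fill_missing_nearest` is hypothetical, it measures distance in grid-index space rather than geographic distance (an assumption made to keep the example short), and the brute-force distance matrix only suits small arrays. xESMF's regridder appears to offer nearest-neighbour extrapolation through its `extrap_method` option; check the xESMF docs for the exact usage.

```python
import numpy as np

def fill_missing_nearest(field):
    """Replace NaNs in a 2D array with the value of the nearest valid cell."""
    filled = field.copy()
    valid = ~np.isnan(field)
    if valid.all() or not valid.any():
        return filled  # nothing to fill, or nothing to fill from
    valid_idx = np.argwhere(valid)      # shape (n_valid, 2)
    missing_idx = np.argwhere(~valid)   # shape (n_missing, 2)
    # Squared index-space distance from every missing cell to every valid cell.
    diff = missing_idx[:, None, :] - valid_idx[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    nearest = valid_idx[dist2.argmin(axis=1)]
    filled[tuple(missing_idx.T)] = field[tuple(nearest.T)]
    return filled

# Made-up example: two ocean values surrounded by missing (land) cells.
temp = np.array([[1.0, np.nan, np.nan],
                 [np.nan, np.nan, 5.0]])
print(fill_missing_nearest(temp))
```

Filling before the bilinear step means coastal target cells are interpolated from real values on all sides instead of picking up missing data from land.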

Thomas-Moore-Creative commented 2 years ago

Notes:

  1. Check to be sure SSH is ssh_corrected
  2. EKE sums : summing with skipna = True returns 0 rather than NaN over all-NaN (land) cells, so we need to add the mask back in: eke300 = eke300.where(eke300 != 0)
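
A short NumPy sketch of the masking issue in note 2, under the assumption that land cells are NaN at every depth. `np.nansum` behaves like an xarray sum with skipna = True here: an all-NaN column sums to 0. The hypothetical helper `depth_sum_keep_mask` restores the mask from the input's NaN pattern rather than testing the result against 0, which is a variant of the `where(eke300 != 0)` fix that also preserves any genuine zero sums.

```python
import numpy as np

def depth_sum_keep_mask(eke, depth_axis=0):
    """Sum over depth, returning NaN where every level is missing (land)."""
    total = np.nansum(eke, axis=depth_axis)            # all-NaN columns become 0
    all_missing = np.isnan(eke).all(axis=depth_axis)   # True over land
    return np.where(all_missing, np.nan, total)

# Made-up (depth, y, x) array: the first column is land, the second is ocean.
eke = np.array([[[np.nan, 1.0]],
                [[np.nan, 2.0]]])
print(np.nansum(eke, axis=0))    # land column comes out as 0
print(depth_sum_keep_mask(eke))  # land column is NaN again
```

In xarray, passing `min_count=1` to the sum is another option that should leave all-NaN columns as NaN.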
Thomas-Moore-Creative commented 2 years ago

Improved workflow

  1. S2RA_write_ETBF_zarr : take raw NC files and generate zarr collections for U, V, and T variables on the native grid after some cleaning and fixes of inconsistent nc files. (Challenges = inconsistent date stamps & a mix of finish dates across the variables)
  2. S2RA_mask_ETBFcrop_zarr : take each grid domain object (U, V, T) and (1) mask out land in 3D, (2) crop out the desired region, and (3) export to intermediate zarr collections for each grid domain. (Challenges = multi-dimensional coordinates require a where method to slice regionally, which doesn't like the 180 boundary, so coordinates need to be shifted first.)
  3. S2RA_regrid_zarr : tools to regrid to 0.25 - exporting to one intermediate zarr collection. (More Challenges: U & V have depth-dependent, "3D" masks and this is a complication) (using bilinear plus NN extrapolation. This requires a land mask for both input and output, which I'm manually generating at the output end using https://github.com/toddkarin/global-land-mask , which is based on elevation data here > https://www.ngdc.noaa.gov/mgg/topo/gltiles.html )
  4. S2RA_ETBFcalc : calculate EKE and the mean and integrated U & V quantities required for ETBF - exporting to one intermediate zarr collection.
  5. S2RA_ETBF_doc_test_export : Add needed metadata, run some tests, and export to needed netcdf file format
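
For step 4, the EKE calculation can be sketched with the standard definition EKE = 0.5 * (u'^2 + v'^2). The thread does not say how the velocity anomalies are defined, so this example assumes they are taken relative to the time mean along the first axis; a monthly climatology would be an equally plausible choice. It also assumes U and V have already been regridded to the same 0.25 degree grid (step 3), so they can be combined point by point. The function name is hypothetical.

```python
import numpy as np

def eddy_kinetic_energy(u, v):
    """EKE = 0.5 * (u'^2 + v'^2), anomalies relative to the time mean (axis 0)."""
    # NaN land cells stay NaN because np.mean propagates missing values.
    u_anom = u - np.mean(u, axis=0, keepdims=True)
    v_anom = v - np.mean(v, axis=0, keepdims=True)
    return 0.5 * (u_anom ** 2 + v_anom ** 2)

# Made-up (time, x) velocities: one ocean point and one land point.
u = np.array([[1.0, np.nan],
              [3.0, np.nan]])
v = np.array([[0.0, np.nan],
              [0.0, np.nan]])
eke = eddy_kinetic_energy(u, v)
print(eke)
```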
Thomas-Moore-Creative commented 2 years ago

Progress complete through and including step #4

accessS2.RA.ocean.masked.AUSWCPregion.ETBFvars.zarr written. ~10GB of data compressed to 6GB.

Thomas-Moore-Creative commented 2 years ago

Testing shows an issue / possible mistake in the workflow converting ssh to ssh_corrected. CHECK STEP 1 for error.

Unfortunately ssh_corrected only has 468 and not 492 timesteps and this looks to be causing an issue. Not clear if NCI datasets are meant to have end dates that are mixed up and different?

Further, raw NC files for ssh_corrected are chunked differently than other variables?

UPDATE: the problem seems to be that in the ssh_corrected DIR some NC files use ssh and some ssh_corrected for the variable name.

Thomas-Moore-Creative commented 2 years ago

SSH corrected NC files have inconsistent variable names

ncdump -h mo_ssh_corrected_1981.nc

        ssh:_FillValue = 9.96921e+36f ;
        ssh:units = "m" ;
        ssh:standard_name = "sea_surface_height_above_geoid" ;
        ssh:long_name = "Sea Surface Height" ;
        ssh:online_operation = "ave(X)" ;
        ssh:interval_operation = 1350.f ;
        ssh:interval_write = 86400.f ;
        ssh:coordinates = "nav_lat nav_lon" ;
        ssh:cell_measures = "area: areat" ;
        ssh:cell_methods = "time_counter: mean" ;

:correction = "Correction to the file so that the weighed average SSH is zero - See seasonal prediction little task 827" ;

ncdump -h mo_ssh_corrected_2021.nc

        ssh_corrected:_FillValue = 9.96921e+36f ;
        ssh_corrected:units = "m" ;
        ssh_corrected:standard_name = "sea_surface_height_above_geoid" ;
        ssh_corrected:long_name = "Sea Surface Height" ;
        ssh_corrected:online_operation = "ave(X)" ;
        ssh_corrected:interval_operation = 1350.f ;
        ssh_corrected:interval_write = 86400.f ;
        ssh_corrected:coordinates = "nav_lat nav_lon" ;
        ssh_corrected:cell_measures = "area: areat" ;
        ssh_corrected:cell_methods = "time_counter: mean" ;

FIXED in step 1 with a custom preprocessing step:
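
The preprocessing code itself is not reproduced in this thread. As a minimal sketch of what such a step could look like, the hypothetical function below renames `ssh` to `ssh_corrected` whenever a file uses the older name. It relies only on the dataset exposing `data_vars` and `rename`, as xarray datasets do, and is meant to be passed to the `preprocess` argument of `xr.open_mfdataset`. The glob pattern in the comment is a placeholder.

```python
def unify_ssh_name(ds, old="ssh", new="ssh_corrected"):
    """Rename `old` to `new` if this file still uses the old variable name."""
    if old in ds.data_vars and new not in ds.data_vars:
        return ds.rename({old: new})
    return ds

# Typical use with xarray (not executed here; the path pattern is a placeholder):
#   import xarray as xr
#   ds = xr.open_mfdataset("mo_ssh_corrected_*.nc", preprocess=unify_ssh_name)
```

Because `preprocess` runs on each file before concatenation, all years end up with one consistent variable name and the full set of time steps can be combined.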

Thomas-Moore-Creative commented 2 years ago

POST-SSHname fix progress

Thomas-Moore-Creative commented 2 years ago

started testing in step 5

Thomas-Moore-Creative commented 2 years ago

finished step 5, exporting /g/data/v14/tm4888/data/ACCESS-S2/ETBF_export/AUS_region/accessS2.RA.ETBFvars.AUSregion.grid025deg.nc

Thomas-Moore-Creative commented 2 years ago

This can be closed as data delivered.