asascience-open / xarray-subset-grid

Subset Xarray datasets in space
BSD 3-Clause "New" or "Revised" License
5 stars 2 forks

Add SCHISM compatibility #8

Open mpiannucci opened 1 month ago

mpiannucci commented 1 month ago

SCHISM datasets should be compatible with the UGRID accessor, but to confirm this we need to test them and fix any issues that arise. This issue also covers adding example notebooks and test cases for SCHISM datasets.

AtiehAlipour-NOAA commented 1 month ago

Description:

This issue serves as a general ticket for the development of a subsetting tool for STOFS 3D model outputs. The aim is to add functionality to xarray-subset-grid code for efficiently subsetting data from STOFS 3D outputs while preserving key features of the model.

This issue serves as a starting point for the development of the subsetting tool. Subsequent issues may be created for smaller tasks related to its development.

For STOFS-3D, we use SCHISM as the base model.

Preservation of Mesh Structure: SCHISM uses both triangular and quadrilateral mesh elements. It's essential that xarray-subset-grid maintains the structure of the mesh after subsetting.

Subset Data for Different Variables at Different Vertical Layers: STOFS 3D outputs contain data across multiple vertical layers for different variables. The xarray-subset-grid should incorporate functionalities to subset data based on different layers and variable names.

mpiannucci commented 1 week ago

Here is the target file: s3://noaa-nos-stofs3d-pds/STOFS-3D-Atl/stofs_3d_atl.20240619/stofs_3d_atl.t12z.fields.out2d_f037_048.nc

omkar-334 commented 4 days ago

From Discussions

'stofs_3d_atl.t12z.fields.horizontalVelX_f001_012.nc'
'stofs_3d_atl.t12z.fields.horizontalVelY_f001_012.nc'
'stofs_3d_atl.t12z.fields.salinity_f001_012.nc'
'stofs_3d_atl.t12z.fields.temperature_f001_012.nc'
'stofs_3d_atl.t12z.fields.zCoordinates_f001_012.nc'
'stofs_3d_atl.t12z.fields.out2d_f001_012.nc'
'schout_adcirc_20240619.nc'
As expected, only the last two files, out2d and schout, are detected as SCHISM and ADCIRC respectively.

Following the solution that Atieh sent, I tried it on out2d and schout.
The problem is that when we rename the variables, the CF metadata remains the same.

In our code, ugrid.py, we obtain the x and y variable names using ds.cf['mesh_topology']['node_coordinates']. This is where the code breaks, since it looks up the original variable names after we've renamed them.
One solution for this is to rewrite the CF metadata. I've tried this using schoutds.cf['mesh_topology'].assign_attrs(node_coordinates='x y'), but this seems to return a new object, and the CF accessor doesn't support item assignment.
out.cf["mesh_topology"]['node_coordinates'] = 'x y' doesn't resolve it either.

Another solution would be not to rename the variables.
After dropping and adding the nele dimension,

ds2 = ds2.drop_dims('nele')  
ds2['nele'] = ds['nele']  

After this, the variable SCHISM_hgrid_face_nodes, with dimensions (nSCHISM_hgrid_face, nMaxSCHISM_hgrid_face_nodes), disappears. Here nSCHISM_hgrid_face has been renamed to nele, so it could be that the new nele does not retain the connectivity.
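A possible explanation for the missing variable, sketched under assumptions: xarray's Dataset.drop_dims removes every variable that uses the dropped dimension, so the face-node connectivity (defined on nele after renaming) goes with it, and re-adding nele afterwards only restores the dimension, not the variables that were dropped. The tiny dataset below is made up for illustration. Only SCHISM_hgrid_face_nodes, nele and nMaxSCHISM_hgrid_face_nodes come from this thread; depth and node are placeholder names.

```python
import numpy as np
import xarray as xr

# Made-up stand-in for a renamed SCHISM dataset: two triangles sharing an edge.
ds2 = xr.Dataset(
    data_vars={
        # Face-node connectivity is defined on the face dimension ("nele" after renaming).
        "SCHISM_hgrid_face_nodes": (
            ("nele", "nMaxSCHISM_hgrid_face_nodes"),
            np.array([[1, 2, 3], [2, 4, 3]]),
        ),
        # Placeholder node-based variable that does not use "nele".
        "depth": (("node",), np.array([1.0, 2.0, 3.0, 4.0])),
    }
)

# drop_dims removes the dimension AND every variable defined on it.
dropped = ds2.drop_dims("nele")

print("SCHISM_hgrid_face_nodes" in dropped.data_vars)  # False
print("depth" in dropped.data_vars)  # True
```

If that is what happens here, the connectivity variable would have to be kept (or re-added explicitly) rather than expected to come back with the dimension.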

omkar-334 commented 4 days ago

From Discussion

One solution is to first read files with complete information, and then read files with missing information like temperature for subsetting. We can write a wrapper for this.

'stofs_3d_atl.t12z.fields.horizontalVelX_f001_012.nc'
'stofs_3d_atl.t12z.fields.horizontalVelY_f001_012.nc'
'stofs_3d_atl.t12z.fields.salinity_f001_012.nc'
'stofs_3d_atl.t12z.fields.temperature_f001_012.nc'
'stofs_3d_atl.t12z.fields.zCoordinates_f001_012.nc'

If one of the files above is chosen for subsetting, we can grab the corresponding schout and out2d files, provided they exist for every file. However, files like temperature and salinity are about 7 GB. Should we add this info only if it is needed, or do we chunk it and add it to all subsetting jobs?
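As a rough sketch of the lookup step such a wrapper would need, the helper below derives the out2d file name that matches a single-variable file. The naming pattern is an assumption inferred only from the file names listed above, and companion_out2d is a hypothetical helper, not something that exists in xarray-subset-grid.

```python
import re

# Assumed pattern, inferred from the listed file names:
#   <prefix>.fields.<variable>_f<start>_<end>.nc
FIELDS_PATTERN = re.compile(
    r"^(?P<prefix>.+\.fields\.)(?P<variable>\w+?)_(?P<window>f\d{3}_\d{3})\.nc$"
)


def companion_out2d(filename):
    """Return the out2d file name covering the same forecast window (hypothetical helper)."""
    match = FIELDS_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a STOFS-3D fields file name: {filename!r}")
    return f"{match['prefix']}out2d_{match['window']}.nc"


print(companion_out2d("stofs_3d_atl.t12z.fields.temperature_f001_012.nc"))
# stofs_3d_atl.t12z.fields.out2d_f001_012.nc
```

The schout file name carries a date rather than a forecast window, so it cannot be derived this way and would need to be supplied separately.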

ChrisBarker-NOAA commented 4 days ago

hmm.

I think that xarray should "lazy load" the data, so the 7 GB would only be loaded when saving out the results.

And we want a way for the user to be able to specify what variables they want.

So it should be fine to include everything up front.

ChrisBarker-NOAA commented 4 days ago

> One solution for this is to rewrite the CF metadata. I've tried this using schoutds.cf['mesh_topology'].assign_attrs(node_coordinates ='x y') but this seems to return a new object and the CF accessor doesn't support item assignment. out.cf["mesh_topology"]['node_coordinates'] = 'x y' doesn't resolve it either.

I think what you need to do is use .cf['mesh_topology'] to get the xarray variable, and then you can change the attrs directly on that variable.
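A minimal sketch of what changing the attrs directly could look like, assuming the mesh variable's attributes list variable names as space-separated strings. The plain dict stands in for the attrs of the mesh topology variable. The names SCHISM_hgrid_node_x and SCHISM_hgrid_node_y and the helper rename_in_attr are assumptions for illustration, not taken from this thread.

```python
def rename_in_attr(attr_value, rename_map):
    """Rewrite a space-separated list of variable names using rename_map."""
    return " ".join(rename_map.get(name, name) for name in attr_value.split())


# Plain dict standing in for the attrs of the mesh topology variable (assumed names).
mesh_attrs = {
    "cf_role": "mesh_topology",
    "node_coordinates": "SCHISM_hgrid_node_x SCHISM_hgrid_node_y",
    "face_node_connectivity": "SCHISM_hgrid_face_nodes",
}

# Hypothetical rename that was applied to the dataset variables.
rename_map = {"SCHISM_hgrid_node_x": "x", "SCHISM_hgrid_node_y": "y"}

# Update the entries in place so the metadata matches the renamed variables.
for key in ("node_coordinates", "face_node_connectivity"):
    mesh_attrs[key] = rename_in_attr(mesh_attrs[key], rename_map)

print(mesh_attrs["node_coordinates"])  # x y
```

With a real dataset, the same in-place update would be applied to the attrs dict of the mesh variable itself, whereas assign_attrs returns a new object, as noted above.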

Another option would be to use:

ds.subset_grid.assign_ugrid_topology() -- it will overwrite the mesh variable attribute for anything you pass in. (only partially tested, but it should work)

> Another solution would be not rename the variables.

Is there a reason to rename them? If not, then yes, that's the easier way to go.

ChrisBarker-NOAA commented 4 days ago

One more thought: I think it's a goal for the resulting subset file to be as similar to the original as possible -- e.g. all variables have the same names. So I don't think any renaming should be done, unless absolutely required.

[while the CF standard applies almost no meaning to variable names, a lot of users do -- code tends to simply look for known variable names]

mpiannucci commented 4 days ago

Yes, we very much do not want to rename any variables as part of this process.