ec-jrc / Thalassa

Large scale visualizations of unstructured mesh data
https://thalassa.readthedocs.io/en/latest/?badge=latest
European Union Public License 1.2
20 stars 15 forks source link

normalisation for GENERIC formats -- naming conventions #90

Closed tomsail closed 3 months ago

tomsail commented 4 months ago

This issue is more to open a discussion on naming conventions.

My Dataset has the following format:

Dimensions: time: 7272 node: 301572 face: 588314 max_no_vertices: 3
Coordinates: 
  longitude (node) float32 ...
  latitude (node) float32 ...
  time (time) datetime64[ns] 2023-01-01 ... 2023-10-30T23:00:00
Data variables:
  u (time, node) float32 ...
  v (time, node) float32 ...
  elev (time, node) float32 ...
  face_nodes (face, max_no_vertices) int32 ...

When I try to plot:

plot_map = thalassa.plot(
    ds=ds.isel(time=-1),
    variable="max_elev",
    clim_min=0,
    clim_max=1,
    clabel="meters",
    cmap = 'jet'
)

I get the following error:

KeyError                                  Traceback (most recent call last)
Cell In[22], [line 1](vscode-notebook-cell:?execution_count=22&line=1)
----> [1](vscode-notebook-cell:?execution_count=22&line=1) plot_map = thalassa.plot(
      [2](vscode-notebook-cell:?execution_count=22&line=2)     ds=ds.isel(time=-1),
      [3](vscode-notebook-cell:?execution_count=22&line=3)     variable="max_elev",
      [4](vscode-notebook-cell:?execution_count=22&line=4)     clim_min=0,
      [5](vscode-notebook-cell:?execution_count=22&line=5)     clim_max=1,
      [6](vscode-notebook-cell:?execution_count=22&line=6)     clabel="meters",
      [7](vscode-notebook-cell:?execution_count=22&line=7)     cmap = 'jet'
      [8](vscode-notebook-cell:?execution_count=22&line=8) )

File ~/work/gist/Validation_Protocol/.venv/lib/python3.11/site-packages/thalassa/plotting.py:153, in plot(ds, variable, bbox, title, cmap, colorbar, clabel, clim_min, clim_max, x_range, y_range, show_mesh)
     [93].venv/lib/python3.11/site-packages/thalassa/plotting.py:93) """
     [94].venv/lib/python3.11/site-packages/thalassa/plotting.py:94) Return the plot of the specified `variable`.
     [95].venv/lib/python3.11/site-packages/thalassa/plotting.py:95) 
   (...)
    [149].venv/lib/python3.11/site-packages/thalassa/plotting.py:149) 
    [150].venv/lib/python3.11/site-packages/thalassa/plotting.py:150) """
    [151].venv/lib/python3.11/site-packages/thalassa/plotting.py:151) import holoviews as hv
--> [153].venv/lib/python3.11/site-packages/thalassa/plotting.py:153) ds = normalization.normalize(ds)
    [154].venv/lib/python3.11/site-packages/thalassa/plotting.py:154) _sanity_check(ds=ds, variable=variable)
    [155].venv/lib/python3.11/site-packages/thalassa/plotting.py:155) if bbox:

File ~/work/gist/Validation_Protocol/.venv/lib/python3.11/site-packages/thalassa/normalization.py:203, in normalize(ds)
    [201].venv/lib/python3.11/site-packages/thalassa/normalization.py:201) logger.debug("Dataset normalization: Started")
    [202].venv/lib/python3.11/site-packages/thalassa/normalization.py:202) fmt = infer_format(ds)
--> [203].venv/lib/python3.11/site-packages/thalassa/normalization.py:203) normalizer_func = NORMALIZE_DISPATCHER[fmt]
    [204].venv/lib/python3.11/site-packages/thalassa/normalization.py:204) normalized_ds = normalizer_func(ds)
    [205].venv/lib/python3.11/site-packages/thalassa/normalization.py:205) # Handle quad elements
    [206].venv/lib/python3.11/site-packages/thalassa/normalization.py:206) # Splitting quad elements to triangles, means that the number of faces increases
    [207].venv/lib/python3.11/site-packages/thalassa/normalization.py:207) # There are two options:
   (...)
    [210].venv/lib/python3.11/site-packages/thalassa/normalization.py:210) # I'd rather avoid altering the values of the provided netcdf file therefore we go for option #2,
    [211].venv/lib/python3.11/site-packages/thalassa/normalization.py:211) # i.e. we create the `triface_nodes` variable.

KeyError: <THALASSA_FORMATS.UNKNOWN: 'UNKNOWN'>

That is caused by:

ds = thalassa.normalization.normalize(ds)

I found the solution by making my dataset compliant with the GENERIC format:

ds = ds.rename({
            "longitude": "lon",
            "latitude": "lat",
            "face_nodes": "triface_nodes",
            "max_no_vertices": "three", 
            "face": "triface",
            }) # make variable compliant with thalassa

I also understood the convention face_nodes / triface_nodes is before / after the Quad elements test.

concerning the connectivity of the mesh:

Would it be possible to skip the quad element test if you know already that the mesh is an unstructured triangle grid? Also, I think ideally the connectivity of mesh should be renamed to mesh_topology, according to UGRID and cf_xarray conventions

concerning the coordinates

Would it acceptable to consider other names for coordinates ? like Lon, x or longitude, or should this aspect be dealt by the user?

pmav99 commented 4 months ago

Is this output from Telemac? If yes then the easiest way to add support would be to add an extra "schema" in normalization.py. I think that this entails defining:

If you can create a small file, then add a test too: https://github.com/ec-jrc/Thalassa/blob/b9d977cd6999e73f5ad884e9e7b96d4041b60827/tests/normalization_test.py

The reason we have triface_nodes etc is because:

  1. when it comes to unstructured meshes, Holoviews only supports triangles
  2. we want to be able to support STOFS-3D output too, which uses quads.

So the idea is that we keep the original mesh connectivity as is, and we create a new variable that holds the triangle-only connectivity. If there are no quads, then the new variable is just a copy of the original. The thalassa code internally uses the new variable everywhere.

WRT CF naming-conventions etc, we can discuss it at https://github.com/ec-jrc/Thalassa/issues/80

pmav99 commented 4 months ago

Just to make it clear, we could add support for UGRID compliant netcdfs too. I am for sure not against that, but it is probably going to be more work compared to just updating the current setup. But feel free to investigate it if you want. It is for sure going to be useful in the context of seamesh etc.

tomsail commented 4 months ago

Is this output from Telemac?

From TELEMAC yes, but through pyPoseidon where I convert from Selafin file to xarray Dataset. All details from one format to other is in the norm.py function of the telemac branch

the easiest way to add support would be to add an extra "schema" in normalization.py.

So there are 2 ways to solve this:

  1. use xarray-selafin in thalassa to read directly from Selafins. But it might be too much abstraction for the user and also if xarray-selafin changes the backend, we'd have to adapt the TELEMAC bindings here too.
  2. adapt the conversion convention in pyPoseidon

    A third option would be a combination of both. The quickest win is clearly option 2 though.

pmav99 commented 4 months ago

The idea is to try to keep thalassa decoupled from pyposeidon.

If you add the functionality in pyposeidon, then TELEMAC output will be visualizable by thalassa iff you post-process it with pyposeidon. If you add the functionality in thalassa, then TELEMAC output will be visualizable by thalassa regardless of the way you run/post-processed the results.

That being said, if you truly want to make the TELEMAC output compatible with pyPoseidon, you will probably need to rename the TELEMAC variables/dims anyhow in order to make them compatible with the existing pyposeidon functionality (which uses SCHISM names, pretty much ubiquitusly).

pmav99 commented 3 months ago

I think that this can be closed. Am I wrong?