Open koldunovn opened 2 years ago
hey @koldunovn! this is great! So the only critical thing for psyplot is to have it UGRID-compliant (although I have not yet had the time to get this into the production release, but I am happy to prioritize this if you are interested). I had a look into the netCDF files and from my perspective, it's just about adding some metadata attributes. How can I suggest the changes to you? Shall I make a separate issue?
@koldunovn thx for the opportunity to take part in the discussion. I have a bit mixed feelings regarding UGRID for normal output, because it adds complexity where it is not really needed:
I consider model output as something you can use for
I regard outside the model, where you don't have geometric coefficients to compute something like a gradient or a rotation. These things should be handled inside the model, because it (a) has all the information and (b) can make sure things are computed in exactly the same way they are simulated. Sorry if this goes in the wrong direction, but I was asked too often for a gradient on an ICON grid computed in CDO ...
For the above points I believe the CF-convention is enough. At least I have positive experience with ICON using CF-conformant output. At very high resolution the coordinates are usually not written to every output file, but I think this is a reasonable approach to save some disk space, and tools should account for this when they want to support high resolution.
In the example netCDF output you provided in the tar file, there is not much to be added for being CF-compatible:
What UGRID can offer in addition to this (again 'basic output') is a naming convention to identify geometric objects: points/vertex = nodes, cells = faces, ... so that users have a clue where the temperature points are defined.
BUT
This only scratches the surface of what UGRID is about. From what I understand, UGRID is meant to be a representation of the geometric objects and their relations, i.e. the connectivity of your grids. Although it is very helpful IMO to write a grid file in UGRID in a way where ppl don't need massive knowledge about the model to understand its contents, it does not seem very useful to add this information to the normal model output.
UGRID certainly has its limitations when it comes to what you do with the grids inside the model: `grid != discretization`.
So if it comes to UGRID vs. CF:
Sorry for the wall of text. IMO UGRID is a big step and before that I would consider the possible benefits, like the tools that you could use having UGRID. Are there any? From my experience even tools for CF-compatible files on unstructured grids are very very limited.
hey @Try2Code! one issue is that you can derive ICON-like CF-Conventions from UGRID, but the other way around is quite complicated, isn't it? And the current FESOM output is already pretty much UGRID conform, there are just a few metadata attributes missing, and the connectivity variables need to be transposed.
But can't you actually do both? You can have a `mesh` attribute for the UGRID conventions, and a `coordinates` attribute with the corresponding `bounds` for the CF-Conventions. Then people can choose whatever they need.
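A minimal sketch of the "do both" idea, written as plain Python dictionaries rather than real netCDF calls. The attribute names `mesh`, `location`, `coordinates` and `bounds` come from the UGRID and CF conventions; the variable names `fesom_mesh`, `lon`, `lat`, `lon_bnds` and `lat_bnds` are assumptions made up for illustration and are not taken from the actual FESOM files.

```python
# Attributes a data variable (e.g. temperature) could carry to serve both
# conventions at once. All variable names below are hypothetical.
temp_attrs = {
    "mesh": "fesom_mesh",      # UGRID: name of the mesh topology variable
    "location": "node",        # UGRID: data live on the mesh nodes
    "coordinates": "lon lat",  # CF: auxiliary coordinate variables
}

# CF coordinate variables pointing at their cell bounds.
coord_attrs = {
    "lon": {"standard_name": "longitude", "units": "degrees_east",
            "bounds": "lon_bnds"},
    "lat": {"standard_name": "latitude", "units": "degrees_north",
            "bounds": "lat_bnds"},
}


def conventions_supported(var_attrs, coords):
    """Report which kind of reader could locate the data from these attributes."""
    ugrid = "mesh" in var_attrs and "location" in var_attrs
    names = var_attrs.get("coordinates", "").split()
    cf_with_bounds = bool(names) and all(
        "bounds" in coords.get(name, {}) for name in names
    )
    return {"ugrid": ugrid, "cf_with_bounds": cf_with_bounds}


print(conventions_supported(temp_attrs, coord_attrs))
```

A UGRID-aware tool would follow `mesh` and `location`, while a CF-only tool would follow `coordinates` and `bounds`, so neither has to understand the other convention.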
> hey @Try2Code! one issue is that you can derive ICON-like CF-Conventions from UGRID, but the other way around is quite complicated, isn't it?

CF does not distinguish between face and vertex - so it's additional information
> But can't you actually do both? You can have a `mesh` attribute for the UGRID conventions, and a `coordinates` attribute with the corresponding `bounds` for the CF-Conventions. Then people can choose whatever they need.

I think that's possible, but why have UGRID without giving the connectivity in the normal output? In this case CF is enough IMO
> but why having UGRID without giving the connectivity in the normal output?

You definitely need to give the connectivity to be UGRID compliant. But this is part of `fesom.mesh.diag.nc`. As far as I understand, the `elements` variable in this file is the `face_node_connectivity`, and the `edges` variable is the `edge_node_connectivity`.
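To make the connectivity point concrete, here is a rough sketch of the two pieces mentioned in this thread: transposing a connectivity table into the (faces, nodes-per-face) layout that UGRID expects, and the attributes of a UGRID mesh topology variable. The attribute names (`cf_role`, `topology_dimension`, `node_coordinates`, `face_node_connectivity`, `edge_node_connectivity`, `start_index`) are from the UGRID conventions. The toy data, the (3, n_faces) input layout, the 1-based indexing and the `lon`/`lat` names are assumptions for illustration, not read from the real file.

```python
def transpose(table):
    """Turn a (3, n_faces) table into an (n_faces, 3) table."""
    return [list(row) for row in zip(*table)]


# Toy mesh with two triangles sharing an edge, stored as (3, n_faces)
# with 1-based node indices (assumed layout).
elements = [
    [1, 2],  # first node of each triangle
    [2, 3],  # second node of each triangle
    [3, 4],  # third node of each triangle
]
face_nodes = transpose(elements)  # one row per triangle

# Attributes of the UGRID mesh topology variable (a dummy scalar variable).
mesh_attrs = {
    "cf_role": "mesh_topology",
    "topology_dimension": 2,              # 2D mesh made of faces
    "node_coordinates": "lon lat",        # hypothetical coordinate names
    "face_node_connectivity": "elements",
    "edge_node_connectivity": "edges",
}

# Attributes of the connectivity variable itself.
elements_attrs = {
    "cf_role": "face_node_connectivity",
    "start_index": 1,                     # assumed 1-based indexing
}

print(face_nodes)
```

In a real workflow the transposition would be done on the netCDF variable (for example with numpy or xarray) rather than on nested lists.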
@koldunovn Thanks a lot for the initiative. I know that not everyone in the FESOM world is a great fan of CDO. Yet, it would be great, towards deriving some basic diagnostics, if some common CDO commands would work. This would also address the quite large community of CDO users who then would not have to completely switch from CDO to FESOM- or unstructured-mesh-specific postprocessing tools.
I agree that for specialized analyses, e.g. gradient or transport computations on meshes, these are best done directly on the FESOM side. Yet, does it not often happen that you get hands on some selected FESOM output variables without access to further FESOM-computed diagnostics and need to do some quick analyses?
I am talking of commands of the following kind:
- `cdo output -fldmean temp.fesom.2000.nc` (global layer average)
- `cdo output -fldmean -mul mask.nc temp.fesom.2000.nc` (regional average - `mask.nc` would identify regions/basins)
- `cdo output -vertmean temp.fesom.2000.nc` (vertical average)
- `cdo output -fldmean -vertmean temp.fesom.2000.nc` (global and vertical average)
- `cdo output -remapbil,lon=0,lat=0 temp.fesom.2000.nc` (interpolation to station location, e.g. for model-observation comparison)
- `cdo remapbil,r1440x720 temp.fesom.2000.nc temp.fesom.2000_0.25x0.25deg.nc` (interpolation to a regular grid)
Not sure how much of this already works. I just had a quick look at
```
$ cdo output -fldmean -sellevel,2.5 temp.fesom.2000.nc
cdo(1) fldmean: Process started
cdo(2) sellevel: Process started
Warning: Grid cell bounds not available, using constant grid cell area weights!
11.2843
cdo(2) sellevel: Processed 1 variable over 1 timestep.
cdo(1) fldmean: Processed 126858 values from 1 variable over 1 timestep.
cdo output: Processed 1 value from 1 variable over 1 timestep [0.01s 59MB].
```
so area weighting did not work, but maybe one just has to get fesom.mesh.diag.nc correctly into the command.
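For readers wondering why the warning above matters, here is a small sketch of the difference between a field mean with constant weights (what the warning says was used) and one with cell-area weights. The numbers are made up for illustration and `field_mean` is a hypothetical helper, not a CDO function.

```python
def field_mean(values, areas=None):
    """Weighted mean of values; without areas every cell gets the same weight."""
    if areas is None:
        areas = [1.0] * len(values)
    total_area = sum(areas)
    return sum(v * a for v, a in zip(values, areas)) / total_area


# Three cells; the third one covers twice the area of the others (toy data).
values = [10.0, 20.0, 30.0]
areas = [1.0, 1.0, 2.0]

constant_weights = field_mean(values)     # ignores cell sizes
area_weighted = field_mean(values, areas) # large cell counts twice

print(constant_weights, area_weighted)
```

On an unstructured mesh with strongly varying resolution the two results can differ a lot, because the many small cells in refined regions dominate the unweighted mean.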
> so area weighting did not work, but maybe one just has to get fesom.mesh.diag.nc correctly into the command.

I am not up-to-date with the latest developments in the CDOs @christian-stepanek, but I would be totally surprised if the correct interpretation of geospatial information of the FESOM output works with CDOs, as the output currently does not follow any conventions concerning the coordinates.
> > so area weighting did not work, but maybe one just has to get fesom.mesh.diag.nc correctly into the command.
>
> I am not up-to-date with the latest developments in the CDOs @christian-stepanek, but I would be totally surprised if the correct interpretation of geospatial information of the FESOM output works with CDOs, as the output currently does not follow any conventions concerning the coordinates.

The problem is that atm there is no link between data variables and their coordinates in the temperature file. The lon/lat are in the mesh file, but without some CF-attributes and their bounds. IMO it's just a bit of extra metadata to make CDO work.
> @koldunovn Thanks a lot for the initiative. I know that not everyone in the FESOM world is a great fan of CDO. Yet, it would be great, towards deriving some basic diagnostics, if some common CDO commands would work. This would also address the quite large community of CDO users who then would not have to completely switch from CDO to FESOM- or unstructured-mesh-specific postprocessing tools.

I am interested in these tools, too - does the model work as post-processor here?

> I agree that for specialized analyses, e.g. gradient or transport computations on meshes, these are best done directly on the FESOM side. Yet, does it not often happen that you get hands on some selected FESOM output variables without access to further FESOM-computed diagnostics and need to do some quick analyses?

Exactly, but my point was: it's not as quick and easy as ppl think. How can a tool like CDO know about the discretization of a model?
@Chilipp The area-weighting should work if the appropriate cdo grid-description file is set, see e.g. here how this worked for the CMIP6 output: https://fesom.de/cmip6/work-with-awi-cm-unstructured-data/
One could add the grid description in every file (I think for CMIP6 this was done to make it easiest to use for the community), but it also works by setting `-setgrid` in a cdo command, and one can keep the file sizes considerably smaller this way. Here is the description of what needs to be done for a given fesom mesh to generate the grid description file so that even conservative remapping with cdo is directly supported: https://fesom2.readthedocs.io/en/latest/data_processing/data_processing.html#convert-grid-to-netcdf-that-cdo-understands
> > @koldunovn Thanks a lot for the initiative. I know that not everyone in the FESOM world is a great fan of CDO. Yet, it would be great, towards deriving some basic diagnostics, if some common CDO commands would work. This would also address the quite large community of CDO users who then would not have to completely switch from CDO to FESOM- or unstructured-mesh-specific postprocessing tools.
>
> I am interested in these tools, too - does the model work as post-processor here?
>
> > I agree that for specialized analyses, e.g. gradient or transport computations on meshes, these are best done directly on the FESOM side. Yet, does it not often happen that you get hands on some selected FESOM output variables without access to further FESOM-computed diagnostics and need to do some quick analyses?
>
> Exactly, but my point was: it's not as quick and easy as ppl think. How can a tool like CDO know about the discretization of a model?

CDO normally derives this information from a "grid description file". For "a bit more structured" grids the command chain would look something like this:
- `cdo output -fldmean output_raw.nc` (at this point incorrect area weighting would be applied)
- `cdo setgrid,griddes.nc output_raw.nc output_post.nc`
- `cdo output -fldmean output_post.nc` (at this point correct area weighting would be applied)
Here is a link to the user documentation of the setgrid operator: https://code.mpimet.mpg.de/projects/cdo/embedded/index.html#x1-2640002.6.6 Maybe one can also deal with unstructured meshes that way. This would be something to discuss.
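If someone wants to script the chain above, a minimal sketch could look like the following. It only builds and prints the two command lines from this comment and does not run CDO. The helper names `setgrid_command` and `fldmean_command` are made up for illustration, and the file names are the placeholders used above; whether this works unchanged for FESOM meshes is exactly the open question here.

```python
import shlex


def setgrid_command(grid_file, infile, outfile):
    """Build 'cdo setgrid,<grid> <in> <out>' as an argument list."""
    return ["cdo", f"setgrid,{grid_file}", infile, outfile]


def fldmean_command(infile):
    """Build 'cdo output -fldmean <in>' as an argument list."""
    return ["cdo", "output", "-fldmean", infile]


# Placeholder file names taken from the comment above.
steps = [
    setgrid_command("griddes.nc", "output_raw.nc", "output_post.nc"),
    fldmean_command("output_post.nc"),
]

for step in steps:
    # shlex.join gives a copy-pasteable shell line; to actually run a step,
    # one could pass the list to subprocess.run(step, check=True).
    print(shlex.join(step))
```

Keeping the commands as argument lists avoids shell-quoting problems with file names that contain spaces.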
CDO does fully support ICON, which has a couple of things in common with the FESOM grid. It can also work with other unstructured models like NICAM. With a bit of extra metadata in the output and the mesh file, CDO for sure can process FESOM without problems. I see CF as a low-hanging fruit wrt what you have uploaded in the tar file.
> @Chilipp The area-weighting should work if the appropriate cdo grid-description file is set, see e.g. here how this worked for the CMIP6 output: https://fesom.de/cmip6/work-with-awi-cm-unstructured-data/
>
> One could add the grid description in every file (I think for CMIP6 this was done to make it easiest to use for the community), but it also works by setting `-setgrid` in a cdo command, and one can keep the file sizes considerably smaller this way. Here is the description of what needs to be done for a given fesom mesh to generate the grid description file so that even conservative remapping with cdo is directly supported: https://fesom2.readthedocs.io/en/latest/data_processing/data_processing.html#convert-grid-to-netcdf-that-cdo-understands

Well, then one has to just get R into the workflow. I think it would be great if this was somehow included into the workflow of every simulation, so that one can include the grid description into data publications and people without access to the mesh definition can still do CDO analyses.
> Well, then one has to just get R into the workflow. I think it would be great if this was somehow included into the workflow of every simulation, so that one can include the grid description into data publications and people without access to the mesh definition can still do CDO analyses.

If you think about netCDF output of your model, why not let the model write its grid to disk in netCDF? Make it CF and CDO can use it right away with `setgrid`.
Hello together! Thanks @koldunovn for opening up the discussion, and it's nice to interact with everyone on one page :-)
> What UGRID can offer in addition to this (again 'basic output') is a naming convention to identify geometric objects: points/vertex = nodes, cells = faces, ... so that users have a clue where the temperature points are defined.

This would already be a useful step, and giving people just one file to determine that sort of info instead of several different ones would cut away at least some headaches. Christian's latest comment I think also goes in that direction.
Are there any particular pain points to be aware of here? File size perhaps? Dumping more metadata into the output is (aside from the man-hours needed for coding) not too big of an issue, I would think.
@christian-stepanek I would have been very very surprised if you got the right answer directly out of the box with your `cdo fldmean`. Having a `cdo -setgrid` capable file would of course be extremely useful, as @Try2Code mentioned.
One addition: doing `cdo -setgrid <in> <out>` as a post-processing step would then still be needed, maybe we can have that already happen in fesom...
> One addition: doing `cdo -setgrid <in> <out>` as a post-processing step would then still be needed, maybe we can have that already happen in fesom...

Space-wise that does not seem to be useful, because you add the same coordinates to all output files. Esp. in high resolution the coordinates and bounds can be a costly thing. At least for ICON the usage of `setgrid` before doing any coordinate-related operations has proven to be useful/doable for users there.
I am really glad that this issue caused this very useful discussion. I will try to answer in more detail later, but for the time being, here are a few answers and points to clarify:
@Chilipp It would be great if you can provide your suggestions in a separate issue, and we can link it here. If it's just a few metadata changes to make the output UGRID-compliant, that would be perfect.
@Try2Code I agree with most of what you say. There is no way we are going to add grid information to the output, and we will try to make it UGRID compliant to the point where you can add grid data at a later stage (in `cdo` or `xarray`). There are a lot of things in postprocessing that you can do having the grid information in hand, but this is not the business of general purpose tools like, again, `cdo` or `xarray`, and should be specific for the model. We have https://github.com/FESOM/pyfesom2 , but there is not much yet that relies on the peculiarities of the discretization. There is also https://github.com/FESOM/spheRlab . We also have Fortran-based post-processing tools that are generally mini-FESOM, but this can and will be replaced by a python version.
@christian-stepanek My personal relationship with `cdo` does not matter :), it is and will be one of the main post-processing tools we will try to target, as a lot of users want it :) We also should be able to generate the information that is now created by spheRlab (summon @helgegoessling here :)) as part of what is now `fesom.mesh.diag.nc`, here I agree with @Try2Code .
@pgierz I would do it as a post-processing step, you don't want a 33M grid to be present in every file :) Good to know that according to @Try2Code it's not a big deal for users.
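My summary so far:
- Definitely CF compliant output.
- make UGRID compliant as much as possible without adding grid data.
- grid file is separate, and can be used for `setgrid` in `cdo`, and adding grid data in `xarray` straight away.

Do I miss something critical here?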
Adding @suvarchal and @helgegoessling , sorry for not doing this in the first place! @suvarchal do we have problems from the xarray side with all this?
> My summary so far:
>
> - Definitely CF compliant output.
> - make UGRID compliant as much as possible without adding grid data.
> - grid file is separate, and can be used for `setgrid` in `cdo`, and adding grid data in `xarray` straight away.
>
> Do I miss something critical here?

From my viewpoint, that seems to summarize everything nicely. And yes, it seems that my long days working with the data-cheap paleo models have induced the need for a bit of mental updating. I still need to get a better feeling for the space numbers ;-)
To the `xarray` question, I have not run into any weird problems recently, but maybe we could do a brief meeting together with @suvarchal and @koldunovn to think about any changes that may be needed in `pyfesom` (or I make an issue for that). I'd be happier passing around `xarray.DataArray` instead of plain numpy...
> - grid file is separate, and can be used for `setgrid` in `cdo`, and adding grid data in `xarray` straight away.

I fully agree that having one separate file with the full grid description is sufficient and would allow keeping the model output as small as possible. That is by the way consistent with the "CMIP" approach, where each model simulation is published with a number of "static" fx files that contain the information necessary to further postprocess and interpret model output.
We are now at the point of FESOM2 development where one can fix a lot of things without being backward compatible (too many changes anyway). So it's a good time to think about the best way to reorganise general FESOM2 output. There are not too many things we can do (netCDF is not going anywhere :)), but the main goals from my perspective are:
- `xarray` (if needed, to me we are doing fine here :))
- `cdo` as possible

To start the conversation I have created a collection of current basic output, that you can find here: https://swift.dkrz.de/v1/dkrz_035d8f6ff058403bb42f8302e6badfbc/FESOM2.2_output/FESOM2.2_output.tar
- `fesom.mesh.diag.nc` - file with all the mesh information. Something that we might need to expand/modify heavily
- `sst.fesom.2000.nc` - 2D variable on vertices
- `temp.fesom.2000.nc` - 3D scalar variable on vertices
- `u.fesom.2000.nc` - 3D vector variables on vertices
- `v.fesom.2000.nc` - 3D vector variable on vertices
- `vice.fesom.2000.nc` - 2D vector variable on nodes
- `uice.fesom.2000.nc` - 2D vector variable on nodes
- `tx_sur.fesom.2000.nc` - 2D vector variable on vertices
- `ty_sur.fesom.2000.nc` - 2D vector variable on vertices
- `core` folder - original FESOM2 mesh in ascii. For post-processing we would like to replace it by netCDF `fesom.mesh.diag.nc` as much as possible.

We already have several suggestions in issues under refactoring label.
This issue is to collect suggestions on what is needed and how to implement it. I tag people who might be interested in trying the current FESOM2 output and making suggestions. We would really appreciate your thoughts and advice on the matter, @pgierz @dsidoren @patrickscholz @JanStreffing @mandresm @tsemmler05 @trackow @christian-stepanek @Chilipp @Try2Code @aaronspring
Please feel free to tag more people (from paleo, for example).