Update FESOM2 output

koldunovn commented 2 years ago

We are now at a point in FESOM2 development where one can fix a lot of things without being backward compatible (too many changes anyway). So it's a good time to think about the best way to reorganise the general FESOM2 output. There are not too many things we can do (netCDF is not going anywhere :)), but the main goals from my perspective are:

make it UGRID compliant
make it easier to work with xarray (if needed, to me we are doing fine here :))
make it as easy to work with cdo as possible
make future CMORisation as easy as possible
make it possible to work with https://github.com/psyplot/psyplot

To start the conversation I have put together a collection of the current basic output, which you can find here: https://swift.dkrz.de/v1/dkrz_035d8f6ff058403bb42f8302e6badfbc/FESOM2.2_output/FESOM2.2_output.tar

fesom.mesh.diag.nc - file with all the mesh information. Something that we might need to expand/modify heavily
sst.fesom.2000.nc - 2D variable on vertices
temp.fesom.2000.nc - 3D scalar variable on vertices
u.fesom.2000.nc - 3D vector variable on vertices
v.fesom.2000.nc - 3D vector variable on vertices
vice.fesom.2000.nc - 2D vector variable on nodes
uice.fesom.2000.nc - 2D vector variable on nodes
tx_sur.fesom.2000.nc - 2D vector variable on vertices
ty_sur.fesom.2000.nc - 2D vector variable on vertices
core folder - original FESOM2 mesh in ASCII. For post-processing we would like to replace it with the netCDF file fesom.mesh.diag.nc as much as possible.

We already have several suggestions in issues under the refactoring label.

This issue is to collect suggestions on what is needed and how to implement it. I am tagging people who might be interested in trying the current FESOM2 output and making suggestions. We would really appreciate your thoughts and advice on the matter, @pgierz @dsidoren @patrickscholz @JanStreffing @mandresm @tsemmler05 @trackow @christian-stepanek @Chilipp @Try2Code @aaronspring

Please feel free to tag more people (from paleo, for example).

Chilipp commented 2 years ago

hey @koldunovn! this is great! So the only critical thing for psyplot is to have it UGRID-compliant (although I have not yet had the time to get this into the production release, but I am happy to prioritize this if you are interested). I had a look at the netCDF files and from my perspective, it's just about adding some metadata attributes. How can I suggest the changes to you? Shall I make a separate issue?

Try2Code commented 2 years ago

@koldunovn thx for the opportunity to take part in the discussion. I have somewhat mixed feelings regarding UGRID for normal output, because it adds complexity where it is not really needed:

I consider model output as something you can use for

plotting
interpolation
basic analysis like averages, variances (everything that is reasonable outside the model)

By "outside the model" I mean the situation where you don't have the geometric coefficients needed to compute something like a gradient or a rotation. These things should be handled inside the model, because it (a) has all the information and (b) can make sure things are computed in exactly the same way they are simulated. Sorry if this goes in the wrong direction, but I have been asked too often for a gradient on an ICON grid computed in CDO ... For the above points I believe the CF convention is enough. At least I have had positive experience with ICON using CF-conformant output. At very high resolution the coordinates are usually not written to every output file, but I think this is a reasonable approach to save some disk space, and tools should account for this when they want to support high resolution. In the example netCDF output you provided in the tar file, there is not much to be added for it to be CF-compatible:

your data variables need a coordinates attribute
these coordinates should be in the respective file, using the same dimension (nod2)
if you want to be able to do conservative interpolation, you need to add the bounds of the coordinates, too
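
The three points above can be sketched as a small check. This is only an illustration: plain Python dictionaries stand in for the netCDF variables, and the names lon, lat, lon_bnds, lat_bnds, nv and the time dimension are assumptions, not taken from the actual files. Only nod2 comes from the thread.

```python
# Plain dictionaries stand in for netCDF variables (dimensions + attributes).
# lon, lat, lon_bnds, lat_bnds, nv and time are assumed names for illustration.

def cf_problems(variables, name):
    """Return reasons why `name` is not linked to usable CF coordinates."""
    var = variables[name]
    coord_names = var["attrs"].get("coordinates", "").split()
    if not coord_names:
        return [f"{name}: no 'coordinates' attribute"]
    problems = []
    for coord in coord_names:
        if coord not in variables:
            problems.append(f"{coord}: not present in this file")
            continue
        # The coordinate has to live on dimensions of the data variable (nod2).
        if not set(variables[coord]["dims"]) <= set(var["dims"]):
            problems.append(f"{coord}: dimensions do not match {name}")
        # Bounds are what conservative interpolation needs.
        bounds = variables[coord]["attrs"].get("bounds")
        if bounds is None or bounds not in variables:
            problems.append(f"{coord}: no cell bounds")
    return problems


# Roughly the current situation: data variable without any coordinate link.
current = {"sst": {"dims": ("time", "nod2"), "attrs": {}}}

# What the list above asks for.
proposed = {
    "sst": {"dims": ("time", "nod2"), "attrs": {"coordinates": "lon lat"}},
    "lon": {"dims": ("nod2",), "attrs": {"units": "degrees_east", "bounds": "lon_bnds"}},
    "lat": {"dims": ("nod2",), "attrs": {"units": "degrees_north", "bounds": "lat_bnds"}},
    "lon_bnds": {"dims": ("nod2", "nv"), "attrs": {}},
    "lat_bnds": {"dims": ("nod2", "nv"), "attrs": {}},
}

print(cf_problems(current, "sst"))
print(cf_problems(proposed, "sst"))
```

The first call reports the missing coordinates attribute, the second returns an empty list. Dropping the two bounds variables from the second dictionary would leave plotting and simple averages possible but flag both coordinates as unusable for conservative interpolation.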

What UGRID can offer in addition to this (again 'basic output') is a naming convention to identify geometric objects: points/vertex = nodes, cells = faces, ... so that users have a clue where the temperature points are defined.

BUT

This only scratches the surface of what UGRID is about. From what I understand UGRID is meant to be a representation of the geometric objects and their relations, i.e. the connectivity of your grids. Although it is very helpful IMO to write a grid file in UGRID in a way where ppl don't need massive knowledge about the model to understand its contents, it does not seem very useful to add this information to the normal model output.

UGRID certainly has its limitations when it comes to what you do with the grids inside the model: grid != discretization.

So if it comes to UGRID vs. CF:

Why do you want UGRID compatibility in the first place?
For cdo, xarray, psyplot a CF-conform input is fine
CMOR is a bit crazy, but CF is at least a good starting point. I see CMOR as a postprocessing step

Sorry for the wall of text. IMO UGRID is a big step and before that I would consider the possible benefits, like the tools that you could use having UGRID. Are there any? From my experience even tools for CF-compatible files on unstructured grids are very very limited.

Chilipp commented 2 years ago

hey @Try2Code! one issue is that you can derive ICON-like CF-Conventions from UGRID, but the other way around is quite complicated, isn't it? And the current FESOM output is already pretty much UGRID conform, there are just a few metadata attributes missing, and the connectivity variables need to be transposed.

But can't you actually do both? You can have a mesh attribute for the UGRID conventions, and a coordinates attribute with the corresponding bounds for the CF-Conventions. Then people can choose whatever they need.

Try2Code commented 2 years ago

> hey @Try2Code! one issue is that you can derive ICON-like CF-Conventions from UGRID, but the other way around is quite complicated, isn't it?

CF does not distinguish between face and vertex - so it's additional information

> But can't you actually do both? You can have a mesh attribute for the UGRID conventions, and a coordinates attribute with the corresponding bounds for the CF-Conventions. Then people can choose whatever they need.

I think that's possible, but why have UGRID without giving the connectivity in the normal output? In this case CF is enough IMO

Chilipp commented 2 years ago

> but why have UGRID without giving the connectivity in the normal output?

you definitely need to give the connectivity to be UGRID compliant. But this is part of fesom.mesh.diag.nc. As far as I understand, the elements variable in this file is the face_node_connectivity, and the edges variable is the edge_node_connectivity.
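
A minimal sketch of the transposition mentioned earlier in the thread, assuming the elements table is stored as (3, n_elem) with 1-based node indices. The input layout and the example values are assumptions; cf_role and start_index are the attribute names the UGRID conventions use for connectivity variables.

```python
def to_face_node_connectivity(elements, start_index=1):
    """Transpose a (3, n_elem) node table into UGRID's (n_face, 3) layout."""
    # zip(*rows) turns "one row per triangle corner" into "one row per triangle".
    faces = [list(face) for face in zip(*elements)]
    # Attributes a UGRID reader would look for on the connectivity variable.
    attrs = {"cf_role": "face_node_connectivity", "start_index": start_index}
    return faces, attrs


# Two made-up triangles that share the edge between nodes 2 and 3.
elements = [
    [1, 2],  # first corner of each triangle
    [2, 4],  # second corner
    [3, 3],  # third corner
]

faces, attrs = to_face_node_connectivity(elements)
print(faces)  # [[1, 2, 3], [2, 4, 3]]
print(attrs)
```

Keeping the 1-based indices and declaring them through start_index avoids touching the index values themselves; only the array layout and the metadata change.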

christian-stepanek commented 2 years ago

@koldunovn Thanks a lot for the initiative. I know that not everyone in the FESOM world is a great fan of CDO. Yet, it would be great, towards deriving some basic diagnostics, if some common CDO commands would work. This would also address the quite large community of CDO users who then would not have to completely switch from CDO to FESOM- or unstructured-mesh-specific postprocessing tools.

I agree that for specialized analyses, e.g. gradient or transport computations on meshes, these are best done directly on the FESOM side. Yet, does it not often happen that you get hands on some selected FESOM output variables without access to further FESOM-computed diagnostics and need to do some quick analyses?

I am talking of commands of the following kind:

cdo output -fldmean temp.fesom.2000.nc (global layer average)
cdo output -fldmean -mul mask.nc temp.fesom.2000.nc (regional average - mask.nc would identify regions/basins)
cdo output -vertmean temp.fesom.2000.nc (vertical average)
cdo output -fldmean -vertmean temp.fesom.2000.nc (global and vertical average)
cdo output -remapbil,lon=0,lat=0 temp.fesom.2000.nc (interpolation to station location, e.g. for model-observation comparison)
cdo remapbil,r1440x720 temp.fesom.2000.nc temp.fesom.2000_0.25x0.25deg.nc (interpolation to a regular grid)

Not sure how much of this already works. I just had a quick look at

$ cdo output -fldmean -sellevel,2.5 temp.fesom.2000.nc 
cdo(1) fldmean: Process started
cdo(2) sellevel: Process started
Warning: Grid cell bounds not available, using constant grid cell area weights!
      11.2843
cdo(2) sellevel: Processed 1 variable over 1 timestep.
cdo(1) fldmean: Processed 126858 values from 1 variable over 1 timestep.
cdo    output: Processed 1 value from 1 variable over 1 timestep [0.01s 59MB].

so area weighting did not work, but maybe one just has to get fesom.mesh.diag.nc correctly into the command.
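
To illustrate why the warning above matters, here is a small sketch comparing a mean with constant weights (what cdo falls back to without cell bounds) to an area-weighted mean. The values and areas are made up for illustration and have nothing to do with the FESOM mesh.

```python
def field_mean(values, areas=None):
    """Mean over cells; falls back to constant weights if no areas are given."""
    if areas is None:
        areas = [1.0] * len(values)
    return sum(v * a for v, a in zip(values, areas)) / sum(areas)


# Made-up example: one large cool cell and one small warm cell.
values = [10.0, 20.0]
areas = [3.0, 1.0]

print(field_mean(values))         # constant weights: 15.0
print(field_mean(values, areas))  # area weighted:    12.5
```

On a mesh with strongly varying resolution the small cells are over-represented in the unweighted mean, so the two numbers can differ noticeably.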

Chilipp commented 2 years ago

> so area weighting did not work, but maybe one just has to get fesom.mesh.diag.nc correctly into the command.

I am not up-to-date with the latest developments in the CDOs @christian-stepanek, but I would be totally surprised if the correct interpretation of geospatial information of the FESOM output works with CDOs, as the output currently does not follow any conventions concerning the coordinates.

Try2Code commented 2 years ago

> > so area weighting did not work, but maybe one just has to get fesom.mesh.diag.nc correctly into the command.
>
> I am not up-to-date with the latest developments in the CDOs @christian-stepanek, but I would be totally surprised if the correct interpretation of geospatial information of the FESOM output works with CDOs, as the output currently does not follow any conventions concerning the coordinates.

The problem is that atm there is no link between data variables and their coordinates in the temperature file. The lon/lat are in the mesh file, but without some CF attributes and their bounds. IMO it's just a bit of extra metadata to make CDO work.

Try2Code commented 2 years ago

> @koldunovn Thanks a lot for the initiative. I know that not everyone in the FESOM world is a great fan of CDO. Yet, it would be great, towards deriving some basic diagnostics, if some common CDO commands would work. This would also address the quite large community of CDO users who then would not have to completely switch from CDO to FESOM- or unstructured-mesh-specific postprocessing tools.

I am interested in these tools, too - does the model work as post-processor here?

> I agree that for specialized analyses, e.g. gradient or transport computations on meshes, these are best done directly on the FESOM side. Yet, does it not often happen that you get hands on some selected FESOM output variables without access to further FESOM-computed diagnostics and need to do some quick analyses?

Exactly, but my point was: It's not as quick and easy as ppl think. How can a tool like CDO know about the discretization of a model?

trackow commented 2 years ago

@Chilipp The area-weighting should work if the appropriate cdo grid-description file is set, see e.g. here how this worked for the CMIP6 output: https://fesom.de/cmip6/work-with-awi-cm-unstructured-data/

One could add the grid description in every file (I think for CMIP6 this was done to make it easiest to use for the community), but it also works by setting -setgrid in a cdo command and one can keep the file sizes considerably smaller this way. Here is the description of what needs to be done for a given fesom mesh to generate the grid description file so that even conservative remapping with cdo is directly supported: https://fesom2.readthedocs.io/en/latest/data_processing/data_processing.html#convert-grid-to-netcdf-that-cdo-understands

christian-stepanek commented 2 years ago

> > @koldunovn Thanks a lot for the initiative. I know that not everyone in the FESOM world is a great fan of CDO. Yet, it would be great, towards deriving some basic diagnostics, if some common CDO commands would work. This would also address the quite large community of CDO users who then would not have to completely switch from CDO to FESOM- or unstructured-mesh-specific postprocessing tools.
>
> I am interested in these tools, too - does the model work as post-processor here?
>
> > I agree that for specialized analyses, e.g. gradient or transport computations on meshes, these are best done directly on the FESOM side. Yet, does it not often happen that you get hands on some selected FESOM output variables without access to further FESOM-computed diagnostics and need to do some quick analyses?
>
> Exactly, but my point was: It's not as quick and easy as ppl think. How can a tool like CDO know about the discretization of a model?

CDO normally derives this information from a "grid description file". For "a bit more structured" grids the command chain would look something like this:

cdo output -fldmean output_raw.nc (at this point incorrect area weighting would be applied)
cdo setgrid,griddes.nc output_raw.nc output_post.nc
cdo output -fldmean output_post.nc (at this point correct area weighting would be applied)

Here is a link to the user documentation of the setgrid operator: https://code.mpimet.mpg.de/projects/cdo/embedded/index.html#x1-2640002.6.6 Maybe one can also deal with unstructured meshes that way. This would be something to discuss.
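
The two-step chain above can also be written as a single chained cdo call, which avoids the intermediate output_post.nc. The sketch below only builds and prints the command line; the helper name is hypothetical, the file names are the placeholders from the example above, and whether the chain gives correct weights for a FESOM mesh still depends on the grid description file.

```python
def chained_fldmean(raw_file, grid_file):
    """Build a cdo call that attaches the grid and averages in one pass."""
    # cdo evaluates chained operators right to left: setgrid first, then fldmean.
    return ["cdo", "output", "-fldmean", f"-setgrid,{grid_file}", raw_file]


cmd = chained_fldmean("output_raw.nc", "griddes.nc")
print(" ".join(cmd))
# cdo output -fldmean -setgrid,griddes.nc output_raw.nc
```

The list form can be passed directly to subprocess.run on a machine where cdo and the files are available.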

Try2Code commented 2 years ago

CDO fully supports ICON, which has a couple of things in common with the FESOM grid. It can also work with other unstructured models like NICAM. With a bit of extra metadata in the output and the mesh file, CDO can for sure process FESOM without problems. I see CF as low-hanging fruit wrt what you have uploaded in the tar file

christian-stepanek commented 2 years ago

> @Chilipp The area-weighting should work if the appropriate cdo grid-description file is set, see e.g. here how this worked for the CMIP6 output: https://fesom.de/cmip6/work-with-awi-cm-unstructured-data/
>
> One could add the grid description in every file (I think for CMIP6 this was done to make it easiest to use for the community), but it also works by setting -setgrid in a cdo command and one can keep the file sizes considerably smaller this way. Here is the description of what needs to be done for a given fesom mesh to generate the grid description file so that even conservative remapping with cdo is directly supported: https://fesom2.readthedocs.io/en/latest/data_processing/data_processing.html#convert-grid-to-netcdf-that-cdo-understands

Well, then one has to just get R into the workflow. I think it would be great if this was somehow included into the workflow of every simulation, so that one can include the grid description into data publications and people without access to the mesh definition can still do CDO analyses.

Try2Code commented 2 years ago

> Well, then one has to just get R into the workflow. I think it would be great if this was somehow included into the workflow of every simulation, so that one can include the grid description into data publications and people without access to the mesh definition can still do CDO analyses.

If you think about the netCDF output of your model, why not let the model write its grid to disk in netCDF? Make it CF and CDO can use it right away with setgrid.

pgierz commented 2 years ago

Hello everyone! Thanks @koldunovn for opening up the discussion, and it's nice to interact with everyone on one page :-)

> What UGRID can offer in addition to this (again 'basic output') is a naming convention to identify geometric objects: points/vertex = nodes, cells = faces, ... so that users have a clue where the temperature points are defined.

This would already be a useful step, and giving people just one file to determine that sort of info instead of several different ones would cut away at least some headaches. Christian's latest comment I think also goes in that direction.

Are there any particular pain points to be aware of here? File size perhaps? Dumping more metadata into the output is (aside from the man-hours needed for coding) not too big of an issue, I would think.

@christian-stepanek I would have been very very surprised if you got the right answer directly out of the box with your cdo fldmean. Having a cdo -setgrid capable file would of course be extremely useful, as @Try2Code mentioned.

pgierz commented 2 years ago

One addition: doing cdo -setgrid <in> <out> as a post-processing step would then still be needed, maybe we can have that already happen in fesom...

Try2Code commented 2 years ago

> One addition: doing cdo -setgrid <in> <out> as a post-processing step would then still be needed, maybe we can have that already happen in fesom...

Space-wise that does not seem to be useful, because you add the same coordinates to all output files. Especially at high resolution the coordinates and bounds can be a costly thing. At least for ICON, using setgrid before doing any coordinate-related operations has proven to be useful/doable for users.

koldunovn commented 2 years ago

I am really glad that this issue caused this very useful discussion. I will try to answer in more detail later, but for the time being, here are a few answers and points to clarify:

@Chilipp It would be great if you can provide your suggestions in a separate issue, and we can link it here. If it's just a few metadata changes to make the output UGRID that would be perfect.
@Try2Code I agree with most of what you say. There is no way we are going to add grid information to the output, and we will try to make it UGRID compliant to the point where you can add grid data at a later stage (in cdo or xarray). There are a lot of things in postprocessing that you can do with the grid information in hand, but this is not the business of general purpose tools like, again, cdo or xarray, and should be specific to the model. We have https://github.com/FESOM/pyfesom2 , but there is not much yet that relies on the peculiarities of the discretization. There is also https://github.com/FESOM/spheRlab . We also have Fortran-based post-processing tools that are essentially a mini-FESOM, but these can and will be replaced by a Python version.
@christian-stepanek My personal relationship with cdo does not matter :), it is and will be one of the main post-processing tools we will try to target, as a lot of users want it :) We should also be able to generate the information that is now created by spheRlab (summoning @helgegoessling here :)) as part of what is now fesom.mesh.diag.nc; here I agree with @Try2Code .
@pgierz I would do it as a post-processing step, you don't want 33M grid to be present in every file :) Good to know, that according to @Try2Code it's not a big deal for users.

My summary so far:

Definitely CF compliant output.
make UGRID compliant as much as possible without adding grid data.
grid file is separate, and can be used for setgrid in cdo, and adding grid data in xarray straight away.

Do I miss something critical here?

Adding @suvarchal and @helgegoessling , sorry for not doing this in the first place! @suvarchal do we have problems from the xarray side with all this?

pgierz commented 2 years ago

> My summary so far:
>
> Definitely CF compliant output.
> make UGRID compliant as much as possible without adding grid data.
> grid file is separate, and can be used for setgrid in cdo, and adding grid data in xarray straight away.
>
> Do I miss something critical here?

From my viewpoint, that seems to summarize everything nicely. And yes, it seems that my long days working with the data-cheap paleo models have induced the need for a bit of mental updating. I still need to get a better feeling for the space numbers ;-)

To the xarray question, I have not run into any weird problems recently, but maybe we could do a brief meeting together with @suvarchal and @koldunovn to think about any changes that may be needed in pyfesom (or I make an issue for that). I'd be happier passing around xarray.DataArray instead of plain numpy...

christian-stepanek commented 2 years ago

> grid file is separate, and can be used for setgrid in cdo, and adding grid data in xarray straight away.

I fully agree that having one separate file with the full grid description is sufficient and would allow keeping the model output as small as possible. That is by the way consistent with the "CMIP" approach, where each model simulation is published with a number of "static" fx files that contain the information necessary to further postprocess and interpret model output.

FESOM / fesom2

Update FESOM2 output #270