FESOM / pyfesom2

FESOM2 tools https://pyfesom2.readthedocs.io
MIT License
14 stars 10 forks source link

Mesh Representation #155

Open pgierz opened 2 years ago

pgierz commented 2 years ago

In #153, we are discussing how to handle preparing the pyfesom2 analysis tools for large data. One particular need arises here: how to handle (in future), the mesh representation on the Python side.

Several options exist, as far as I see (please feel free to edit the main issue and add more info, if anyone has something to add)

A. We continue with plain text files.

Pro: It's already there, many users are used to it, we get to feel retro and old-school as if we are still in the 80s when computing was still the wild west.

Con: ....it's plain text, and feels rather old-school and is if we are still in the wild west. There are smarter ways of doing it by now.

B. Switch to a NetCDF Format

Pro: We can self-document all of the mesh in the file itself. We know which part of the plain text file (which, is now a netcdf file) shows nodes, Lon/lat, ocean/coastline, etc etc.

Con: We need user migration. This would also imply a deeper switch inside of FESOM itself.

C. Hybrid Model (Paul's preference right now)

We allow the user to read in a "plain text" mesh, and check if there is a NetCDF version there. If not, we make one "on-the-fly" for the next time around. This would speed up user migration to the new format.

Pro: We could, for a time, support both plain text and new netcdf meshes.

Con: It might take a bit of programming flexibility, but if we decide on a strategy, I am happy to volunteer the dirty part of that work.

One more thought:

I would, however, argue against including all the mesh information directly in the model output (this was something I had wished for earlier, but it would explode the file size). Rather, I would suggest that we find some way to publish all of our meshes publicly (if that is feasible), and provide metadata in the output files where the mesh can be accessed (via FTP, Git LFS, DKRZ Swift, whatever). We already have the mesh path in one of the namelists, it should not be too hard to just dump an extra line of metadata into the files when writing output. Then, when a user loads a particular dataset, whatever we have as a load function can check for this, and also load the accompanying information which may be needed for other operations.

pgierz commented 2 years ago

In the long run, I think it would make most sense to switch to NetCDF, but provide a smooth transition, both on the actual model side as well as on the analysis side. Such a fundamental change may require us to signal it to everyone with a version number change (I guess a minor one, both for FESOM itself, and for PyFesom)

We should also include in this discussion anyone who is actively generating new meshes (e.g. the paleo guys, who want exotic things like Cretaceous), since this will affect many workflows.

koldunovn commented 2 years ago

One more plus of the plain text if that one can begin to analyse them before the first run is done and fesom.mesh.diag.nc is generated. But at the end of the day I am all for the netCDF of course.

Still not clear to me if we should use:

pyfesom2.function(data, mesh)

or

pyfesom2.function(data)

With the mesh glued to the data in the second case. If we go for the hybrid model, I think the first variant is still a way to go? But using modified mesh when the region is cutted, as implemented by @suvarchal is also very tempting :)

Metadata with locations of meshes is a good idea. I would keep them on awi GitLab, and have link to concrete hash. Not sure if this should be the namelist option or something set automatically inside FESOM2 (I currently lean towards the namelist).

koldunovn commented 2 years ago

@pgierz can you ping some paleo people, so that they can express their concerns/wishes here?

pgierz commented 2 years ago

Done via email, since I am unsure if everyone in the paleodyn group uses github. I will copy/paste answers here if need be.

christian-stepanek commented 2 years ago

I would suggest Paul's option C - implement the ability to read in mesh information via NetCDF, but also keep the traditional text-based format usable.

NetCDF is a bit more easy to handle for the average modeller today. Maybe NetCDF will also make it more easy to do tricks like activating or deactivating mesh nodes with relatively simple script based methods? Just a guess, maybe the "text-based" Gurus know more on that.

pgierz commented 2 years ago

sed/grep/awk, Christian ;-)

then, afterward, wonder why your file is suddenly messed up because you swapped a double quote for a single quote and this then induces chaos.

So, one vote plus for NetCDF. (or, two: I vote that way as well)

christian-stepanek commented 2 years ago

Yes, there is sed/gep/awk. But, traditionally, I prefer the ability to do things like cdo setcindexbox, maskindexbox, etc. as a more convenient method. ;o)

If one anyway goes for NetCDF as an additional pathway to do the mesh description, maybe then it would be easy (or more easily) possible to create the traditional "fx" files that help users identify general metrics of their model setup, like CMIP6.CMIP.AWI.AWI-ESM-1-1-LR.piControl.r1i1p1f1.fx.areacella.gn, but for the ocean.

pgierz commented 2 years ago

Christian, could you share a ncdump -h of such an fx file? that would be good for reference...

christian-stepanek commented 2 years ago

Here it is for three different fx-variables that are available from CMIP6 for the AWI-CM-1-1-MR simulation piControl:

netcdf deptho_Ofx_AWI-CM-1-1-MR_piControl_r1i1p1f1_gn {
dimensions:
    ncells = 830305 ;
    vertices = 16 ;
variables:
    float deptho(ncells) ;
        deptho:units = "m" ;
        deptho:_FillValue = 1.e+30f ;
        deptho:missing_value = 1.e+30f ;
        deptho:coordinates = "lat lon" ;
        deptho:standard_name = "sea_floor_depth_below_geoid" ;
        deptho:description = "Ocean bathymetry.   Reported here is the sea floor depth for present day relative to z=0 geoid. Reported as missing for land grid cells." ;
        deptho:cell_methods = "area: mean where sea" ;
        deptho:cell_measures = "area: areacello" ;
    double lat(ncells) ;
        lat:units = "degrees_north" ;
        lat:standard_name = "latitude" ;
        lat:bounds = "lat_bnds" ;
    double lat_bnds(ncells, vertices) ;
    double lon(ncells) ;
        lon:units = "degrees_east" ;
        lon:standard_name = "longitude" ;
        lon:bounds = "lon_bnds" ;
    double lon_bnds(ncells, vertices) ;

// global attributes:
        :NCO = "netCDF Operators version 4.7.5 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
        :activity_id = "CMIP" ;
        :Conventions = "CF-1.7 CMIP-6.2" ;
        :creation_date = "2018-12-18T12:00:00Z" ;
        :data_specs_version = "01.00.27" ;
        :experiment = "piControl" ;
        :experiment_id = "piControl" ;
        :forcing_index = 1 ;
        :frequency = "fx" ;
        :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.AWI.AWI-CM-1-1-MR.piControl.none.r1i1p1f1" ;
        :grid = "FESOM 1.4 (unstructured grid in the horizontal with 830305 wet nodes; 46 levels; top grid cell 0-5 m)" ;
        :grid_label = "gn" ;
        :initialization_index = 1 ;
        :institution = "Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany" ;
        :institution_id = "AWI" ;
        :license = "CMIP6 model data produced by Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (https://creativecommons.org/licenses/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file). The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
        :mip_era = "CMIP6" ;
        :nominal_resolution = "25 km" ;
        :physics_index = 1 ;
        :product = "model-output" ;
        :realization_index = 1 ;
        :realm = "ocean" ;
        :source = "AWI-CM-1-1-MR" ;
        :source_id = "AWI-CM-1-1-MR" ;
        :source_type = "AOGCM" ;
        :sub_experiment = "none" ;
        :sub_experiment_id = "none" ;
        :table_id = "Ofx" ;
        :tracking_id = "hdl:21.14100/69fa3486-ab20-4aa3-b808-d2687da126ac" ;
        :variable_id = "deptho" ;
        :variant_label = "r1i1p1f1" ;
        :branch_method = "standard" ;
        :branch_time_in_child = 0. ;
        :branch_time_in_parent = 182622. ;
        :parent_activity_id = "CMIP" ;
        :parent_experiment_id = "piControl-spinup" ;
        :parent_mip_era = "CMIP6" ;
        :parent_source_id = "AWI-CM-1-1-MR" ;
        :parent_time_units = "days since 1901-1-1" ;
        :parent_variant_label = "r1i1p1f1" ;
}
netcdf areacello_Ofx_AWI-CM-1-1-MR_piControl_r1i1p1f1_gn {
dimensions:
    ncells = 830305 ;
    vertices = 16 ;
variables:
    double areacello(ncells) ;
        areacello:units = "m2" ;
        areacello:CDI_grid_type = "unstructured" ;
        areacello:_FillValue = 1.00000001504747e+30 ;
        areacello:missing_value = 1.00000001504747e+30 ;
        areacello:coordinates = "lat lon" ;
        areacello:standard_name = "cell_area" ;
        areacello:description = "Horizontal area of ocean grid cells" ;
        areacello:cell_methods = "area: sum" ;
        areacello:cell_measures = "" ;
    double lat(ncells) ;
        lat:units = "degrees_north" ;
        lat:standard_name = "latitude" ;
        lat:bounds = "lat_bnds" ;
    double lat_bnds(ncells, vertices) ;
    double lon(ncells) ;
        lon:units = "degrees_east" ;
        lon:standard_name = "longitude" ;
        lon:bounds = "lon_bnds" ;
    double lon_bnds(ncells, vertices) ;

// global attributes:
        :NCO = "netCDF Operators version 4.9.3 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
        :activity_id = "CMIP" ;
        :Conventions = "CF-1.7 CMIP-6.2" ;
        :creation_date = "2018-12-18T12:00:00Z" ;
        :data_specs_version = "01.00.27" ;
        :experiment = "piControl" ;
        :experiment_id = "piControl" ;
        :forcing_index = 1 ;
        :frequency = "fx" ;
        :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.AWI.AWI-CM-1-1-MR.piControl.none.r1i1p1f1" ;
        :grid = "FESOM 1.4 (unstructured grid in the horizontal with 830305 wet nodes; 46 levels; top grid cell 0-5 m)" ;
        :grid_label = "gn" ;
        :initialization_index = 1 ;
        :institution = "Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany" ;
        :institution_id = "AWI" ;
        :license = "CMIP6 model data produced by Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (https://creativecommons.org/licenses/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file). The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
        :mip_era = "CMIP6" ;
        :nominal_resolution = "25 km" ;
        :physics_index = 1 ;
        :product = "model-output" ;
        :realization_index = 1 ;
        :realm = "ocean" ;
        :source = "AWI-CM-1-1-MR" ;
        :source_id = "AWI-CM-1-1-MR" ;
        :source_type = "AOGCM" ;
        :sub_experiment = "none" ;
        :sub_experiment_id = "none" ;
        :table_id = "Ofx" ;
        :tracking_id = "hdl:21.14100/1da0d7a6-1775-4bf4-b7b2-681bc8012ab6" ;
        :variable_id = "areacello" ;
        :variant_label = "r1i1p1f1" ;
        :branch_method = "standard" ;
        :branch_time_in_child = 0. ;
        :branch_time_in_parent = 182622. ;
        :parent_activity_id = "CMIP" ;
        :parent_experiment_id = "piControl-spinup" ;
        :parent_mip_era = "CMIP6" ;
        :parent_source_id = "AWI-CM-1-1-MR" ;
        :parent_time_units = "days since 1901-1-1" ;
        :parent_variant_label = "r1i1p1f1" ;
}
netcdf thkcello_Ofx_AWI-CM-1-1-MR_piControl_r1i1p1f1_gn {
dimensions:
    ncells = 830305 ;
    nlev = 46 ;
    vertices = 16 ;
variables:
    float thkcello(nlev, ncells) ;
        thkcello:units = "m" ;
        thkcello:coordinates = "lat lon" ;
        thkcello:standard_name = "cell_thickness" ;
        thkcello:description = "\'Thickness\' means the vertical extent of a layer. \'Cell\' refers to a model grid-cell." ;
        thkcello:cell_methods = "area: mean" ;
        thkcello:cell_measures = "area: areacello volume: volcello" ;
    double lat(ncells) ;
        lat:units = "degrees_north" ;
        lat:standard_name = "latitude" ;
        lat:bounds = "lat_bnds" ;
    double lat_bnds(ncells, vertices) ;
    double lon(ncells) ;
        lon:units = "degrees_east" ;
        lon:standard_name = "longitude" ;
        lon:bounds = "lon_bnds" ;
    double lon_bnds(ncells, vertices) ;

// global attributes:
        :NCO = "netCDF Operators version 4.9.3 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
        :activity_id = "CMIP" ;
        :Conventions = "CF-1.7 CMIP-6.2" ;
        :creation_date = "2018-12-18T12:00:00Z" ;
        :data_specs_version = "01.00.27" ;
        :experiment = "piControl" ;
        :experiment_id = "piControl" ;
        :forcing_index = 1 ;
        :frequency = "fx" ;
        :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.AWI.AWI-CM-1-1-MR.piControl.none.r1i1p1f1" ;
        :grid = "FESOM 1.4 (unstructured grid in the horizontal with 830305 wet nodes; 46 levels; top grid cell 0-5 m)" ;
        :grid_label = "gn" ;
        :initialization_index = 1 ;
        :institution = "Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany" ;
        :institution_id = "AWI" ;
        :license = "CMIP6 model data produced by Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (https://creativecommons.org/licenses/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file). The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
        :mip_era = "CMIP6" ;
        :nominal_resolution = "25 km" ;
        :physics_index = 1 ;
        :product = "model-output" ;
        :realization_index = 1 ;
        :realm = "ocean" ;
        :source = "AWI-CM-1-1-MR" ;
        :source_id = "AWI-CM-1-1-MR" ;
        :source_type = "AOGCM" ;
        :sub_experiment = "none" ;
        :sub_experiment_id = "none" ;
        :table_id = "Ofx" ;
        :tracking_id = "hdl:21.14100/02995ba2-7d93-424b-83aa-005d6dfd6a12" ;
        :variable_id = "thkcello" ;
        :variant_label = "r1i1p1f1" ;
        :branch_method = "standard" ;
        :branch_time_in_child = 0. ;
        :branch_time_in_parent = 182622. ;
        :parent_activity_id = "CMIP" ;
        :parent_experiment_id = "piControl-spinup" ;
        :parent_mip_era = "CMIP6" ;
        :parent_source_id = "AWI-CM-1-1-MR" ;
        :parent_time_units = "days since 1901-1-1" ;
        :parent_variant_label = "r1i1p1f1" ;
}
koldunovn commented 2 years ago

Here is for completeness grid description files that are used to work with CDO (generated by https://github.com/FESOM/spheRlab from @helgegoessling )

core2_griddes_nodes ``` netcdf core2_griddes_nodes { dimensions: ncells = 126858 ; vertices = 18 ; nlinks_max = 9 ; Three = 3 ; ntriags = 244659 ; nlev = 47 ; nlev_bnds = 48 ; variables: double lon(ncells) ; lon:units = "degrees_east" ; lon:standard_name = "longitude" ; lon:bounds = "lon_bnds" ; double lon_bnds(ncells, vertices) ; lon_bnds:units = "degrees_east" ; lon_bnds:standard_name = "longitude_bounds" ; lon_bnds:centers = "lon" ; double lat(ncells) ; lat:units = "degrees_north" ; lat:standard_name = "latitude" ; lat:bounds = "lat_bnds" ; double lat_bnds(ncells, vertices) ; lat_bnds:units = "degrees_north" ; lat_bnds:standard_name = "latitude_bounds" ; lat_bnds:centers = "lat" ; double cell_area(ncells) ; cell_area:units = "m2" ; cell_area:_FillValue = -1. ; cell_area:long_name = "area of grid cell" ; cell_area:grid_type = "unstructured" ; cell_area:coordinates = "lat lon" ; int node_node_links(ncells, nlinks_max) ; node_node_links:_FillValue = -1 ; node_node_links:long_name = "Indicates which other nodes neighbour each node." ; int triag_nodes(ntriags, Three) ; triag_nodes:_FillValue = -1 ; triag_nodes:long_name = "Maps every triangular face to its three corner nodes." ; int coast(ncells) ; coast:_FillValue = -1 ; coast:long_name = "Indicates coastal nodes: coast=1, internal=0" ; coast:grid_type = "unstructured" ; coast:coordinates = "lat lon" ; double depth(nlev) ; depth:units = "m" ; depth:_FillValue = -1. ; depth:long_name = "depth of model levels in metres (positive downwards)" ; double depth_bnds(nlev_bnds) ; depth_bnds:units = "m" ; depth_bnds:_FillValue = -1. ; depth_bnds:long_name = "depth of model level bounds in metres (positive downwards)" ; int depth_lev(ncells) ; depth_lev:_FillValue = -1 ; depth_lev:long_name = "depth in terms of number of active levels beneath each ocean surface grid point" ; depth_lev:grid_type = "unstructured" ; depth_lev:coordinates = "lat lon" ; // global attributes: :Conventions = "CF-1.4" ; :history = "2020-10-14 06:20:34 GMT; Grid description file generated with spheRlab version 1.1.5; Grid read and converted with: sl.grid.readFESOM(griddir = grd.paths[i]); Grid written with: sl.grid.writeCDO(grd, paste0(out.path, \"/\", grd.names[i], \"_griddes_nodes.nc\"), overwrite = TRUE, ofile.ZAXIS = NULL)" ; ```
core2_griddes_elements.nc ``` netcdf core2_griddes_elements { dimensions: ntriags = 244659 ; Three = 3 ; nlinks_max = 9 ; ncells = 126858 ; nlev = 47 ; nlev_bnds = 48 ; variables: double lon(ntriags) ; lon:units = "degrees_east" ; lon:standard_name = "longitude" ; lon:bounds = "lon_bnds" ; double lon_bnds(ntriags, Three) ; lon_bnds:units = "degrees_east" ; lon_bnds:standard_name = "longitude_bounds" ; lon_bnds:centers = "lon" ; double lat(ntriags) ; lat:units = "degrees_north" ; lat:standard_name = "latitude" ; lat:bounds = "lat_bnds" ; double lat_bnds(ntriags, Three) ; lat_bnds:units = "degrees_north" ; lat_bnds:standard_name = "latitude_bounds" ; lat_bnds:centers = "lat" ; double cell_area(ntriags) ; cell_area:units = "m2" ; cell_area:_FillValue = -1. ; cell_area:long_name = "area of grid cell" ; cell_area:grid_type = "unstructured" ; cell_area:coordinates = "lat lon" ; int node_node_links(ncells, nlinks_max) ; node_node_links:_FillValue = -1 ; node_node_links:long_name = "Indicates which other nodes neighbour each node." ; int triag_nodes(ntriags, Three) ; triag_nodes:_FillValue = -1 ; triag_nodes:long_name = "Maps every triangular face to its three corner nodes." ; int coast(ntriags) ; coast:_FillValue = -1 ; coast:long_name = "Indicates coastal triangles: coast=1, internal=0" ; coast:grid_type = "unstructured" ; coast:coordinates = "lat lon" ; double depth(nlev) ; depth:units = "m" ; depth:_FillValue = -1. ; depth:long_name = "depth of model levels in metres (positive downwards)" ; double depth_bnds(nlev_bnds) ; depth_bnds:units = "m" ; depth_bnds:_FillValue = -1. ; depth_bnds:long_name = "depth of model level bounds in metres (positive downwards)" ; int depth_lev(ntriags) ; depth_lev:_FillValue = -1 ; depth_lev:long_name = "depth in terms of number of active levels beneath each ocean surface element" ; depth_lev:grid_type = "unstructured" ; depth_lev:coordinates = "lat lon" ; // global attributes: :Conventions = "CF-1.4" ; :history = "2020-10-14 06:20:34 GMT; Grid description file generated with spheRlab version 1.1.5; Grid read and converted with: sl.grid.readFESOM(griddir = grd.paths[i]); Grid written with: sl.grid.writeCDO(grd, paste0(out.path, \"/\", grd.names[i], \"_griddes_elements.nc\"), fesom2velocities = T, overwrite = TRUE, ofile.ZAXIS = NULL)" ; } ```
suvarchal commented 2 years ago

In #153, we are discussing how to handle preparing the pyfesom2 analysis tools for large data. One particular need arises here: how to handle (in future), the mesh representation on the Python side.

Several options exist, as far as I see (please feel free to edit the main issue and add more info, if anyone has something to add)

A. We continue with plain text files.

Pro: It's already there, many users are used to it, we get to feel retro and old-school as if we are still in the 80s when computing was still the wild west.

Con: ....it's plain text, and feels rather old-school and is if we are still in the wild west. There are smarter ways of doing it by now.

B. Switch to a NetCDF Format

Pro: We can self-document all of the mesh in the file itself. We know which part of the plain text file (which, is now a netcdf file) shows nodes, Lon/lat, ocean/coastline, etc etc.

Con: We need user migration. This would also imply a deeper switch inside of FESOM itself.

C. Hybrid Model (Paul's preference right now)

We allow the user to read in a "plain text" mesh, and check if there is a NetCDF version there. If not, we make one "on-the-fly" for the next time around. This would speed up user migration to the new format.

Pro: We could, for a time, support both plain text and new netcdf meshes.

Con: It might take a bit of programming flexibility, but if we decide on a strategy, I am happy to volunteer the dirty part of that work.

Totally agree with @pgierz to go with hybrid option. One other reason is: text files are less effective to do a lazy loading of mesh, much needed for HR simulations, atleast in current form used. After slight modifications to the data format, mostly to add sizes/shapes of data it represents in the header and keep structure of file same(aux3d.out for instance) dask can be used (introducing a strict dependency) to read the data in a slightly better way but this will will still be ineffective then using netcdf. In fact by changing text format we are trying to do a cheap emulation of netcdf by adding meta data at the top.

Hybrid option as i understand is to use text files for previous versions and use mesh.diag.nc for current versions. In case of textfiles, library can promote faster second time use by keeping a netcdf version in its cache, preferably for each user, say pyfesom2 cache-- I have some working code to this order using zarr but was not stable across versions of underlying, heavily used, fsspec, but this should be straight forward to adapt for netcdf and is expected to be stable. Size of meshdiag must have little influence on its analysis, nevertheless, it can be stripped on first use and also be stored in the above cache.

One more thought:

I would, however, argue against including all the mesh information directly in the model output (this was something I had wished for earlier, but it would explode the file size). Rather, I would suggest that we find some way to publish all of our meshes publicly (if that is feasible), and provide metadata in the output files where the mesh can be accessed (via FTP, Git LFS, DKRZ Swift, whatever). We already have the mesh path in one of the namelists, it should not be too hard to just dump an extra line of metadata into the files when writing output. Then, when a user loads a particular dataset, whatever we have as a load function can check for this, and also load the accompanying information which may be needed for other operations.

I think, in our case we are better off to keep mesh info external to the data files to make those files lean. Idea to publish to a datarepo and include remote path to mesh is cool and i think it enables a few applications. This should be done i think anyway for standard configurations like CMIP etc. But nevertheless, i guess, this can't be assumed for every fesom2 data file because publishing to dkrz or wherever needs an additional post processing step to publish the data, which is easy but needs an additional step.

As for remote data format, netcdf files are not best choice for lazyness because inevitably library behind the scenes will have to download the data and load (zenodo, dkrz etc). If the data server supports range request/streaming access we are better off with a custom binary file or even text file, but this puts reqiurement on hosting server. If there is opendap server backing data then it is good and we can use netcdf, but i dont see one without hosting it ourself. But say if we use dkrz or zenodo, the best strategy is to use zarr, only drawback is we will have to predetermine the optimum chunks that also effects subsequent analysis using the data. This is not necessarily bad, just that we need to be aware that we are suggesting a strategy to use the data. An example using zarr at dkrz is https://nbviewer.jupyter.org/github/FESOM/pyfesom2/blob/master/notebooks/remote_datasets.ipynb, note how quickly library is ready when using large meshes.

suvarchal commented 2 years ago

Here it is for three different fx-variables that are available from CMIP6 for the AWI-CM-1-1-MR simulation piControl:

netcdf deptho_Ofx_AWI-CM-1-1-MR_piControl_r1i1p1f1_gn {
dimensions:
  ncells = 830305 ;
  vertices = 16 ;
variables:
  float deptho(ncells) ;
      deptho:units = "m" ;
      deptho:_FillValue = 1.e+30f ;
      deptho:missing_value = 1.e+30f ;
      deptho:coordinates = "lat lon" ;
      deptho:standard_name = "sea_floor_depth_below_geoid" ;
      deptho:description = "Ocean bathymetry.   Reported here is the sea floor depth for present day relative to z=0 geoid. Reported as missing for land grid cells." ;
      deptho:cell_methods = "area: mean where sea" ;
      deptho:cell_measures = "area: areacello" ;
  double lat(ncells) ;
      lat:units = "degrees_north" ;
      lat:standard_name = "latitude" ;
      lat:bounds = "lat_bnds" ;
  double lat_bnds(ncells, vertices) ;
  double lon(ncells) ;
      lon:units = "degrees_east" ;
      lon:standard_name = "longitude" ;
      lon:bounds = "lon_bnds" ;
  double lon_bnds(ncells, vertices) ;

// global attributes:
      :NCO = "netCDF Operators version 4.7.5 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
      :activity_id = "CMIP" ;
      :Conventions = "CF-1.7 CMIP-6.2" ;
      :creation_date = "2018-12-18T12:00:00Z" ;
      :data_specs_version = "01.00.27" ;
      :experiment = "piControl" ;
      :experiment_id = "piControl" ;
      :forcing_index = 1 ;
      :frequency = "fx" ;
      :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.AWI.AWI-CM-1-1-MR.piControl.none.r1i1p1f1" ;
      :grid = "FESOM 1.4 (unstructured grid in the horizontal with 830305 wet nodes; 46 levels; top grid cell 0-5 m)" ;
      :grid_label = "gn" ;
      :initialization_index = 1 ;
      :institution = "Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany" ;
      :institution_id = "AWI" ;
      :license = "CMIP6 model data produced by Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (https://creativecommons.org/licenses/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file). The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
      :mip_era = "CMIP6" ;
      :nominal_resolution = "25 km" ;
      :physics_index = 1 ;
      :product = "model-output" ;
      :realization_index = 1 ;
      :realm = "ocean" ;
      :source = "AWI-CM-1-1-MR" ;
      :source_id = "AWI-CM-1-1-MR" ;
      :source_type = "AOGCM" ;
      :sub_experiment = "none" ;
      :sub_experiment_id = "none" ;
      :table_id = "Ofx" ;
      :tracking_id = "hdl:21.14100/69fa3486-ab20-4aa3-b808-d2687da126ac" ;
      :variable_id = "deptho" ;
      :variant_label = "r1i1p1f1" ;
      :branch_method = "standard" ;
      :branch_time_in_child = 0. ;
      :branch_time_in_parent = 182622. ;
      :parent_activity_id = "CMIP" ;
      :parent_experiment_id = "piControl-spinup" ;
      :parent_mip_era = "CMIP6" ;
      :parent_source_id = "AWI-CM-1-1-MR" ;
      :parent_time_units = "days since 1901-1-1" ;
      :parent_variant_label = "r1i1p1f1" ;
}
netcdf areacello_Ofx_AWI-CM-1-1-MR_piControl_r1i1p1f1_gn {
dimensions:
  ncells = 830305 ;
  vertices = 16 ;
variables:
  double areacello(ncells) ;
      areacello:units = "m2" ;
      areacello:CDI_grid_type = "unstructured" ;
      areacello:_FillValue = 1.00000001504747e+30 ;
      areacello:missing_value = 1.00000001504747e+30 ;
      areacello:coordinates = "lat lon" ;
      areacello:standard_name = "cell_area" ;
      areacello:description = "Horizontal area of ocean grid cells" ;
      areacello:cell_methods = "area: sum" ;
      areacello:cell_measures = "" ;
  double lat(ncells) ;
      lat:units = "degrees_north" ;
      lat:standard_name = "latitude" ;
      lat:bounds = "lat_bnds" ;
  double lat_bnds(ncells, vertices) ;
  double lon(ncells) ;
      lon:units = "degrees_east" ;
      lon:standard_name = "longitude" ;
      lon:bounds = "lon_bnds" ;
  double lon_bnds(ncells, vertices) ;

// global attributes:
      :NCO = "netCDF Operators version 4.9.3 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
      :activity_id = "CMIP" ;
      :Conventions = "CF-1.7 CMIP-6.2" ;
      :creation_date = "2018-12-18T12:00:00Z" ;
      :data_specs_version = "01.00.27" ;
      :experiment = "piControl" ;
      :experiment_id = "piControl" ;
      :forcing_index = 1 ;
      :frequency = "fx" ;
      :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.AWI.AWI-CM-1-1-MR.piControl.none.r1i1p1f1" ;
      :grid = "FESOM 1.4 (unstructured grid in the horizontal with 830305 wet nodes; 46 levels; top grid cell 0-5 m)" ;
      :grid_label = "gn" ;
      :initialization_index = 1 ;
      :institution = "Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany" ;
      :institution_id = "AWI" ;
      :license = "CMIP6 model data produced by Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (https://creativecommons.org/licenses/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file). The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
      :mip_era = "CMIP6" ;
      :nominal_resolution = "25 km" ;
      :physics_index = 1 ;
      :product = "model-output" ;
      :realization_index = 1 ;
      :realm = "ocean" ;
      :source = "AWI-CM-1-1-MR" ;
      :source_id = "AWI-CM-1-1-MR" ;
      :source_type = "AOGCM" ;
      :sub_experiment = "none" ;
      :sub_experiment_id = "none" ;
      :table_id = "Ofx" ;
      :tracking_id = "hdl:21.14100/1da0d7a6-1775-4bf4-b7b2-681bc8012ab6" ;
      :variable_id = "areacello" ;
      :variant_label = "r1i1p1f1" ;
      :branch_method = "standard" ;
      :branch_time_in_child = 0. ;
      :branch_time_in_parent = 182622. ;
      :parent_activity_id = "CMIP" ;
      :parent_experiment_id = "piControl-spinup" ;
      :parent_mip_era = "CMIP6" ;
      :parent_source_id = "AWI-CM-1-1-MR" ;
      :parent_time_units = "days since 1901-1-1" ;
      :parent_variant_label = "r1i1p1f1" ;
}
netcdf thkcello_Ofx_AWI-CM-1-1-MR_piControl_r1i1p1f1_gn {
dimensions:
  ncells = 830305 ;
  nlev = 46 ;
  vertices = 16 ;
variables:
  float thkcello(nlev, ncells) ;
      thkcello:units = "m" ;
      thkcello:coordinates = "lat lon" ;
      thkcello:standard_name = "cell_thickness" ;
      thkcello:description = "\'Thickness\' means the vertical extent of a layer. \'Cell\' refers to a model grid-cell." ;
      thkcello:cell_methods = "area: mean" ;
      thkcello:cell_measures = "area: areacello volume: volcello" ;
  double lat(ncells) ;
      lat:units = "degrees_north" ;
      lat:standard_name = "latitude" ;
      lat:bounds = "lat_bnds" ;
  double lat_bnds(ncells, vertices) ;
  double lon(ncells) ;
      lon:units = "degrees_east" ;
      lon:standard_name = "longitude" ;
      lon:bounds = "lon_bnds" ;
  double lon_bnds(ncells, vertices) ;

// global attributes:
      :NCO = "netCDF Operators version 4.9.3 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
      :activity_id = "CMIP" ;
      :Conventions = "CF-1.7 CMIP-6.2" ;
      :creation_date = "2018-12-18T12:00:00Z" ;
      :data_specs_version = "01.00.27" ;
      :experiment = "piControl" ;
      :experiment_id = "piControl" ;
      :forcing_index = 1 ;
      :frequency = "fx" ;
      :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.AWI.AWI-CM-1-1-MR.piControl.none.r1i1p1f1" ;
      :grid = "FESOM 1.4 (unstructured grid in the horizontal with 830305 wet nodes; 46 levels; top grid cell 0-5 m)" ;
      :grid_label = "gn" ;
      :initialization_index = 1 ;
      :institution = "Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany" ;
      :institution_id = "AWI" ;
      :license = "CMIP6 model data produced by Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (https://creativecommons.org/licenses/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file). The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
      :mip_era = "CMIP6" ;
      :nominal_resolution = "25 km" ;
      :physics_index = 1 ;
      :product = "model-output" ;
      :realization_index = 1 ;
      :realm = "ocean" ;
      :source = "AWI-CM-1-1-MR" ;
      :source_id = "AWI-CM-1-1-MR" ;
      :source_type = "AOGCM" ;
      :sub_experiment = "none" ;
      :sub_experiment_id = "none" ;
      :table_id = "Ofx" ;
      :tracking_id = "hdl:21.14100/02995ba2-7d93-424b-83aa-005d6dfd6a12" ;
      :variable_id = "thkcello" ;
      :variant_label = "r1i1p1f1" ;
      :branch_method = "standard" ;
      :branch_time_in_child = 0. ;
      :branch_time_in_parent = 182622. ;
      :parent_activity_id = "CMIP" ;
      :parent_experiment_id = "piControl-spinup" ;
      :parent_mip_era = "CMIP6" ;
      :parent_source_id = "AWI-CM-1-1-MR" ;
      :parent_time_units = "days since 1901-1-1" ;
      :parent_variant_label = "r1i1p1f1" ;
}

Just a note: unfortunately this info is not enough for meaningful visualization of the data (and even to regrid), for which we need info on faces/triangles. @helgegoessling's griddes is more complete to me. Another version of it is here: https://swiftbrowser.dkrz.de/public/dkrz_035d8f6ff058403bb42f8302e6badfbc/pyfesom2/cmip6-grids/netcdf/ whose zarr versions are part of remote data paths included in pyfesom2, eg., pyfesom2.datasets.cmip6_hr.load(). Above files are lean versions of what is absolutely necessary but i can imagine many hidden treasures in mesh diag that would make analysis or parts of it on unstructured more efficient, I haven't got to that point yet, if we find those we could also include those, when we take a overhaul of pyfesom2 for speed/efficiency as #153 this is something to lookout for.

Nonetheless, i think it should be addressed in ESGF to include at least faces/triangle information at some point for others outside of AWI to be able to use AWI-CM's CMIP6 data (@tsemmler05 thoughts?).

suvarchal commented 2 years ago

One more plus of the plain text if that one can begin to analyse them before the first run is done and fesom.mesh.diag.nc is generated. But at the end of the day I am all for the netCDF of course.

Still not clear to me if we should use:

pyfesom2.function(data, mesh)

or

pyfesom2.function(data)

With the mesh glued to the data in the second case. If we go for the hybrid model, I think the first variant is still a way to go? But using modified mesh when the region is cutted, as implemented by @suvarchal is also very tempting :)

To me, there may be less conflict in approaches then it seems, and may be it is a bit matter of elegance because in any case mesh and data come from different file sources we can argue for both ways, in case of accessor part you mentioned was just convenience of having mesh and data in one place and to give a cleaner implementation -- after all they belong together. I would still go for keeping both arguments in functions mainly because i don't know the extent to which mesh object, its methods and attributes is used in analysis functions and i don't know if these attributes can become part of xarray dataset or casted as dataarrays and moreover, i think it may be better to keep existing structure in the interest of incremental updates. With #153 i am also curious to find out the other uses of existing mesh object. If we keep both arguments strategy a option to support both use cases could be to add a simple pre-condition to existing functions like:

def fucn(data, mesh=None):
     if mesh is None:
         mesh = data; or a mesh object like view of data if we dont want to rename mesh.x as mesh.lon etc.
     ....

A simple glue code to make a fake mesh like obj from merged xarray dataset could be

class SimpleMesh:
    """Wrapper that fakes pyfesom's mesh object""

    def __init__(self, data):
        self.x2 = data.lon
        self.y2 = data.lat
        self.elem = data.faces

Metadata with locations of meshes is a good idea. I would keep them on awi GitLab, and have link to concrete hash. Not sure if this should be the namelist option or something set automatically inside FESOM2 (I currently lean towards the namelist).

I also think namelist option makes more sense, but we need to ensure mesh was uploaded, and handle credentials on gitlab usedto upload elegantly. And it occurs to me that this can be handled elegantly when esm_tools is used to run fesom2.