Improve unstructured netCDF reader (FVCOM, Telemac, ...)

knutfrode commented 4 years ago

The unstructured netCDF reader is far less developed than the counterpart for regular grids (reader_netCDF_CF_generic). Things to improve:

the present version has no vertical interpolation, and returns only surface values.
present version makes an intermediate horizontal interpolation from unstructured grid onto a regular lon-lat grid. While this is efficient when re-used, it is not an ideal solution, as the unstructured grid normally has strongly varying resolution over the domain. A fast interpolation directly from unstructured grid onto particle positions is probably better. However, this interpolation must then be "hidden" inside the reader, or else Basereader/interpolation/ReaderBlock must be updated to also support unstructured blocks. This might be a relatively complicated task.
the init method contains a lot of duplication from reader_netCDF_CF_generic. Most of this can probably be shared (factored out).

@gauteh might be able to look into this either this spring, or early 2021.

It would then be useful to have available some (small) sample datasets for development.

Perhaps @johannesro and @simonweppe can provide two sample FVCOM datasets for this purpose?
Can someone provide a small sample output file from Telemac-model (As far as I know, this also follows the same CF-convention?)
I believe this is the specifications to follow: http://ugrid-conventions.github.io/ugrid-conventions/

simonweppe commented 4 years ago

Hi Knut-Frode, and all ,

I have actually done a fair bit of work on that aspect, as I needed to use native netcdf file from the SCHISM model, using 3D current data. SCHISM is a hydrodynamic model using unstructured grid (http://ccrm.vims.edu/schismweb/) and I think the native netcdf outputs follow the proposed CF convention for unstructured grids you mention.

I have made a new reader called reader_netCDF_CF_unstructured_SCHISM.py which is on my fork here : https://github.com/simonweppe/opendrift/blob/master/opendrift/readers/reader_netCDF_CF_unstructured_SCHISM.py In this one, I have made a new ReaderBlock class called ReaderBlockUnstruct() https://github.com/simonweppe/opendrift/blob/master/opendrift/readers/reader_netCDF_CF_unstructured_SCHISM.py#L836

The interpolation of variables to particle positions is done using Python'c cKDtree. The 2D or 3D KDtree is built using the unstructured grid nodes "cloud", and will provide the 3 closest nodes to any particles, then apply distance-based weighting to get interpolated valuesat particle positions. This means we nicely take into account the changing resolution throughout the grid in both 2D, and 3D.

An additionnal challenge with SCHISM is that the positions of vertical layers (sigma-layers) changes over times to follow water level elevation. So in my case , when using 3D mode, I need to recompute that tree for each model step. In 2D mode I only compute that tree once, when initializing the reader, so it's more efficient. Still many things I'd like to improve/optimize/test, for example only use a subset of grid nodes to compute the 3D KDtree (i.e. follow particle cloud extents), also look at the time-varying landmask which follow water elevations using the wetting/drying info from model. I'd be happy to provide some netcdf sample file and test script and contribute, let me know what you think.

Simon

knutfrode commented 4 years ago

This is excellent, I was not aware that you had done this much work on that reader. Your solution with ReaderBlockUnstruct is exactly what I thought would be the best solution. Then it would be nice to merge your reader into the master version, so that other people also can use and contribute to this reader.

But then several specific questions/challenges could be discussed:

It would in principle be best to make a generic unstructured reader to support output from "all" such ocean models, instead of having a reader for each with a lot of duplication (reader_FVCOM, reader_SCHISM, reader_Telemac, ...). This would require some checking in the __ init__method, but this is hopefully feasible? Some dictionaries with model/institution-specific things could be stored within the reader.
Could the ReaderBlockUnstruc be moved to the interpolation module together with the regular ReaderBlock class, or should it rather stay within the unstructured reader?
and would this work with the OpenDrift master version, or does it require specific adaptations within the core (basemodel, basereader)?

I have created a thredds-folder where we can store various sample files for testing: https://thredds.met.no/thredds/catalog/metusers/knutfd/thredds/netcdf_unstructured_samples/catalog.html

So if you could make available some SCHISM og FVCOM sample files, I could also upload them there.

simonweppe commented 4 years ago

Hi there, Happy to merge of course. I'll tidy up the code, move ReaderBlockUnstruct to the interpolation module, then do a pull request. This was developed totally independently from the rest of the code, so no specific adaptions to any of the core needed. The reader is treated just as another type of reader and returns the expected interpolated variables at particle positions.

I'll share a SCHISM output file and test script later this week so you can have a look.

I agree with your first bullet point - would be good to have a generic one for all unstructured grids, but haven't played with other formats so far. I don't expect it would be too much work..mostly mapping model-specific names to conventions used in Opendrift.

knutfrode commented 4 years ago

Five sample files are now uploaded to a thredds server: https://thredds.met.no/thredds/catalog/metusers/knutfd/thredds/netcdf_unstructured_samples/catalog.html

All of these should follow netCDF CF convention and the U-grid convention Nevertheless, there are differences seen already in the dimension attributes (below). The aim should still be that the unstructured reader should be able to digest all of these.

netcdf cal03brcl_21_0003_EDITED_SHORTENED1 {
dimensions:
    node = 25520 ;
    nele = 45792 ;
    siglay = 20 ;
    time = UNLIMITED ; // (131 currently)

netcdf NorL3_0001 {
dimensions:
    nele = 1932935 ;
    node = 1001245 ;
    siglay = 34 ;
    siglev = 35 ;
    three = 3 ;
    time = UNLIMITED ; // (24 currently)
    DateStrLen = 26 ;
    maxnode = 11 ;

netcdf NorLand_grid {
dimensions:
    nElems = 1932935 ;
    sigma_layers = 34 ;
    nVerts = 1001245 ;

netcdf schism_marl20080101_00z_3D {
dimensions:
    nSCHISM_hgrid_node = 149233 ;
    nSCHISM_hgrid_face = 274359 ;
    nSCHISM_hgrid_edge = 423678 ;
    nMaxSCHISM_hgrid_face_nodes = 4 ;
    nSCHISM_vgrid_layers = 38 ;
    one = 1 ;
    two = 2 ;
    sigma = 38 ;

netcdf AkvaplanNiva_sample {
dimensions:
    nele = 293690 ;
    node = 152842 ;
    siglay = 34 ;
    siglev = 35 ;
    three = 3 ;
    time = UNLIMITED ; // (24 currently)
    DateStrLen = 26 ;
    maxnode = 11 ;
    maxelem = 9 ;
    four = 4 ;

knutfrode commented 3 years ago

Hi @simonweppe ! @gauteh and myself are planning to improve the netCDF_unstructured reader the coming few weeks. Our urgent need is to support FVCOM, but we hope to make it generic enough to also support SCHISM, TELEMAC and others (i.e. all 5 sample files above).

Are you generally happy with the way you did it in your SCHISM reader (including defining ReaderBlockUnstruct)? Is the interpolation with KDTree fast enough?

If so, we might "borrow" parts of your code into the master branch reader_netCDF_unstructured, of course with acknowledgement to yourself. This might be simpler than a pull request from your branch, as there is presently some divergence.

Alternatives we are looking at include PyFVCOM, which is using griddata for interpolation. That sounds not too fast to me, but I see there are interesting attempts to parallellize it with dask: https://gist.github.com/wgurecky/c117144a9a496754020aa2ce31159da0

simonweppe commented 3 years ago

Hi @knutfrode,

Yes of course, feel free to borrow anything you like !

I have made a fair bit of testing on it recently (though focusing on native SCHISM files) so this is functional, and generally fast enough ..at least for my applications.

It would be great if you manage to make it generic for all 5 different formats. If we do need to keep some different readers for different models, I'd be happy to make a pull request for the native schism reader I've developed. (for example from a clean fork of your official code).

WIll keep on eye the devs! Feel free to reach out if needed.

knutfrode commented 3 years ago

Good, we will dig into it.

Btw, the FVCOM files seem to also have sigma-coordinates, as an additional complexity. It seems that your SCHISM files are in z-coordinates? Have you looked at sigma-to-z interpolation for this reader?

simonweppe commented 3 years ago

@knutfrode - I believe schism files can have sigma-levels, z-levels or mix of sigma and z-levels. This means the depths of vertical level are often time-varying (they are saved in a variable zcor).

The advantage of the KDtree approach is that it is fairly independent to vertical scheme, since you can generate them from any cloud of points. For the schism reader, Im just extracting the x,y,z coordinates of the model nodes at a given time step, then build the 3D KDtree from that set of nodes (i.e. no interpolation to z-levels). This way we nicely retain the vertical resolution of the model (e.g. near bed or near surface) The downside is that I do need to rebuild a 3D KDtree for each new reader "block", but it seems to be ok speed-wise so far.

Hopefully this is clear :)

knutfrode commented 3 years ago

Yes, this is clear, and good news :-)

Boorhin commented 3 years ago

I have made a bit of progress on making netcdf from Telemac 3D However there are some incompatibilities with reader_netCDF_CF_unstructured. Telemac 3D is a FE model and all the data are attached to the nodes. So when declaring the node_variables list (l 61) there should be also sea_water velocity in it as in the face_variables If that is too complicated maybe it is worth having a specific Telemac reader? The same when there is a test of populated coordinates for the centre of cells, Telemac doesn't use them so it will fail. Does opendrift need them? If yes, it seems better to have a filter special for FE models for that. Just as a note, the CF convention since 1.5? identify the axis with a keyword. So instead of looking for the name of the variable "x" which could change between software, you look at what defines the axis X, Y (and Z). Should I get going and try to make one of these filter specifically? I am not very familiar with OpenDrift but I can give it a go.

gauteh commented 3 years ago

That would be great! We should be able to help out with some hints and assistance under-way. If there are stuff that can be generalized between FVCOM and Telemac see if that can be gathered in the UnstructuredReader class.

Boorhin commented 3 years ago

@gauteh I would like to know what you need to get into the reader. I was musing a bit in the code recently and I wondered what exactly was needed. For example:

I have z for each time and node, that's good?
Do you need the coordinates of cell centres, face centre? I have not read all the theory behind your code to be honest.
I also have temperature and salinity, I guess you want it?
Any interest in sigma layers? seems like it is more an issue for the code?

Here the output of the netcdf I made from a Selafin according to CF and Ugrid. It could be interesting to know before hand if you want more things into it, like z of all nodes. I would suggest that in the attribute you take something like the EPSG of the grid or use the crs as suggested in CF 1.8 to reproject a grid if needed instead of the proj4 argument

import netCDF4 as nC
d = nc.Dataset('somefile.nc')
In [23]: d
Out[23]:
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    Conventions: CF-1.8
    title: TELEMAC 3D: Cape Wrath                                                  SERAFIN
    history: 2021-03-24 16:02
    institution: Plastic@bay - MTS-CFD
    source: Telemac 3D v8p2
    CoordinateSystem: Cartesian
    CoordinateProjection: UTM30
    dimensions(sizes): nMesh2_node(19555), nMesh2_face(36378), nMaxMesh2_face_nodes(3), Mesh2_layers(10), time(137), Two(2)
    variables(dimensions): float64 time(time), int64 Mesh2(), int64 Mesh2_face_nodes(nMesh2_face,nMaxMesh2_face_nodes), int64 crs(), float64 Mesh2_node_x(nMesh2_node), float64 Mesh2_node_y(nMesh2_node), float64 Mesh2_layers(Mesh2_layers), float64 Mesh2_depth(nMesh2_node), float64 Mesh2_surface(nMesh2_node), float32 u(time,Mesh2_layers,nMesh2_node), float32 v(time,Mesh2_layers,nMesh2_node), float32 w(time,Mesh2_layers,nMesh2_node)
    groups:

gauteh commented 3 years ago

You need a way to convert sigma to z. It seems like the variables are defined on the sigma layer, but there probably is information for converting sigma to relative depth (that is what we do for FVCOM anyway). For FVCOM the different variables are sometimes defined at nodes and sometimes at face (cell center), there are coordinates for both. Since we currently only do nearest neighbour interpolation this is done by looking up the in a kd-tree depending on face or node-variable, and then returning that variable. We use cKDTree, but this should in fact use pykdtree if it is installed (like it is done in regrid in cartopy) -- this will be faster (threading controlled with OMP_NUM_THREADS env variable) -- the pykdtree optimization can be done later.

So you need to:

determine if a variable is a face variable or node variable
initialize kd-tree for face coordinates and node coordinates for later lookup when variable is read
check if sigma-layer to z conversion works with the same code as FVCOM

For the future: FVCOM provides lookup tables for nearby nodes and elements (faces), this would allow efficiently interpolating between node or face. Backup would be to do kdtree lookup but return several results (this is probably also efficient enough).

temperature and salinity can probably be used for some of the more specialized models in opendrift (oildrift, etc), but the most fundamental are seawater_velocity and wind.

This might not fit very well into the FVCOM-reader structure since it might contain some implicit design choices that favour FVCOM. So it is fine to make the FVCOM-reader slightly more complicated so that a general unstructured-reader can be used as a base for both FVCOM and Telemac.

Boorhin commented 3 years ago

Ok, In fact it is easy because you have z for each node and time. Sigma is just a convenience I added to look like FVCOM, it is not used in Selafin files. The structure is rebuilt from knowing the attributes. I will get my head around it. I think I can trim a lot of the FVCOM thing.

gauteh commented 3 years ago

Great, yeah, then it should be more straight-forward.

gauteh commented 3 years ago

By the way, we would prefer a reader that works with un-processed Telemac files if possible. But I have not worked with telemac-files so I do not not know if that is a practical solution. I assume they are netcdf / hdf5?

Boorhin commented 3 years ago

That was my intention. The files are Selafin it is specific to Telemac, you would actually need to have a working Telemac installation. The files are pretty complex and I have abandoned the idea of making a standalone reader as they don't have any documentation for their python scripts. They have a SVN repository which may allow you to keep track of the reader but that's beyond my knowledge. Is there any reason why you don't use VTK libraries in Opendrift? They have a lot of great methods for unstructured grids. In particular if you are looking for face identification to interpolate data. That's what I use at the moment until I find a way to get Opendrift to eat the Selafin files... Vtk also allows you to use pyvista which is really a good lib for 3D rendering. I used that for FVCOM data.

On Wed, 7 Apr 2021, 08:28 Gaute Hope, @.***> wrote:

By the way, we would prefer a reader that works with un-processed Telemac files if possible. But I have not worked with telemac-files so I do not not know if that is a practical solution. I assume they are netcdf / hdf5?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/OpenDrift/opendrift/issues/243#issuecomment-814674896, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACJCEFSVAQIFBBA3JPH3URTTHQCSNANCNFSM4KUVQFMA .

gauteh commented 3 years ago

Not except trying to keep dependencies at a minimum, but we could consider making VTK an optional dependency required for Telemac files. I see that GDAL supports selafin, maybe it would be possible to use it through GDAL or Rasterio/Fiona.

Boorhin commented 3 years ago

I never managed to make gdal work with these files. It will just rasterise them and you will loose all the information you need for particle tracking. Regarding VTK, it is the oldest and probably the most advanced 3D mesh library in existence. I completely understand the dependency issue. However, you don't take any risk integrating it as it is more than 20 years old and is at the back of many scientific software. The other thing is that opendrift is about particle tracking and you will have to reinvent the wheel. There are countless people that have optimized VTK. I would be you I would have a look. I think that your optimisation issues would be sorted into a one liner. I do similar queries in my code and I use VTK for that. It is incredibly fast. For me not using it is a bit like rewriting a part of numpy but maybe I am too lazy.

On Wed, 7 Apr 2021, 08:45 Gaute Hope, @.***> wrote:

Not except trying to keep dependencies at a minimum, but we could consider making VTK an optional dependency required for Telemac files. I see that GDAL supports selafin, maybe it would be possible to use it through GDAL or Rasterio/Fiona.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/OpenDrift/opendrift/issues/243#issuecomment-814685486, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACJCEFRUMHZGE6AHUPMZYT3THQERXANCNFSM4KUVQFMA .

gauteh commented 3 years ago

Then it definitively sounds like we should take a serious look. I think it will still stay an optional dependency, but that should be fine. It seems vtk is readily packaged by conda-forge at least.

knutfrode commented 3 years ago

We are lazy too, so we will check out VTK. Thanks for the tip.

Boorhin commented 3 years ago

FE unstructured grid reader

Here is my take on reading the parameters of a CF 1.8 compliant netcdf generated from a Telemac file https://github.com/Boorhin/opendrift/blob/jul_FE_netcdf/opendrift/readers/reader_netCDF_CF_unstructured_FE.py There are a lot of functions I didn't use and I kind of cleared a lot of the code that, to me, was messy or not useful. I also added a function to make a proj4 string from a CF compliant netcdf mkproj4() I think that was lacking. I will probably make a suggestion on how to manage your projections in the code as I think it needs to be a bit more user friendly. proj4 under the hood is fine but it has been widely replaced by EPSG code identification in pyproj.

function get_variable

I have not been able to understand your function get_variables (lack ofdocumentation). It is not really making sense to me so I have not modified it much. Is that a method that a particle seeks to get the nearest variable? Would you be able to clarify and document it? If you use the netcdf reader why don't you use it to make your queries instead of duplicating the variables?

selafin files entry

I will also make you something similar directly from a Selafin file. But once again the Selafin file reader comes with methods to search it, there is no real need of putting the full mesh in memory. That's the whole point of these formats I think.

strategy for Netcdf reader

In your code, you are trying to patch errors from outputs of netcdf of other software. I think you will endlessly be in trouble with this and the best is to use the available tools of the CF convention to check the files before processing them. https://cfconventions.org/compliance-checker.html if the files don't pass the test, you simply reject them and document the missing parameters. Netcdf files are easy to modify, the user can do it or report errors to developer of the faulty export. Here is a stand-alone script you can use (recommanded by CF) https://github.com/cedadev/cf-checker/blob/master/src/cfchecker/cfchecks.py

gauteh commented 3 years ago

FE unstructured grid reader

Here is my take on reading the parameters of a CF 1.8 compliant netcdf generated from a Telemac file https://github.com/Boorhin/opendrift/blob/jul_FE_netcdf/opendrift/readers/reader_netCDF_CF_unstructured_FE.py There are a lot of functions I didn't use and I kind of cleared a lot of the code that, to me, was messy or not useful. I also added a function to make a proj4 string from a CF compliant netcdf mkproj4() I think that was lacking. I will probably make a suggestion on how to manage your projections in the code as I think it needs to be a bit more user friendly. proj4 under the hood is fine but it has been widely replaced by EPSG code identification in pyproj.

Good, there is actually some code (in reader netcdf generic I think) that does something similar, but it might be slightly different from your solution. We periodically check if there is some kind of library that does this, but we haven't found one yet. Maybe we should separate it out into a separate package.

function get_variable

I have not been able to understand your function get_variables (lack ofdocumentation). It is not really making sense to me so I have not modified it much. Is that a method that a particle seeks to get the nearest variable? Would you be able to clarify and document it? If you use the netcdf reader why don't you use it to make your queries instead of duplicating the variables?

I tried to describe the design here: https://opendrift.github.io/autoapi/opendrift/readers/index.html#module-opendrift.readers, we have recently tried to make this more general. But supporting structured, continuous, and unstructured readers makes this a bit complicated. The main interface for the simulation to the reader is get_variables_interpolated, and each reader must implement _get_variables_interpolated_. However, the StructuredReader and UnstructuredReader implement this one, and in turn expects its subclass to implement get_variables (which is abstract in these classes). If you subclassed Variables directly, you would implement _get_variables_interpolated_.

selafin files entry

I will also make you something similar directly from a Selafin file. But once again the Selafin file reader comes with methods to search it, there is no real need of putting the full mesh in memory. That's the whole point of these formats I think.

strategy for Netcdf reader

In your code, you are trying to patch errors from outputs of netcdf of other software. I think you will endlessly be in trouble with this and the best is to use the available tools of the CF convention to check the files before processing them. https://cfconventions.org/compliance-checker.html if the files don't pass the test, you simply reject them and document the missing parameters. Netcdf files are easy to modify, the user can do it or report errors to developer of the faulty export. Here is a stand-alone script you can use (recommanded by CF) https://github.com/cedadev/cf-checker/blob/master/src/cfchecker/cfchecks.py

It would be great if we could do that, but we have to find a pragmatic solution. We are also users of opendrift, and it would be a huge nuisance if we were not somewhat flexible in what we accepts as input. Maybe some more things can be moved out, and the user can fix it with variable mappings. Or maybe xarray can do some of the coordinate and dimension logic that we currently do. Anyway, you do not need to make a very flexible reader now -- these fixes have accumulated over the years.

OpenDrift / opendrift