Closed rsignell-usgs closed 8 years ago
Now that NCWMS2 has UGRID support, would ncWMS2 developers be interested in adding SGRID support?
That would be awesome! I don't know anything about Java, but whomever tries to implements this please feel free to bug me with questions about SGRID.
Adding this feature would be great! +1
This would give us the ability to display POM/ROMS/ECOM/WRF/Delft3D/Native HYcom
It would be useful for the IOOS RAs and for NOS modeling efforts at CO-OPS and Coast Survey Development Lab.
@guygriffiths could you have a look at this and estimate how big a job it is? Sounds like it would be very useful.
@jonblower , glad to see the interest from you!
Most of the models we've come across use Arakawa-C grids (the Arakawa-B grid POLCOMS is an exception), so if we want to access the native grid data in a standard way, we need these SGRID conventions.
Many folks average their velocity data to grid cell centers for distribution because they know that tools generally choke on the native grid staggered data, but we would like to be able to have this averaging happen at the end of the processing workflow rather than the beginning, so that other applications (e.g. oil spill and trajectory modeling) may take advantage of the higher resolution velocity data information contained in the native grid.
Make sense?
@jonblower sure, it's on my list of things to do.
@rsignell-usgs, @ocefpaf - Just to make sure I'm getting the right end of the stick, is this right?
@guygriffiths, yes, here's an example of how these conventions make things more efficient.
Use case: user wants to subset some data in a bounding box and plot some vectors. Brute force way: determine the u points and associated coordinate values in the bounding box, do same for v points, then do a general interpolation of both u and v onto a 3rd set of points, the grid cell centers. SGRID way: use the padding attributes to determine how the u and v points are "shifted" relative to the center points, and do a simple average on u and v.
Does that make sense?
That makes sense, although I'm reasonably sure that that won't impact on how we calculate vectors in ncWMS - we're always regridding both vector components onto a 3rd grid (i.e. the pixels in the WMS view) and we're using a nearest neighbour method rather than interpolation.
A quick final point: All of the examples in the SGRID conventions document have 2D coordinate variables (e.g. rotated pole or similar). Is this just the examples which have been chosen? i.e. Can the SGRID conventions ever apply to a rectilinear grid based on 2x1D coordinate axes? I see no reason why they shouldn't, but I thought I'd check...
Yes. Although our examples are with 2D lon,lat coordinates, it applies when there are 1D lon,lat coordinate variables as well.
I'll have a look into implementing these. If you have a link to some example files it would help a lot.
Also, when the padding attribute is not "none", is the size of the padded cell considered to be the same as it's neighbouring cell? That's what it looks like and I can't think of another sensible way of doing it, but I couldn't see that it was specified in the conventions.
@guygriffiths , here's an example SGRID dataset: http://geoport-dev.whoi.edu/thredds/dodsC/coawst_4/use/fmrc/coawst_4_use_best.ncd.html
I've got a few more questions about the finer details of the SGRID conventions - is it best to have the discussion on here, or is there someone I should email with questions?
@guygriffiths I am OK with the discussion happening here or in https://github.com/sgrid/sgrid/issues. I'd rather not use e-mails so we can use GitHub as reference in the future.
@ocefpaf OK, I'll pose my questions here as and when I come across them, but some of them might be better suited to the SGRID issues site. Currently I have:
Indeed, it is only required if there is a vertical dimension present. The draft document at http://bit.ly/sgrid_cf had the vertical_dimension listed as optional, so it is probably a transcription error going to GitHub.
Oops, sorry. You're completely correct. The 2D Grid and 3D Grid examples should be corrected. The others are correct I believe.
Correct. As in the first case this is a transcription error.
For 2D grids the cf_role, topology_dimension, node_dimensions, and face_dimensions are required.
For 3D grids the cf_role, topology_dimension, node_dimensions, and volume_dimensions are required.
The specification of xx_coordinates requires the definition of the corresponding xx_attributes. The exception to this rule may be that edge1/2_dimensions may not be required if they use dimension names consistent with the faces and nodes.
Yes.
Hm, good point. I think that you're correct and that the location of the variable could be derived from the dimension used. However, the location attribute is probably an easier route and the reading software can warn in case of inconsistencies. At least this is what I do myself. The same would hold by the way for the location attribute of UGRID.
Cheers,
Bert
On Thu, Mar 10, 2016 at 1:12 PM, Guy Griffiths notifications@github.com wrote:
@ocefpaf https://github.com/ocefpaf OK, I'll pose my questions here as and when I come across them, but some of them might be better suited to the SGRID issues site. Currently I have:
- For a 2D grid, the vertical_dimensions attribute is listed as required, but it is not present in any of the examples. Does that mean it's only a required attribute when there's a vertical dimension present? Or is it an error in the examples?
- A note: many of the examples are not valid CDL - anything with "MyGrid" has the wrong attributes (i.e. the attributes belong to a non-existent variable named "grid")
- I think that the cf_role attribute should be mandatory, to more clearly distinguish the conventions from UGRID. cf_role seems the most sensible attribute to check to determine that SGRID is present
- Of the optional attributes, it would be useful to know which are co-dependent. For example, if we have face_dimensions, I think that node_dimensions must also be present. These relationships aren't clearly defined at the moment.
- Is the definitive attribute to determine that a variable is on a staggered grid the "grid" attribute? For example, for a given variable I should look for a "grid" attribute, and follow that to determine how the dimensions of that variable can be determined from other coordinate variables.
- If that's the case, is the "location" attribute on a variable redundant in determining the staggered grid? It seems that the same information can be extracted from the the dimensions the variable uses and how they map to other coordinate variables. But perhaps I'm missing something?
— Reply to this email directly or view it on GitHub https://github.com/Reading-eScience-Centre/edal-java/issues/46#issuecomment-194815686 .
@guygriffiths , thanks so much for digging into this a bit and locating these errors/inconsistencies. So far the only code that has used this is pysgrid, and I guess these issues didn't come up there.
Is anyone already working on a PR to sgrid, or shall I attempt?
Is anyone already working on a PR to sgrid, or shall I attempt?
Please do :grinning:
Does this mean that node_coordinates is optional? If it's not present, do we have to determine the node co-ordinates from the co-ordinate variables for the node_dimensions? Presumably that would only work in the case where the co-ordinates were 1-dimensional independent axes?
I guess the thinking was that each data variable has coordinates and the location identified?
With ROMS results we write the face_coordinates
, node_coordinates
, edge1_coordinates
and edge2_coordinates
.
@hrajagers, what was your thinking on this?
Some models like ROMS indeed write out coordinates for all stagger locations. I was on the hand thinking also of finite volume people who explicitly don't want to write out coordinates for faces and edges because the face and edge quantities represent face/edge average values, not point values. So, face and edge coordinates should be optional. Then again there are other models such as WRF (and one our own Boussinesq models) that write out coordinates only for the "cell centers" and I believe that this usually means that the "edge" points are located halfway between them -- surely this would be the most logical assumption for me if I get a not fully specified data set. If we require node coordinates we would not be able to support current WRF files using ncML and I think that would be a shame.
Cheers,
Bert Op 10 mrt. 2016 16:24 schreef "Rich Signell" notifications@github.com:
I guess the thinking was that each data variable has coordinates and the location identified?
With ROMS results we write the face_coordinates, node_coordinates, edge1_coordinates and edge2_coordinates.
— Reply to this email directly or view it on GitHub https://github.com/Reading-eScience-Centre/edal-java/issues/46#issuecomment-194902873 .
Okay, the inconsistencies have been fixed in this SGRID PR https://github.com/sgrid/sgrid/pull/11.
See http://sgrid.github.io/sgrid/
@guygriffiths , let us know if you find other issues and we'll try to resolve them ASAP.
@guygriffiths , just curious. Did you hit a roadblock here, did this effort become out of scope for now, or are you still plugging away as time permits?
@rsignell-usgs - I've been fairly busy recently and this has had to go on the backburner. I have some skeleton code and notes, and I should be free to work on it a bit more in a couple of weeks time.
@guygriffiths , okay, glad to here it!
It's so great what you've done with the UGRID capability. I'm at a meeting here in Washington DC where this experimental storm surge forecasting was announced: http://www.opc.ncep.noaa.gov/estofs/estofs_surge_info.shtml http://www.opc.ncep.noaa.gov/estofs/estofs_pacific_surge_info.shtml Both of these are UGRID, so your ncWMS2/Godiva3 improvements are very timely!
@guygriffiths , I hope SGRID support is still on the radar. This would be hugely useful to the ROMS, Delft3D, WRF and other communities as it would allow them to visualize velocity data on their native grids!
I'm working on this at the moment, but it's considerably more complex to implement than the UGRID case. I hope to have something working by the end of the week, but it'll most likely be an implementation of the 2D horizontal case rather than the full 3D one.
@guygriffiths , that's fine, actually. We don't even use the full 3D one ourselves. We just have 2D horizontal case with multiple vertical layers, and where the vertical coordinate is fully specified.
A quick question about the example grid which @rsignell-usgs linked to (http://geoport-dev.whoi.edu/thredds/dodsC/coawst_4/use/fmrc/coawst_4_use_best.ncd.html):
Whilst this is a staggered grid, it appears that all of the staggered grids are explicitly defined.
For example, the SGRID variable has the "edge1_dimensions" attribute to indicate how variables on the edges of the grid should be defined. It also specifies "edge1_coordinates" which points to a pair of coordinate variables which explicitly define that grid.
I don't really understand what's going on here. What is the aim of explicitly defining a grid in addition to defining how it is staggered relative to another grid? Is this just a workaround for the fact that SGRID is not widely supported? Or is it to define the relationship between the grids?
My impression is that the point (or at least one of the main points) of SGRID is to allow users to define just a single grid in the NetCDF, and then derive staggered grids from it. Explicitly defining the staggered grids seems to defeat the purpose of SGRID.
How should I treat this? I can either use the explicitly defined grid and save the padding information, or I can just derive the staggered grid from the node-based grid and the paddings.
The example grid (http://geoport-dev.whoi.edu/thredds/dodsC/coawst_4/use/fmrc/coawst_4_use_best.ncd.html) hopefully follows the example for ROMS grids as described in the SGRID conventions (https://sgrid.github.io/sgrid/#roms)
I don't really understand what's going on here. What is the aim of explicitly defining a grid in addition to defining how it is staggered relative to another grid? Is this just a workaround for the fact that SGRID is not widely supported? Or is it to define the relationship between the grids?
ROMS can write output at 4 different locations:
In ROMS these coordinates are all provided, and you could just use regular interpolation methods to do such tasks as averaging u and v to grid cell centers (the "lon_rho, lat_rho" points).
But it's much more efficient to simply average the 2 u points (on the sides of the cell), and the 2 v points (on the top and bottom of the cell) to get the velocity at the grid cell center. The SGRID conventions are really just to disambiguate whether you should do this:
uc (j,i) = 0.5*(u(j,i-1) + u(j,i)))
or this
uc (j,i) = 0.5*(u(j,i) + u(j,i+1))
Does that make sense?
It kind of makes sense, but what I don't understand is why lat_rho, lon_rho, lat_u, lon_u, lat_v, and lon_v are defined at all. Aren't they redundant, given that they can all be derived from lat_psi, lon_psi, and the SGRID relationships?
And if they are redundant, which is the definitive grid to use for u, v, zeta, etc? The NetCDF-defined ones, or the SGRID defined ones?
They are not redundant since these are curvilinear orthogonal grids, like this:
So if given the corners or center of edges you can guess what the exact coordinate values are in the grid cell centers, but it won't be exact.
For ROMS and other C grid models, the values will always be defined at the grid cell centers. I guess the most robust thing to do would be to check if the coords for edges and corners (nodes) exist, and if not, estimate them.
OK, that's all clear. I've implemented it, and I'll let you have a test version once I've managed to iron out the race-condition bug which I've managed to introduce for staggered grids...
OK, I've uploaded a working version here: http://www.personal.rdg.ac.uk/~qx901922/ncWMS2.war
I've tested it with the dataset you supplied (which would have worked for visualisation purposes anyway, since all of the grids are defined) as well as some test grids based on the definitions in the SGRID spec. If anyone has some sample datasets to use, I'd be grateful if you could try this build. If no-one finds any 2D SGRIDS which don't work with it, I'll do a release in a couple of days time.
@guygriffiths , I tried out the new WAR file, and it works fine to display each scalar variable component on the COAWST model test dataset, but I didn't see a "u:v-group" or "ubar:vbar-group" like I expected.
One of the main gaps in CF we wanted to fill with SGRID was the ability to plot vectors correctly from a staggered grid model.
Is this a future enhancement for ncWMS2, or am I doing something wrong on my side?
At the moment it's just the reading of data onto staggered grids that's implemented. I'll have a go at getting vector quantities to work properly. However, it's not immediately obvious to me how that should work in the general case. When the padding is set to "both" or there is no offset, it's straightforward, but for "none", "low", or "high", it's not apparent how this would work.
For now I will implement it only for the simple case so it can be used with the COAWST models. If you have any ideas of how the other cases should be handled, just let me know and I'll do them too.
The way I see it, different padding types just modify the output domain of the operation requested. For example if cell-centered velocities are requested, and there are not both u and v components on sides and top/bottom of the cell, you will not get a cell-centered velocity there. In other words, no special cases that involve using nearest values or extrapolation.
But handling COAWST type data first would be a great step. We have a lot of that type of model output.
One thing we also have to deal with COAWST/ROMS output is that the velocities are aligned with the grid orientation, not east/north. So we've listed the standard_name
for u,v
as x_sea_water_velocity, y_sea_water_velocity
(and maybe we should propose standard_names like i_sea_water_velocity
and j_sea_water_velocity
, because they are actually aligned with the logically rectangular i,j
grid, not some projected 'x,y' coordinates).
So to rotate these to east/north coordinates, we need an angle
array for the grid cell centers that describes their orientation. Actually, COAWST/ROMS already has this variable (called angle
) but to do this in a general way requires calculating this angle
array from the grid coordinates.
Does this make sense?
OK, I'll implement the COAWST type first, then see how long the general case will take.
Currently vectors are recognised based on the standard names "northward.", "eastward.", "u-._component" and "v-._component" (I know that these last two aren't officially standard names, but they were used in many of the examples I was testing). I'll add the ".x._velocity" and ".y._velocity" as triggers for vectors.
The fact that the vectors are relative to the grid orientation shouldn't be a problem. Vectors already work on rotated grids (we either use a mathematical transformation if it's present, otherwise we calculate the angles based on the direction of the neighbouring cells - not 100% accurate but usually good enough for visualisation purposes)
The fact that the vectors are relative to the grid orientation shouldn't be a problem.
Phew!
We calculate the angles based on the direction of the neighbouring cells - not 100% accurate but usually good enough for visualisation purposes
Agreed!
Just to let you know that our current handling of vectors assumes that we only have a single value for each component (rather than the averaging of multiple-components which we need to do here), so it's going to be a considerably bigger job that I first envisaged. It's still something I'm working on though.
OK, so here's the current situation.
In the left-hand diagram below, the purple square represents a single grid cell of the unstaggered grid. It has a value defined at the centre, which in EDAL is valid throughout the grid cell.
The 2 red squares represent the grid cells of the u-component of (say) velocity. They have the values 1 and 2 defined at their centres, which in EDAL are valid throughout their grid cells.
The 2 black squares represent the grid cells of the v-component of velocity. They have the values 3 and 4 defined at their centres, which in EDAL are valid throughout their grid cells.
Currently, if I generate velocity vectors from these components, I get 4 separate vectors within the original unstaggered grid cell: (1,3), (2,3), (1,4), (2,4). Consequently, when visualising the data, we get the velocities rendered at 2x the resolution of the original unstaggered grid.
As I understand it, you would like visualisation to follow the situation in the right-hand diagram - one single vector whose components are averaged from the surrounding u- and v-components.
So, question 1: Am I correct that the second situation is what you want? Or would the first one (at 2x the resolution) be acceptable/preferable for visualisation?
Assuming that you wish the second situation to be the case, the only practical way I can find to implement it is to dynamically generate 2 new variables, u-interpolated
and v-interpolated
. These would (as their names suggest) be variables which interpolate the u/v components onto the original, non-staggered grid. These could then be automatically collected into vector groups.
However, the original u
and v
variables will still be present. The simplest situation would be that the resulting set of available variables would be:
u
v
vector-group
u-interpolated
v-interpolated
magnitude
direction
However, we could also make the 2x resolution version available:
vector-group 1
u
v
magnitude
direction
vector-group 2
u-interpolated
v-interpolated
magnitude
direction
But since there's nothing which particular distinguishes the staggered grid variables from the interpolated grid variables, the full solution would be:
vector-group 1 (2x resolution of unstaggered grid)
u
v
magnitude
direction
vector-group 2 (2x x-resolution of unstaggered grid, 1x y-resolution of unstaggered grid)
u
v-interpolated
magnitude
direction
vector-group 3 (1x x-resolution of unstaggered grid, 2x y-resolution of unstaggered grid)
u-interpolated
v
magnitude
direction
vector-group 4 (resolution of unstaggered grid)
u-interpolated
v-interpolated
magnitude
direction
My feeling is that the 1st situation, whereby the staggered u and v components are available to see individually, but the interpolated u and v components are collected into a vector group is probably what you'd want.
So finally, question 2: Which of these options would you prefer? Additionally, do you think that the suffix "interpolated" is the most logical? It could instead be "unstaggered", or of course it's also possible to add the suffix "staggered" to the original staggered variables. Whatever combination suits you.
@guygriffiths, let's just average the velocity component to the face
location (no 2X resolution). And let's use the sgrid
terminology face
for a suffix, like:
u
v
vector-group
u-face
v-face
magnitude
direction
Sorry about the delay -- I was on vacation last week and then putting out fires...
OK, I'd anticipated this and had already done the vast majority of the coding, so it's now completed on the develop branch. I've also updated the build at: http://www.personal.rdg.ac.uk/~qx901922/ncWMS2.war for you to test.
The one difference was that I used the suffix ":face", rather than "-face". This is because the character ":", is not permitted in NetCDF variable names, whereas "-" is. Both are forbidden by the CF conventions, but using the colon seemed slightly more robust - it guarantees that there are no pre-existing variables whose names end in "-face".
@guygriffiths , it's working! Check out our COAWST forecast (OPeNDAP endpoint http://geoport-dev.whoi.edu/thredds/dodsC/coawst_4/use/fmrc/coawst_4_use_best.ncd):
Woo hoo! It did take 80 seconds, however, to render this image. Is there room for performance enhancement?
My guess is that the performance issue is primarily due to network speed. I'm not sure, but I have a local copy of that dataset (at a different time) and it takes about 10s to render from my laptop. Certainly not as quick as simple gridded data, but it does need to calculate the averages on the fly, then transform them into vectors (which involves calculating the grid direction).
The OPeNDAP endpoint is on the same machine that ncWMS2 is running on, so perhaps it's the OPeNDAP access efficiency vs local file that is making the difference.
So can we point ncWMS directly at an NcML file? I guess so, since its using netcdf-java, right?
Yep, you can add an NcML file to ncWMS. Or if you give it a glob expression, it'll internally use the netcdf-java libs to process it as if it were NcML.
Wow, the NcML is a lot faster. Getting some strange caching issues, though. When zoomed out everything was missing, and when zoomed in, some tiles stayed missing. Are they timing out, do you think?
@rsignell-usgs This was a fun one to debug!
The problem is that the SGRID metadata for that dataset is wrong. This issue is that invalid ranges are being requested at the edge of the grid, causing exceptions. So when it's zoomed out, the entire thing fails, when zooming in, only the tiles at the edge fail. I have been testing it with the variable v
. Looking at the metadata for that dataset, I can see the following (for brevity, only pertinent information included):
dimensions:
eta_rho = 800 ;
eta_v = 799 ;
xi_rho = 160 ;
xi_v = 160 ;
double v(ocean_time, s_rho, eta_v, xi_v) ;
v:grid = "grid" ;
v:location = "face" ;
double lon_rho(eta_rho, xi_rho) ;
lon_rho:grid = "grid" ;
lon_rho:location = "face" ;
double lat_rho(eta_rho, xi_rho) ;
lat_rho:grid = "grid" ;
lat_rho:location = "face" ;
int grid ;
grid:cf_role = "grid_topology" ;
grid:topology_dimension = 2 ;
grid:node_dimensions = "xi_psi eta_psi" ;
grid:face_dimensions = "xi_rho: xi_psi (padding: both) eta_rho: eta_psi (padding: both)" ;
grid:edge1_dimensions = "xi_u: xi_psi eta_u: eta_psi (padding: both)" ;
grid:edge2_dimensions = "xi_v: xi_psi (padding: both) eta_v: eta_psi" ;
grid:node_coordinates = "lon_psi lat_psi" ;
grid:face_coordinates = "lon_rho lat_rho" ;
grid:edge1_coordinates = "lon_u lat_u" ;
grid:edge2_coordinates = "lon_v lat_v" ;
So the variable v
is defined (by the location
attribute) as being on the faces of the staggered grid. The faces of the grid are defined by the 2d coordinate variables lon_rho
and lat_rho
. But these coordinate variables have the size 160x800. v
only has the size 160x799.
The location
attribute for v
should be edge2
, not face
. Indeed if I replace it manually it works perfectly. I will add some code to our SGRID processing code to do some sanity checks on cases like this. If I come across any, I'll log them and simply not make the offending variables available.
@guygriffiths, OMG. I'm so sorry that the problem was on our end. :crying_cat_face: Good sleuthing !
Following on the success of the UGRID conventions, @hrajagers created the SGRID conventions, with feedback from the community of staggered grid model providers.
Now that NCWMS2 has UGRID support, would ncWMS2 developers be interested in adding SGRID support?
This should be much much easier -- it's basically just conventions to unambiguously specify how to average velocity components to the grid cell center so you can plot the velocity vectors.