Boundary variables for auxiliary coordinates of more than two dimensions

cf-convention / cf-conventions

http://cfconventions.org/cf-conventions/cf-conventions

Boundary variables for auxiliary coordinates of more than two dimensions #527

Open JonathanGregory opened 4 months ago

JonathanGregory commented 4 months ago

Section 7.1 on "Cell boundaries" contains the following text, about providing boundary variables for auxiliary coordinate variables of more than two dimensions:

Bounds for multi-dimensional coordinate variables with p-sided cells

In all other cases, the bounds should be dimensioned (...,n,p), where (...,n) are the dimensions of the auxiliary coordinate variables, and p the number of vertices of the cells. The vertices must be traversed anticlockwise in the lon-lat plane as viewed from above. The starting vertex is not specified.

There is no example given. I suppose it must mean something like this for three dimensions:

variables:
  float x(nk,nj,ni);
    x:bounds="x_bounds";
  float x_bounds(nk,nj,ni,p);

Since it talks about traversing the vertices anticlockwise in the lon-lat plane, it must be concerned with horizontal faces of the 3D cells. A horizontal coordinate variable only has to be 3D (rather than 2D) if the cells aren't aligned in vertical columns. Do you agree with this interpretation? If so, we should clarify it. 3D cells don't just have horizontal faces, so this convention is of restricted use. For a more general treatment of bounds in two dimensions, we could refer to UGRID.

Cheers, Jonathan

JonathanGregory commented 2 months ago

Dear all

I propose the following changes to remedy the defect described above in Sect 7.1 "Cell boundaries". In addition I propose to insert some new subsection headings, explanatory text, and rearrangement of text in Sect 7.1, to improve clarity. I don't believe this is any change to the meaning of the convention, except for one small generalisation (at the end). On account of that, and because the text changes are quite extensive, I've relabelled the issue as an enhancement.

The proposed changes are listed below. You might find it easier to look at the pull request or the HTML of the modified document.

Cheers

Jonathan

On re-reading, I suppose that "multidimensional" is referring to the dimensionality of the arrays, rather than the physical space. To clarify this, I propose deleting this word, and adding a sentence to the end of the first paragraph of Sect 7.1: "CF can currently describe boundaries for cells which have one or two spatial dimensions, but does not provide conventions to describe the boundaries of cells with three spatial dimensions. Please refer to UGRID for development of such conventions."
Between the second and third paragraphs of Sect 7.1, insert heading "7.1.1 Boundaries and formula terms".
Delete the short paragraph, "Applications that process cell boundary data often times need to determine whether or not adjacent cells share an edge. In order to facilitate this type of processing the following restrictions are placed on the data in boundary variables." This paragraph is unnecessary because the following subsections have specific statements about contiguousness.
Promote the bold text "Bounds for 1-D coordinate variables" to become the heading for new subsection 7.1.2, remove the indentation of the following lines, and correct the typo in "identically".
Move Example 7.2 into 7.1.2. It is concerned with a 1D latitude axis, but currently it appears at the end of 7.1, after the multidimensional case.
Modify the bold text "Bounds for 2-D coordinate variables with 4-sided cells" to "Bounds for horizontal coordinate variables with 4-sided cells", promote it to become the heading for new subsection 7.1.3, and remove the indentation of the following lines.
I propose that we modify the first sentence in 7.1.3 to provide a bit more context and explanation, from

In the case where the horizontal grid is described by two-dimensional auxiliary coordinate variables in latitude lat(n,m) and longitude lon(n,m), and the associated cells are four-sided, then the boundary variables are given in the form latbnd(n,m,4) and lonbnd(n,m,4) ...

to

There is a common case of a rectangular horizontal grid, with four-sided cells, whose two axes are not latitude and longitude (e.g. it uses a map projection from <> or a curvilinear grid, such as the tripolar ocean grid). In that case, two-dimensional auxiliary coordinate variables in latitude lat(n,m) and longitude lon(n,m) may be provided as well. Since the sides of the cells do not generally have constant latitude or longitude, all four vertices must be specified individually. Therefore the boundary variables for the two-dimensional auxiliary coordinate variables are given in the form latbnd(n,m,4) and lonbnd(n,m,4) ...

The next paragraph describes the anticlockwise traversal of the four-sided cell, but does not require this ordering. I'm certain we have always required it, as is indeed stated in the final paragraph about the "multidimensional cells". Therefore I have rephrased the present paragraph to state this requirement as well.
Move Example 7.3 into 7.1.3. It illustrates the 2-D 4-sided case, but currently it appears at the end of 7.1, after the multidimensional case. Change its title from "Cells in a non-rectangular grid" to "2-D cells in a non-latitude-longitude grid", because the grid is logically rectangular (64,128).
Modify the bold text "Bounds for multi-dimensional coordinate variables with p-sided cells" to "Bounds for coordinate variables with p-sided cells in two spatial dimensions", and promote it to subsection heading 7.1.4, with the following text:

In the general case of a grid composed of polygonal cells in two spatial dimensions with p sides and vertices, or a mixture of polygons where p is the maximum number of sides and vertices, the grid could have one, two or more dimensions, depending on how it is organised logically (e.g. as a 1-D list or a 2-D rectangular arrangement). The boundary variables for the auxiliary coordinate variables are dimensioned (...,m,p), giving coordinates for the p vertices of each cell, where m are the horizontal dimensions. If the cells are in a horizontal plane, their vertices must be traversed anticlockwise in the longitude-latitude plane as viewed from above. The starting vertex is not specified.

The case of 2-D horizontal coordinate variables with 4-sided cells (Section 7.1.3) is a particular case, with p=4 for boundary variables dimensioned (n,m,p), where n and m are horizontal dimensions. See also <> for conventions describing horizontal cells with more complicated geometry and topology.

This is a small generalisation to admit the possibility of 2D cells in other than the horizontal plane e.g. in (height,latitude). I'm sure CF is being applied to those cases, and they might need 2D auxiliary coordinates. I haven't made the requirement for the vertices to be traversed in a right-handed way apply to cases apart from the horizontal plane, where we already have that requirement.
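
As an illustration of the 4-sided case and the anticlockwise requirement discussed above, here is a minimal sketch in plain Python. It assumes hypothetical corner arrays `corner_lat` and `corner_lon` of shape (n+1, m+1), builds boundary arrays shaped (n, m, 4), and checks the traversal sense with the planar shoelace formula. The function names are invented for this example and are not part of CF or any library. The check treats longitude and latitude as flat Cartesian coordinates, so it ignores longitude wrap-around and spherical geometry.

```python
def signed_area(lons, lats):
    """Planar shoelace formula for one cell.

    The result is positive when the vertices run anticlockwise in the
    (lon, lat) plane as viewed from above, negative when clockwise.
    """
    p = len(lons)
    total = 0.0
    for k in range(p):
        k_next = (k + 1) % p  # wrap around to close the polygon
        total += lons[k] * lats[k_next] - lons[k_next] * lats[k]
    return 0.5 * total


def bounds_from_corners(corner_lat, corner_lon):
    """Build (n, m, 4) boundary arrays from (n+1, m+1) corner arrays.

    Vertex order per cell (j, i): (j, i), (j, i+1), (j+1, i+1), (j+1, i).
    This is anticlockwise only if the grid indices are right-handed with
    respect to (lon, lat), which is why the result is checked below.
    """
    n = len(corner_lat) - 1
    m = len(corner_lat[0]) - 1
    order = ((0, 0), (0, 1), (1, 1), (1, 0))
    latbnd = [[[corner_lat[j + dj][i + di] for dj, di in order]
               for i in range(m)] for j in range(n)]
    lonbnd = [[[corner_lon[j + dj][i + di] for dj, di in order]
               for i in range(m)] for j in range(n)]
    return latbnd, lonbnd


# Hypothetical sheared grid of 2 x 2 cells: cell sides are not lines of
# constant longitude, so all four vertices are needed for each cell.
corner_lat = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
corner_lon = [[10.0, 11.0, 12.0], [10.5, 11.5, 12.5], [11.0, 12.0, 13.0]]

latbnd, lonbnd = bounds_from_corners(corner_lat, corner_lon)
all_anticlockwise = all(
    signed_area(lonbnd[j][i], latbnd[j][i]) > 0
    for j in range(len(latbnd))
    for i in range(len(latbnd[0]))
)
print(all_anticlockwise)
```

The same `signed_area` check works for p-sided cells, since it only relies on the length of the vertex lists for each cell.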

JonathanGregory commented 2 weeks ago

I have updated the PR #547 to make two further changes on which there appears to be consensus in discussion 380, namely

Delete "If bounds are not provided, an application might reasonably assume the gridpoints to be at the centers of the cells, but we do not require that in this standard" in the preamble of section 4.
In the first paragraph of section 7.1, on bounds, insert

If cell boundaries are not provided (using the bounds attribute), an application can assume only that each gridpoint lies somewhere within or upon the boundaries of its own cell. Without a boundary variable, the extent of a cell is not known, nor whether adjacent cells are contiguous, separated by a gap, or overlapping.

I would be grateful for support to be expressed for these changes, so that they can go into 1.12, which we must agree by Monday 11th Nov i.e. next week.

Thanks.

TomLav commented 2 weeks ago

@JonathanGregory I really like and support the re-arrangement in Chapter 7.

Would it be easier for a new reader to see the following order of sections?

Bounds for 1-D coordinate variables
Bounds for 2-D coordinate variables with 4-sided cells
Bounds for multi-dimensional coordinate variables with p-sided cells
Boundaries and Formula Terms

I do not know if such a re-ordering is easy to take in this PR or if it should be moved to a later PR.

sethmcg commented 2 weeks ago

@TomLav -- Thinking back to when I first read through the spec, it was easiest when the concepts relevant to what I was working with (output from numerical weather models) came first. I would say that the best ordering for new readers is basic concepts and common use cases first, with more complicated / specialized stuff coming later.

So I think your ordering makes a lot of sense. I definitely agree that Boundaries & Formula Terms should be separated out into its own subsection, and that it should come after the section on 1-D bounds.

One thing that I think might make it easier for the reader would be to talk about a true 1-D case first, like time bounds. Because in terms of use cases, the different things readers want to know are how to handle time bounds, bounds for a lat-lon grid, bounds for a projected rectilinear grid, bounds for an unstructured grid, and bounds for a parametric z coordinate. I think that's roughly the ordering of most common to least common use case, and it's also the ordering you have suggested (plus the time bounds case).

ChrisBarker-NOAA commented 2 weeks ago

+1 to all of what @sethmcg said :-)

I've been thinking about this and started to write up some more detailed discussion -- far more detail than we should put in CF, but I think it would be good to have this sort of doc somewhere. Specifically things like @sethmcg mentioned:

but a bit more general, like:

How to describe:

A lat-lon aligned rectangular grid.
A projected rectangular grid.
A curvilinear logically rectangular grid.
An unstructured grid (this one should point to the UGRID spec).

Note that as it stands, CF doesn't (with the exception of the new UGRID spec) talk about the relationship between grids and cells much at all -- it talks about grids, and it talks about cells, and a grid is a particular arrangement of cells, but that has to be inferred (calculated?) from the cell bounds.

Or maybe I've missed something, but in any case, I think we need more documentation about that.

I've made a start here:

https://github.com/ChrisBarker-NOAA/CF_conventions_notes/blob/main/rect_grid/rectangular_grids_in_CF.md

It's only a start -- but hopefully you can see where I'm going with this -- I'd love any feedback or help anyone might provide.

The end goal of that doc is that both producers of output from gridded models and consumers of such output will know what to do, with a full example.

Armin-RS commented 2 weeks ago

Hi @ChrisBarker-NOAA, is it possible to add comments to your markdown page?

Looking through it I remembered the Arakawa C grid (https://en.wikipedia.org/wiki/Arakawa_grids), used by the (now outdated) COSMO-D2 model, which defines some parameters on the nodes, some (like wind speed) on the borders of the cell, and I imagine that some parameters are means for the whole cell.

taylor13 commented 2 weeks ago

I support the proposed change.

Don't hold up things for this, but I did notice the statements under figs. 7.1 and 7.2 that say: "Tuples (lon(i),lat(j)) represent grid cell centers." Given the discussion that the cell coordinate values aren't necessarily half way between the bounds, should these statements be reworded? Perhaps "The tuple (lon(i),lat(j)) represents the coordinate location of a cell."? Or perhaps "Tuples (lon(i),lat(j)) represent grid cell nominal centers."?

ChrisBarker-NOAA commented 2 weeks ago

@Armin-RS: yes, please do! You can use GitHub issues or PRs, or, if you want to do more than a little bit, I'm happy to give you access.

As for "Arakawa C grid" -- yes, I hope to get there -- and there are other complications there -- fluxes through cell walls, etc.

JonathanGregory commented 2 weeks ago

Thanks for your helpful comments - but please could you refrain from putting substantial comments or holding discussions in the PR. (No offence intended - I am commenting on procedure as chair of the committee.) It's fine to put small comments in PR, on typos or whatever, and it's convenient since you can add them to the line at which they apply. However, it's hard to follow the discussion, and to review it subsequently, about the wording, arrangement and content if it's not all in one place.

For reference, I am therefore copying your comments here:

@TomLav

Thanks for the PR @JonathanGregory. I will discuss a bit further below.

We remove the reasonable assumption and add the no-default statement. It strikes me that we have maybe not concluded on the corner-case when the data producer wants to convey that the cells are in fact points. Should we include the corner-case in the no-default disclaimer?

Without a boundary variable, the extent of a cell is not known, nor whether adjacent cells are contiguous, separated by a gap, overlapping, or in fact points (cells with zero lengths).

@ChrisBarker-NOAA

It strikes me that we have maybe not concluded on the corner-case when the data producer wants to convey that the cells are in fact points.

What does "the cells are in fact points" mean? In my mind, points are, well, points, and not cells at all, and that's the usual definition of data in CF that doesn't have cell bounds.

the closest I've been able to find for a definition of "cell" is:

"When gridded data does not represent the point values of a field but instead represents some characteristic of the field within cells of finite "volume,"...

"finite" to me means, "not a point".

@TomLav

I am starting to wonder if our idea of adding a "no-assumption" sentence in Chapter 7 is that good.

Re-reading the very start of section 7:

When gridded data does not represent the point values of a field but instead represents some characteristic of the field within cells of finite "volume," a complete description of the variable should include metadata that describes the domain or extent of each cell, and the characteristic of the field that the cell values represent.

To me, this reads like the absence of :bounds should rightly be interpreted as providing locations for point values of a field. Because providing :bounds was specifically created for the cases where the data does not represent point values. Shouldn't the logical interpretation of the absence of :bounds rather be point values?

The new no-assumption sentence states that we cannot assume anything about cells in the absence of :bounds, but by doing so it actually brings the concept of a cell to mind, which seems contradictory to the first sentence of Chapter 7.

Maybe the best solution would be to delete the reasonable assumption sentence from Chapter 4, and add a sentence in Section 7 that reads: "In the absence of :bounds, the data represents the point values of a field". Then it is clear what the interpretation is when no :bounds are provided.

Sorry, I am going in circles a bit. Removing the sentence from Chapter 4 is still a very good idea.

@taylor13

I don't think the absence of bounds can imply "point" data unless we make bounds a requirement for data representing cells. In the past, this has not been a requirement, so we can't change this without upsetting backward compatibility.

@TomLav

In that case, the absence of :bounds cannot imply anything: neither that the axis value represents cells, nor that such cells are of any extent and shape.

Without a boundary variable, the axis values can neither be assumed to hold point positions, nor that the axis values represent cells. An unambiguous way to define point positions is to use the :bounds attribute to define 0-length cells. The only way to define cells, their position, and extent, is to use the :bounds attribute, as described below.

@ChrisBarker-NOAA

I don't think the absence of bounds can imply "point" data unless we make bounds a requirement for data representing cells. In the past, this has not been a requirement, so we can't change this without upsetting backward compatibility.

Is that really the case? how in the world can you have cells if you haven't defined them somehow?

If it really was the case that that one could put in data representing cells without defining what the cells are, then I suppose what you had was:

These data are on cells of unknown geometry -- seems like a bad idea to me, but if that's what CF used to allow, then I guess it still does.

So how do you know if the data are point data or cell data?

Is it point data if there is no cell_method, and cell data if there is?

in the current text:

When gridded data does not represent the point values of a field but instead represents some characteristic of the field within cells of finite "volume," a complete description of the variable should include metadata that describes the domain or extent of each cell ... and the characteristic of the field that the cell values represent

OK, so that is a "should" not a must....

But if it IS cell data, then it must (?) have a cell_method and/or a cell_measure (e.g. cell-area).

I guess it's not useless to have, e.g. a cell_method and a cell-area with no defined bounds, though not great.

Even a cell_method with no other definition of the cells is still some information.

Not sure what this means for the text, but maybe something along the lines of, in the intro to 7, something like:

"Data is Representative of Cells if there is a cell_method or a cell metric defined"

Along with the "should" for the bounds.

JonathanGregory commented 2 weeks ago

Dear @TomLav @sethmcg @ChrisBarker-NOAA @taylor13

Thanks for your comments. I had already put the formula terms into its own section. @TomLav's suggestion is to move this section to the end. I didn't do that at first because I started this issue with the aim of minimal changes for clarity. However, it's no problem to move it, so I've done so, given that you agree it's better at the end.

Taking Seth's suggestions of considering time, and a purely 1D case, I have mentioned 1D time in the preamble of section 7, and I have changed Example 7.1 from latitude to time. Also I moved the following text, which described the same example, into the box of the example, and I have spelled out more details. In addition to Example 7.1, we still also have Fig 7.1, which is about 1D latitude and longitude together, so both cases are now covered. On reviewing the preamble of sect 7, I thought it was not as clear as it could be, so I have rearranged and modified it in other ways too, though not intending to change its meaning.

In the last version, I had inserted a statement that, without bounds, you could assume only that the point is somewhere within or on the boundary of the cell. I noticed subsequently that this is a recommendation in the conformance document, not a requirement. Hence, without bounds, nothing at all can be assumed about the relationship between cells and points, as @TomLav suggests. I have now stated this instead in sect 7.1. I have also inserted an explicit statement that if you want to indicate that a cell has zero size you must give it explicit coincident bounds. Also in sect 7.1, I have added more text to describe the recommendation that the point should lie within the cell, and also the requirement that the bounds should be ordered in the same sense as the coordinates. I've added the latter requirement to the conformance document, which didn't have it before, although it was already stated with "must" in the convention.
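
The points in this paragraph can be sketched as simple checks on a 1-D coordinate and its bounds. This is an illustrative sketch only: the function names and the sample values are hypothetical, not CF terminology or any library's API. Here "same sense" is interpreted as each bounds pair increasing when the coordinate increases and decreasing when it decreases.

```python
def same_sense(coords, bounds):
    """Requirement: bounds are ordered in the same sense as the coordinates."""
    if len(coords) < 2:
        return True  # the sense of a single coordinate value is undefined
    sign = 1 if coords[1] > coords[0] else -1
    return all(sign * (second - first) >= 0 for first, second in bounds)


def points_within_cells(coords, bounds):
    """Recommendation: each gridpoint lies within or upon its cell bounds."""
    return all(min(b) <= c <= max(b) for c, b in zip(coords, bounds))


def contiguous(bounds):
    """True when each cell ends exactly where the next one begins."""
    return all(bounds[k][1] == bounds[k + 1][0]
               for k in range(len(bounds) - 1))


def zero_size(bounds):
    """Flag cells whose two bounds coincide, i.e. cells of zero size."""
    return [first == second for first, second in bounds]


# Hypothetical monthly time axis in days since the start of a non-leap year.
time = [15.5, 45.0, 74.5]
time_bnds = [[0.0, 31.0], [31.0, 59.0], [59.0, 90.0]]

# Hypothetical instantaneous samples: explicit coincident bounds say that
# each cell has zero size, which cannot be inferred if bounds are absent.
instants = [0.0, 6.0]
instant_bnds = [[0.0, 0.0], [6.0, 6.0]]

print(same_sense(time, time_bnds), points_within_cells(time, time_bnds),
      contiguous(time_bnds), zero_size(instant_bnds))
```

None of these properties can be tested without a boundary variable, which is the point of the statement that nothing can be assumed when bounds are absent.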

I agree with Chris's reluctant conclusions, "I guess it's not useless to have, e.g. a cell_method and a cell-area with no defined bounds, though not great. Even a cell_method with no other definition of the cells is still some information." I've always understood that to be the intention of the convention. Bounds and cell methods are both separately optional, but informative, especially in combination. Bounds are currently optional, not recommended. I suggest we postpone reconsidering whether they should be recommended until after version 1.12.

I have altered the reference to "centers" that Karl noticed in the captions of Figs 7.1 and 7.2. It now says that the coordinate values "locate the gridpoints".

The aim of the present issue is clarification of the current convention. In order to meet the CF 1.12 deadline (beginning the three-week period for acceptance on Monday), we must limit our ambitions. :smiley:

The PR #547 now shows the text following the above changes. They are also shown in the modified HTML conventions document. What do you think?

Best wishes

Jonathan

taylor13 commented 2 weeks ago

After a quick skim, I think it is good to go for CF 1.12. I'm sure we could always find ways to improve on it, but it is definitely better than before.

ethanrd commented 2 weeks ago

Because both NUG and COARDS define coordinate variables but not cells or bounds, I think it would be useful to continue to have some guidance on the undefined nature of cells in chapter 2. It should perhaps also include some description in terms of pixels as I think this is often a point of confusion for those who are used to dealing with raster/image data and may not think of cell/bounds as something relevant. Perhaps something like:

If bounds are not provided, the location of the gridpoint within the cell/pixel is undefined. Because both the NUG and COARDS define 'coordinate variables' but not cells or bounds, many applications handle gridpoints without associated bounds as being located at the centers of the cells/pixels. To be explicit, cell bounds should (must?) be defined.

Probably the raster/image language should be a separate discussion with a more extensive review.

ChrisBarker-NOAA commented 2 weeks ago

Probably the raster/image language should be a separate discussion with a more extensive review.

absolutely - there may even be a need to add something specifically for pixels -- I've noticed that sometimes pixels are defined by their corner and size, sometimes by the center points.

Also:

"The raster data model consists of rows and columns of equally sized pixels interconnected to form a planar surface."

From a random google: https://pressbooks.pub/gist/chapter/6-2/#:~:text=The%20raster%20model%20will%20average,from%20which%20it%20is%20derived.)

But that's consistent with what I've seen, including GDAL: https://gdal.org/en/latest/user/raster_data_model.html

Key point is that the pixels are all the same size (in the coordinate system used) and that size is defined -- so you don't need to specify all the coordinates -- if you know the corner, the pixel size, and the raster size (number of pixels), you can compute the locations of all the pixels.

You can cover that in CF with two coordinate dimensions that happen to be equally spaced -- but even then it's a bit confusing as to whether the coordinates are the center or corner, and the dx and dy are not clearly stated (they can be computed, of course).
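
A minimal sketch of that computation along one axis, assuming the raster origin is the outer corner of the first pixel. The function names are hypothetical and this is not GDAL's API. It shows how explicit bounds remove the center-versus-corner ambiguity.

```python
def pixel_edges(origin, size, count):
    """Edge positions of `count` equal pixels starting at `origin`."""
    return [origin + k * size for k in range(count + 1)]


def pixel_centers(origin, size, count):
    """Center positions of the same pixels (half a pixel in from each edge)."""
    return [origin + (k + 0.5) * size for k in range(count)]


def bounds_from_edges(edges):
    """Pair consecutive edges into (count, 2) bounds, as for a 1-D coordinate."""
    return [[edges[k], edges[k + 1]] for k in range(len(edges) - 1)]


# Hypothetical longitude axis: 4 pixels of 90 degrees starting at -180.
origin, size, count = -180.0, 90.0, 4
lon = pixel_centers(origin, size, count)
lon_bnds = bounds_from_edges(pixel_edges(origin, size, count))

print(lon)
print(lon_bnds)
```

With `lon` as the coordinate values and `lon_bnds` as its bounds, a reader can recover dx from the bounds directly and does not need to guess whether the coordinates are centers or corners.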

I know that CF is trying to be general, but saying something about specific cases like this is a good idea -- either as a specific spec, or at least a documented: "best practices for storing raster data".

Anyway -- new topic :-)

JonathanGregory commented 2 weeks ago

Dear all

Thanks to your comments, enough support has been expressed for us to be able to accept this proposal in three weeks, on 1st December, if no-one raises any concerns before then.

@ethanrd, thanks for your suggestion. I agree that it's useful to compare CF with raster pixels. I also agree that we could have an extensive discussion of it, which we don't have time for before CF 1.12 is finalised. However, I've made an attempt, following your suggestion. I've updated the PR #547 and HTML as below. Will this help?

Cheers

Jonathan

Addition of new text (in bold) to the definition of "cell" in sect 1.3, "Terminology":

A region in one or more dimensions whose boundary can be described by a set of vertices recorded in boundary variables. The term interval is sometimes used for one-dimensional cells. A two-dimensional cell is analogous to a pixel in a raster graphic, but is a more general concept (see section 1.4, "Overview").

Addition of new text (in bold) and small replacements, in sect 1.4, "Overview"

It is often the case that data values are not representative of single points in time, ~~and/or~~ space and other dimensions, but rather of intervals or multidimensional cells. ~~This convention~~ CF defines a bounds attribute to specify the extent of intervals or cells. Because both the NUG and COARDS define coordinate variables but not cells or bounds, many applications assume that gridpoints are always located at the centers of their cells. This assumption does not hold in CF. If bounds are not provided, the location of the gridpoint within the cell is undefined, and nothing can be assumed about the location and extent of the cell.

A two-dimensional cell is analogous to a pixel in a raster graphic, but is a more general concept. Pixels in a raster are rectangular, all of the same size, and arranged in a logically rectangular array with their nominal point locations at their centers. By contrast, two-dimensional cells in a CF field do not necessarily satisfy any of those conditions, though they commonly do. Furthermore, as an alternative to cells in two dimensions, CF defines a convention for the case where each data value is associated with a geographical feature that is described by one or more points, lines or polygons.

When data that is representative of cells can be described by simple statistical methods (for instance, mean or maximum), those methods can be indicated using the cell_methods attribute. An important application of this attribute is to describe climatological and diurnal statistics.

ChrisBarker-NOAA commented 2 weeks ago

I think we should avoid talking about pixels without more discussion:

http://alvyray.com/Memos/CG/Microsoft/6_pixel.pdf

Title: "A pixel is not a little square"

JonathanGregory commented 2 weeks ago

Dear Chris

I added this text because of your comments and Ethan's that it would be useful to compare the CF concept of a cell with the pixels of a raster. I can take it out again if it's wrong or not useful, but I do think it could help some people who are new to CF, so perhaps we can improve it. What do you think, @ethanrd?

You're right that we shouldn't say anything about the shape of pixels. How about this (replacing three of the sentences in the previous draft):

A two-dimensional cell is analogous to a pixel in a raster graphic, but is a more general concept. Pixels in a raster are evenly spaced in each dimension and arranged in a logically rectangular array. Two-dimensional cells in a CF field do not necessarily satisfy either of those conditions, though they commonly do.

Wiktionary defines "pixel" as "One of the tiny dots that make up the representation of an image in a computer's memory" or "One of the squares that make up a work of pixel art or a zoomed-in image in a computer." About raster graphics, Wikipedia says "A raster graphic represents a two-dimensional picture as a rectangular matrix or grid of pixels."

Best wishes

Jonathan

JonathanGregory commented 2 weeks ago

@ChrisBarker-NOAA and @TomLav. Thanks for your comments on the PR #547, which I have implemented, as shown in the HTML.

ethanrd commented 2 weeks ago

Thanks @JonathanGregory! I like your pixel text and think it would be good to include it in 1.12.

It is clear from the article @ChrisBarker-NOAA referenced that pixels are a more complicated topic than captured here and deserve further discussion in a new Discussion/Issue. Though I also wonder if, for CF, it would fit better in a discussion around describing sensors and sensor geometries rather than in terms of image processing.

JonathanGregory commented 1 week ago

Thanks, Ethan. I have updated the PR #547 and the HTML with my revised sentences from above.

Yes, I think you're right that CF users are more likely to be familiar with rasters and pixels in connection with sensors than with graphics. We can work on a revision of this text once CF 1.12 is done, if you like.

ethanrd commented 1 week ago

Sounds good. Chris @ChrisBarker-NOAA, does this sound ok to you? Adding Jonathan's pixel text for 1.12 and revisiting after.

ChrisBarker-NOAA commented 1 week ago

I think we can (and should) talk about rasters, from the GIS point of view -- but try to leave the word pixel out of it, and certainly don't say a pixel is a rectangle :-)

OGC talks about coverages, of which this is a regular 2-dimensional grid coverage: https://www.w3.org/TR/sdw-bp/#coverages

or there's the GDAL raster model: https://gdal.org/en/latest/user/raster_data_model.html

which does talk about pixels, but not as "squares" or "rectangles" as far as I can tell with a quick read.

JonathanGregory commented 1 week ago

@ChrisBarker-NOAA. The current text doesn't say the pixels are rectangular. It says they are arranged in a rectangular array.