Parent child relationships

ccrook commented 4 years ago

Do you have any thoughts on what rules, if any, you will impose on parent/child relationships in nested grids?

Rules that could be defined include:

1. child grid extent is contained in parent grid
2. siblings may share common boundary but may not intersect
3. child extents be aligned with parent grid
4. child grid must be aligned with an integer subdivision of parent grid cells

Most (if not all) are unnecessary constraints for using nested grids, though grid developers may choose to apply them when creating grids.
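To make the last two rules in the list above concrete, here is a minimal sketch of how a grid developer might test them. The `Grid` class and its fields are hypothetical, and "aligned with parent grid" is read here as "child edges fall on parent grid lines", which is only one possible interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grid:
    """Hypothetical regular grid: origin node, node spacing, node counts."""
    x0: float
    y0: float
    dx: float
    dy: float
    nx: int
    ny: int

    @property
    def xmax(self) -> float:
        return self.x0 + self.dx * (self.nx - 1)

    @property
    def ymax(self) -> float:
        return self.y0 + self.dy * (self.ny - 1)


def is_multiple(value: float, step: float, tol: float = 1e-9) -> bool:
    """True if value is an integer multiple of step, within a tolerance."""
    ratio = value / step
    return abs(ratio - round(ratio)) <= tol * max(1.0, abs(ratio))


def extent_aligned(child: Grid, parent: Grid) -> bool:
    """Child edges fall on parent grid lines (assumed reading of rule 3)."""
    return all(
        is_multiple(edge - origin, step)
        for edge, origin, step in (
            (child.x0, parent.x0, parent.dx),
            (child.xmax, parent.x0, parent.dx),
            (child.y0, parent.y0, parent.dy),
            (child.ymax, parent.y0, parent.dy),
        )
    )


def integer_subdivision(child: Grid, parent: Grid) -> bool:
    """Child nodes lie on an integer subdivision of parent cells (rule 4)."""
    return (
        is_multiple(parent.dx, child.dx)
        and is_multiple(parent.dy, child.dy)
        and is_multiple(child.x0 - parent.x0, child.dx)
        and is_multiple(child.y0 - parent.y0, child.dy)
    )
```

Under this reading the two rules are independent: a child whose nodes sit on a quarter-cell subdivision of the parent can still start a quarter of the way into a parent cell, so it passes the subdivision test but fails the extent test.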

From the point of view of using nested grids, I think the main requirement is a well-defined algorithm for selecting the grid(s) to use when evaluating the function at a given location (grids could be plural if using algorithms such as bicubic - a good reason not to do so!).

Obviously the primary test of which grid to use will be containment. Depending on which rules are enforced the algorithm could be based on parent/child status and/or ordering of grids in the GGXF file.

ccrook commented 4 years ago

Ok ... I've found some of this in https://github.com/Esri/ggxf/blob/master/IAGS-S-15-00075.pdf, ie

child extents be aligned with parent grid
child grid must be aligned with an integer subdivision of parent grid cells

So, a follow-up question: why do you impose these constraints? I don't think they are necessary from the point of view of using the grids (even if you are only using the parent-child relationship to choose which containing grid to use at a location). I'm not sure they even add to the efficiency of using the grids.

kevinmkelly commented 4 years ago

Regarding IAGS-S-15-00075.pdf, this is really just a historical document expressing our ideas at the time. The design and its constraints and rules can, and likely will, change. In fact, it is precisely the necessary and sufficient requirements that we hope this forum can help to define. This is the most important exercise.

I agree, it is best if constraints such as the ones you've gleaned from the paper can be minimized, and let the application/algorithm decide which grid to use and how to use it. In my opinion, efficiency improves when the header/metadata describes, as comprehensively as possible, the grid characteristics. Applications are then free to apply their own constraints/rules without having to be constrained too much by rules imposed on the grid structure/format.

Here is a simple (and I hope reasonable) example. Consider two nested (sibling) grids A and B that overlap. This already violates rule 2 above "siblings may share common boundary but may not intersect". But there is a reason why we may want this situation. Suppose grid B is denser (smaller grid spacing) than grid A. An application rule could be imposed that denser grids take precedence. But, now consider that grid A has higher accuracy than grid B. It could be that in this case an application prefers grid A even though grid B is denser. (Higher resolution may not always imply higher accuracy.) A rule at the grid format level that restricts sibling grids from overlapping possibly imposes an undesirable constraint on the format and, for that matter, possibly the application as well.
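The grid A / grid B example can be sketched as follows, assuming each grid header carries a spacing and a single accuracy figure. The `GridInfo` class, its field names, and the two policy functions are hypothetical; nothing here is part of any agreed format.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class GridInfo:
    """Hypothetical per-grid header metadata."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    spacing: float   # node spacing; smaller means denser
    accuracy: float  # assumed standard error; smaller means more accurate

    def contains(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


def by_density(grid: GridInfo) -> float:
    """Application policy: prefer the densest grid."""
    return grid.spacing


def by_accuracy(grid: GridInfo) -> float:
    """Application policy: prefer the most accurate grid."""
    return grid.accuracy


def select(grids: Sequence[GridInfo], x: float, y: float,
           policy: Callable[[GridInfo], float]) -> Optional[GridInfo]:
    """Pick the grid containing (x, y) that ranks best under the policy."""
    candidates = [g for g in grids if g.contains(x, y)]
    return min(candidates, key=policy) if candidates else None


# Two overlapping siblings: B is denser, A is more accurate.
grid_a = GridInfo("A", 0, 0, 10, 10, spacing=1.0, accuracy=0.01)
grid_b = GridInfo("B", 5, 5, 15, 15, spacing=0.5, accuracy=0.05)
```

At a point in the overlap, `select([grid_a, grid_b], 7, 7, by_density)` returns B while `select([grid_a, grid_b], 7, 7, by_accuracy)` returns A, so two applications reading the same file would evaluate different models there.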

This example suggests that some information be supplied about grid accuracy. This could be done at the header/metadata level, in which case my example makes some sense (it could also be done with another grid, but that would have to be an overlapping one). But if this were done at the grid node level using a multi-dimensional grid, the example breaks down and another approach must be taken.

ccrook commented 4 years ago

> Regarding IAGS-S-15-00075.pdf, this is really just a historical document expressing our ideas at the time.

Excellent - sounds like we are in accord in terms of minimizing unnecessary constraints! At least once in the past I have been frustrated by NTv2 constraints making subgrids unnecessarily big.

However I do think there needs to be clarity about the algorithm used to select the grid to use at a point.

I believe the purpose of the format is to allow a publisher to convey a spatial model to a consumer. If they have not agreed on the algorithm then the consumer could easily end up with a different model to that intended.

As an example of an algorithm the PROJ GeoTIFF based grid format is very simple in this regard. It just uses ordering and containment. The grid that applies at a point is simply the last grid that contains the point. There is no need for an explicit parent-child relationship so constraints are minimal. Like this proposal the grid metadata (in the TIFF directory) is separate and can be stored up front to allow efficient usage.
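A minimal sketch of the "last grid that contains the point" rule as described above. It is based only on that description, not on PROJ's source code, and the `Extent` class and `last_containing` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Extent:
    """Hypothetical grid extent, as it might be read from up-front metadata."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


def last_containing(grids: Sequence[Extent], x: float, y: float) -> Optional[Extent]:
    """Return the last grid, in file order, whose extent contains (x, y)."""
    for grid in reversed(grids):
        if grid.contains(x, y):
            return grid
    return None


# File order encodes precedence: later grids override earlier ones.
grids = [
    Extent("root", 0, 0, 10, 10),
    Extent("regional", 2, 2, 8, 8),
    Extent("local", 4, 4, 6, 6),
]
```

With this ordering, a point at (5, 5) resolves to `local`, (3, 3) to `regional`, and (1, 1) to `root`. No parent-child attribute is needed, but the publisher must write grids in the intended order of precedence.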

ccrook commented 4 years ago

Just further to the rules on grid nesting, and following some clarification from Even Rouault on the GeoTIFF format, could we propose a minimum set of rules that will not impose unnecessary constraints on the model and that will support efficient and (almost) unambiguous identification of the grid to use at a point (effectively a functional model of nested grids - is there an existing standard for this?):

1. all grids in the nested grid structure are defined in terms of the same two dimensional CRS
2. the grids form a strict parent child hierarchy with a single root grid
3. child grids are strictly contained within parent grid (ie are smaller than the parent grids and do not extend beyond the parent grid)
4. child grids of the same parent grid do not overlap, but may intersect along a common section of boundary
5. child parent relationship may be defined implicitly by bounding box or explicitly by attribute
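As an illustration of the implicit option in the last rule, the sketch below derives each grid's parent from bounding boxes alone by taking the smallest grid that contains it, and then checks for a single root. The `Box` class and both function names are hypothetical, and the sketch assumes the containment and sibling rules above already hold.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Box:
    """Hypothetical grid bounding box in the common CRS."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def contains_box(self, other: "Box") -> bool:
        """True if other lies within self and is smaller than self."""
        inside = (self.xmin <= other.xmin and other.xmax <= self.xmax
                  and self.ymin <= other.ymin and other.ymax <= self.ymax)
        return inside and other.area < self.area


def implicit_parents(boxes: Sequence[Box]) -> Dict[str, Optional[str]]:
    """Map each grid name to its smallest containing grid, or None for a root."""
    parents: Dict[str, Optional[str]] = {}
    for box in boxes:
        containers = [p for p in boxes if p.contains_box(box)]
        if containers:
            parents[box.name] = min(containers, key=lambda p: p.area).name
        else:
            parents[box.name] = None
    return parents


def has_single_root(parents: Dict[str, Optional[str]]) -> bool:
    """True if exactly one grid has no parent."""
    return sum(1 for parent in parents.values() if parent is None) == 1
```

For boxes `root` (0, 0, 10, 10), `a` (1, 1, 5, 5), `b` (5, 1, 9, 5) and `a1` (2, 2, 3, 3), this assigns `a` and `b` to `root` and `a1` to `a`, without any parent attribute in the file.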

This does still risk some potential ambiguity where grids share a common boundary.

From an implementation point of view, one other requirement that I have is that where child grids of a parent grid share a common boundary, one of the child grids will be used rather than the parent. That is, the handling of floating point rounding should ensure that child grids are favoured over parent grids.

kevinmkelly commented 4 years ago

I do not know if there is an existing standard that defines these rules, perhaps only the de facto NTv2 standard? I have no problem with these rules, certainly they will enhance the efficiency of use of GGXF files. No doubt other rules may need to be defined, but they should be considered carefully as you mentioned.

Some clarifications. In rule 4, what is the difference between "overlap" and "intersect"? It seems like rule 3 and rule 4 contradict each other? Rule 5 doesn't sound like it needs to be a rule. We should remove the words "strict" as they don't seem necessary and don't add anything; if the rules become part of the standard then they must be adhered to, simple as that! Should child grids always be favored over parent grids? Is that too restrictive?

ccrook commented 4 years ago

Definitely if we are developing a standard then this would need some careful wordcraft! At the moment I am just looking for clarity on what our functional model of a nested grid is.

I think there are three actors in this discussion: publishers, consumers, and developers. Our expectations are:

the publisher can convey a spatial model to a consumer - that is it unambiguously defines a set of values at each location within the extents of the model.
the consumer expects the developer to provide software that can evaluate the model efficiently. This may involve size of data set for transmission, size of dataset in memory during evaluation, efficiency of lookup, and so on.

Since there are many formats that achieve these requirements for a single grid (eg TIFF, NetCDF), we are primarily looking at how to extend this to support a multiple grid format.

To answer your specific points:

> In rule 4, what is the difference between "overlap" and "intersect"?

What I was trying to convey was that two child grids may abut one another, so they share an edge, but the area of overlap cannot be greater than 0. The main purpose of this is to ensure that the parent-child relationship is unambiguous. Also (apart from on the edge) it does ensure an unambiguous value. An alternative requirement could be that where a point could lie within two or more child grids of the same parent grid, those grids define the same value. But then that leads into questions of interpolation functions etc.
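The distinction between abutting and overlapping can be stated as a test on the area of intersection. This is a minimal sketch with a hypothetical `Box` type and function names; the tolerance parameter is an assumption for absorbing rounding in published extents.

```python
from itertools import combinations
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    """Hypothetical grid bounding box."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlap_area(a: Box, b: Box) -> float:
    """Area common to both boxes; zero if they are disjoint or only touch."""
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(width, 0.0) * max(height, 0.0)


def siblings_valid(siblings: Sequence[Box], tol: float = 0.0) -> bool:
    """True if no pair of sibling grids shares more than tol of area."""
    return all(overlap_area(a, b) <= tol for a, b in combinations(siblings, 2))
```

Two boxes sharing only an edge or a corner have zero overlap area and pass; any pair sharing interior area fails.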

> It seems like rule 3 and rule 4 contradict each other?

Rule 3 is about parent-child spatial relationship (containment), rule 4 is about sibling spatial relationship.

> Rule 5 doesn't sound like it needs to be a rule

No - call it a clarification - possibly unnecessary or even confusing! The other rules mean that the bounding boxes already define a parent-child relationship, so no explicit definition is required to support this. The addition of explicit parent-child metadata is an option for a format that we may choose to add.

> Should child grids always be favored over parent grids? Is that too restrictive?

I did see that you have raised a question as to whether a consumer may choose to evaluate the model in different ways, such as ignoring more detailed grids if they do not need the corresponding level of accuracy. To me that feels like an edge case rather than the primary use case. We could expand this to include how the publisher might convey options to the consumer, but I think that quickly becomes very complex (eg spatial variation of accuracy, accuracy of horizontal vs vertical components, ...) Let's sort out the simple case first!

One thing I was trying to ensure with this constraint is that where a point is on a boundary of a child then the child will be used (ie inclusive test xmin <= x <= xmax). My reason for this relates to the example image below which shows the grids representing a deformation model that is more complex near faulting (red). Note that this shows the grid bounding boxes, not the actual grids themselves. I don't want floating point rounding to select the less accurate parent grid on the boundary between two child grids - it needs a generous inclusive test to ensure this I think.
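A sketch of what such a generous inclusive test could look like. The `Node` class, the `select_grid` function, and the tolerance value are all hypothetical; the tolerance is in CRS units and would need to be chosen to suit the data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical grid in a parent-child hierarchy."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    children: List["Node"] = field(default_factory=list)

    def contains(self, x: float, y: float, tol: float) -> bool:
        """Inclusive test, widened by tol on every side."""
        return (self.xmin - tol <= x <= self.xmax + tol
                and self.ymin - tol <= y <= self.ymax + tol)


def select_grid(root: Node, x: float, y: float, tol: float = 1e-9) -> Optional[Node]:
    """Descend from the root, preferring any child that contains the point."""
    if not root.contains(x, y, tol):
        return None
    current = root
    while True:
        for child in current.children:
            if child.contains(x, y, tol):
                current = child
                break
        else:
            # No child claims the point, so the current grid applies.
            return current
```

If rounding leaves a sliver between two children that are meant to abut (say one ends at 5.0 and the next starts at 5.0 + 1e-12), a point inside the sliver falls back to the parent with `tol=0.0` but is assigned to a child with the default tolerance. Which child wins on a shared edge is decided here by list order, which is itself an assumption.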

[Image: example_nested_grid]

RogerLott commented 4 years ago

In the very rough straw man draft I have posted at issue #2, just to get something into the document, I used the NTv2 rules. As I understand it, they are based on the principle that there should be no ambiguity in the interpolated value on any grid edge. For many applications I think this is a critical requirement. A question is: may one file contain multiple non-overlapping root grids? Rule 2 above (the grids form a strict parent child hierarchy with a single root grid) prohibits this. So, assuming incomplete coverage over the maritime area separating them, geoid models for e.g. Trinidad and Tobago would need either two separate files or one file with an awful lot of null values. Is there a use case for multiple root grids in one file (with appropriate rules for this)?

ccrook commented 4 years ago

I think there is a case for multiple root grids. It is less of an issue if you don't include some of the unnecessarily strict NTv2 rules on alignment of child grids with parent grids. Put simply, all that is needed for uniqueness is a well-defined algorithm for selecting the subgrid to use at any particular location.

For transformation grids (including those used in deformation models) to be invertible they must be continuous across the boundaries between subgrids. That is to say at the transition from one grid to another they must evaluate to the same transformation (ie the same dx,dy,dz). It is difficult to achieve that if grid cells are not aligned (ie the parent grid cell boundaries coincide with a child grid cell boundary). However it does not require that the edge of the child grid as a whole is on a parent grid cell boundary - it could lie within a parent grid cell.
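The continuity argument can be checked numerically. The sketch below assumes bilinear interpolation and a single scalar value per node; the `ValueGrid` class and the `resample` and `max_edge_mismatch` functions are hypothetical. A child is built by sampling the parent at the child nodes, and the two are then compared along the child's boundary.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ValueGrid:
    """Hypothetical regular grid of scalar values, values[row_y][col_x]."""
    x0: float
    y0: float
    dx: float
    dy: float
    values: List[List[float]]

    def bilinear(self, x: float, y: float) -> float:
        ny, nx = len(self.values), len(self.values[0])
        fx = (x - self.x0) / self.dx
        fy = (y - self.y0) / self.dy
        # Clamp the cell index so points on the far edge use the last cell.
        i = min(max(math.floor(fx), 0), nx - 2)
        j = min(max(math.floor(fy), 0), ny - 2)
        tx, ty = fx - i, fy - j
        v = self.values
        return ((1 - tx) * (1 - ty) * v[j][i] + tx * (1 - ty) * v[j][i + 1]
                + (1 - tx) * ty * v[j + 1][i] + tx * ty * v[j + 1][i + 1])


def resample(parent: ValueGrid, x0: float, y0: float, step: float, n: int) -> ValueGrid:
    """Build an n-by-n child whose node values are sampled from the parent."""
    values = [[parent.bilinear(x0 + i * step, y0 + j * step) for i in range(n)]
              for j in range(n)]
    return ValueGrid(x0, y0, step, step, values)


def max_edge_mismatch(child: ValueGrid, parent: ValueGrid, samples: int = 25) -> float:
    """Largest |child - parent| found at sample points on the child's boundary."""
    ny, nx = len(child.values), len(child.values[0])
    x1 = child.x0 + child.dx * (nx - 1)
    y1 = child.y0 + child.dy * (ny - 1)
    worst = 0.0
    for k in range(samples):
        t = k / (samples - 1)
        x = child.x0 + t * (x1 - child.x0)
        y = child.y0 + t * (y1 - child.y0)
        for px, py in ((x, child.y0), (x, y1), (child.x0, y), (x1, y)):
            diff = abs(child.bilinear(px, py) - parent.bilinear(px, py))
            worst = max(worst, diff)
    return worst


# Parent with nodes at 0, 1, 2 in x and y and a single peak at the centre node.
parent = ValueGrid(0.0, 0.0, 1.0, 1.0, [[0, 0, 0], [0, 1, 0], [0, 0, 0]])
aligned = resample(parent, 0.5, 0.5, 0.25, 5)     # parent line at 1.0 is a child node line
misaligned = resample(parent, 0.4, 0.4, 0.4, 4)   # parent line at 1.0 falls mid-cell
```

The `aligned` child starts and ends inside parent cells (its edges are at 0.5 and 1.5), yet matches the parent along its boundary to rounding error, because every parent cell line coincides with a child node line. The `misaligned` child straddles the parent cell line at 1.0 and disagrees with the parent on its boundary by a clearly non-zero amount.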

So if you don't enforce child grids extending to the edge of parent grid cells you could have a single root grid with just one cell covering the entire extent.

In practice I think having multiple root grids might be much cleaner. This isn't an issue for the deformation model format as it needs to support multiple independent spatial models (nested grids) in any case. But it could apply for other geodetic quantities for which a single spatial model is required.
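A minimal sketch of lookup with several root grids in one file, assuming the roots do not overlap. The `Node` class, the `locate` function, and the example names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class Node:
    """Hypothetical grid with optional child grids."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    children: List["Node"] = field(default_factory=list)

    def contains(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


def locate(roots: Sequence[Node], x: float, y: float) -> Optional[Node]:
    """Find the deepest grid containing (x, y) across all root grids."""
    current = next((r for r in roots if r.contains(x, y)), None)
    while current is not None:
        child = next((c for c in current.children if c.contains(x, y)), None)
        if child is None:
            return current
        current = child
    return None  # the point lies outside every root: the model is undefined here


# Two separate areas of coverage in one file, with nothing defined in between.
roots = [
    Node("island_a", 0, 0, 10, 10, [Node("a_detail", 2, 2, 4, 4)]),
    Node("island_b", 20, 0, 30, 10),
]
```

A point between the two roots, such as (15, 5), returns `None` rather than reading a null value out of a large enclosing grid.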

If the GGXF is intended to encode the full deformation model, rather than just one component of the full model (ie a patch or the velocity model), then it would need to support multiple independent nested grid structures, each with their own root grid(s).
