busstoptaktik / geodesy

Rust geodesy
Apache License 2.0
66 stars 6 forks source link

Prototype multi grid support #65

Closed Rennzie closed 10 months ago

Rennzie commented 10 months ago

Callout: My implementation may be somewhat naive not knowing all the thought that's already gone into RG. That said I've attempted to, as simply as possible, add multi grid support while allowing alternate grid types via a new GridTrait.

GridTrait has the methods required to support the gridshift and deformation operators. Notably it includes a bands method to replace the public bands property in the Grid struct.

I changed how operators access a grid to go through the get_grid method on the context. Operators would get a reference to a grid from the context and then store it on the ParsedParameters struct. I've used a Vec there instead of a HashMap to avoid borrowing conflicts. All this has the advantage of empowering the context with grid handling. The previous implementation required operators to access a resource on the context and then create a grid. This is fine for systems who have access to the disk, but in web applications it requires doubling the memory requirement for the grid. First as a resource loaded at instantiation and then as a grid when the operator is used.

To be able to use the GridTrait I had to wrap it in Rc<dyn GridTrait> to satisfy Clone requirements. I feel this is a positive because grids are readonly. Note, we might want to consider using Arc instead to facilitate concurrency?

Lastly, I updated usage of the grid to work with a Vec. This allows definitions to apply multiple grids ie gridshift grids=gridA.gsb,gridB.datum,gridC.tiff. The grids are then applied one after the other.

I've updated ntv2 support accordingly https://github.com/Rennzie/geodesy/pull/1 which makes implementation in geodesy-wasm simpler: https://github.com/Rennzie/geodesy-wasm/pull/28

Interested to hear your thoughts.

busstoptaktik commented 10 months ago

Sorry for taking so long (an entire week) to respond to this! Unfortunately, I think there's a fundamental flaw in the algorithm implemented: Remember that grids (and subgrids) may overlap, and that the correction selected should be the first matching grid in the list. Hence everywhere, the iteration order should be changed to this:

for point in points {
    for grid in grids {
        if point in grid {
            apply_correction(point);
            break;
        }
    }
}

As implemented, the correction will be applied multiple times in case of overlapping grids (due to the inverted iteration order, which makes it impossible to add the missing break).

Here is an elaborated example, of a list-of-grids, terminated by the catch-all null grid.

+nadgrids=C:\temp\BETA2007.gsb,@null

IIRC, the application order convention is the other way round for the NTv2 format, such that NTv2 internal subgrids should be iterated over in reverse. So this is an additional (small) complication.

Supporting the null grid (which in newer versions of PROJ is a purely virtual thing, i.e. a syntactical contraption, rather than an actual global null grid), and the @gridname convention for optional grids, would probably also be a good idea (but perhaps for a later PR?)

Also, I think you are right that the Rc<dyn GridTrait> should be replaced by an Arc<...>, and there may be more architectural intricacies to handle, either now or in later work - most notably, I think the trait name should be Grid, rather than GridTrait, so in consequence we need to find another name for the Grid struct. Peraps BaseGrid?

It does not need to happen in this PR: We can probably live with taking individual steps from the "make it work, make it right, make it fast" ladder.

Another thing: Would it be possible to reach all your goals for geodesy-wasm by simply implementing a specific Context for WASM usage?

busstoptaktik commented 10 months ago

most notably, I think the trait name should be Grid

Oops - missed your comment about this. I now see that this is on your radar already

Rennzie commented 10 months ago

Hey @busstoptaktik, no worries at all about the delay. You've been away so no expectations from me 😊.

Thanks for putting me right with the algorithm. I've update the gridshift and deformation operators accordingly.

I think handling @null and @optional grids would be better in another PR. I've added https://github.com/busstoptaktik/geodesy/issues/67 to tackle it. One question and assuming there is no @null applied, what should the behaviour be if none of the grids contain the point? Should it be set to Coor4D::nan() like you do in the deformation operator? The PROJ docs indicate the transform should error but the fwd and inv functions don't return a Result.

Regarding Ntv2 sub-grids, should that be left to the Ntv2 interpolation implementation? Or would Ntv2 sub-grids appear in the grids= param?

Lastly, for geodesy-wasm I have implemented a custom context here. It will be a great help to be able to use the get_grid which returns a pointer. The reason being that the previous implementation meant I needed to have the grid in memory twice. Once to load it into the context so get_blob had access to it, and then again when creating the grid for the operator to use. With these changes I'm sure I can achieve everything else I need via the WasmContext. Was that what you had in mind?

busstoptaktik commented 10 months ago

A somewhat more elaborate version of the pseudocode above

use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();
    let mut successes = 0;
    let use_null_grid = true;

    'points: for point in 0..10 {
        let grid_id = rng.gen_range(0..13);
        print!("Point: {point}");

        for grid in 0..10 {
            if grid!=grid_id {
                continue;
            }

            println!(" found in grid no. {grid}");
            successes += 1;
            // Handle grid correction here
            continue 'points;
        }

        // If the null grid is specified: pass through
        if use_null_grid {
            println!(" is passed through unchanged");
            continue;
        }

        // (Null grid not specified. Stomp on the coordinate)
        println!(" not found in any grid. Stomping on coordinate");
    }

    println!("Successes: {successes}");
}
busstoptaktik commented 10 months ago

The PROJ docs indicate the transform should error but the fwd and inv functions don't return a Result.

This is due to the fundamental architectural difference between PROJ and RG, that PROJ handles individual coordinates, while RG handles CoordinateSets (in the ISO-19111 sense of the word, i.e. "sets of coordinate tuples"), and we should expect that for any sufficiently large set, some will err. So we return the number of successes (rather than a binary Ok/Err chunk), and let the user decide how to handle that.

This also implies that we should mark off the error-struck coordinate tuples clearly by stomping on them with our biggest NaN-shoes, so the user can clearly see where something went wrong.

If the (virtual) null-grid is specified, nothing ever goes wrong, so we bypass the NaN-shoes, as shown in the updated code snippet in my previous message.

busstoptaktik commented 10 months ago

Regarding Ntv2 sub-grids, should that be left to the Ntv2 interpolation implementation? Or would Ntv2 sub-grids appear in the grids= param?

I have no firm opinion on this, except that I do not think the subgrids should be explicitly mentioned in the grids=... parameter. If every grid format handler implements a "inside" trait, the NTv2 code may say "inside" if the point is inside any of the subgrids, and we may then expect the corresponding "interpolation" trait to locate the relevant subgrid for us.

Thinking about it, to avoid searching through the subgrids twice (once to determine whether we're inside, and once to do the interpolation), it might as well be a feature of the "interpolation" trait to return None if the interpolation point is outside, then iterating along in the list of files until we get a Some(...)

Rennzie commented 10 months ago

This is due to the fundamental architectural difference between PROJ and RG, that PROJ handles individual coordinates, while RG handles CoordinateSets (in the ISO-19111 sense of the word, i.e. "sets of coordinate tuples")

Ah yes, that makes a ton of sense and certainly improves my mental model of RG - thank you! Thanks for the updated Pseudo code (didn't know about nested loop busting, very handy). I've updated the relevant places to reflect the suggestion. I've also included @null grid support as it seemed easy enough to incorporate. How do you feel it's all looking now?

Rennzie commented 10 months ago

Thinking about it, to avoid searching through the subgrids twice (once to determine whether we're inside, and once to do the interpolation), it might as well be a feature of the "interpolation" trait to return None if the interpolation point is outside, then iterating along in the list of files until we get a Some(...)

So you're thinking we remove the includes method of the Grid trait and update interpolation to return an Option? Something like:

pub trait Grid: Debug {
    fn bands(&self) -> usize;

    // fn contains(&self, position: Coor4D) -> bool; - REMOVED

    // Returns `None` if the grid or any of it's sub-grids do NOT contain the input coordinate
    fn interpolation(&self, coord: &Coor4D) -> Option<Coor4D>;
}

Then instead of the if grid.contains(coord) statements in the operators it would be

if let Some(t) = grid.interpolation(&coord){
  // use the first interpolation
  successes += 1
  continue 'points
}

//... handling `use_null_grid` etc

continue 'points
busstoptaktik commented 10 months ago

So you're thinking we remove the includes method of the Grid trait

Not necessarily removing it, since it may have its uses, but perhaps as an additional functionality. The thing is that the NTv2 (and the Geodetic TIFF Grid) specs allow subgrids, that may or may not fall fully within the main grid (i.e. both grid extensions and grid densifications), so to test for "inside" the drivers would have to loop through the subgrids (in reverse order), to find the right one, then redo it all again, when we decide that we want to interpolate in that grid.

In other words: The subgrid drivers (should) implement a similar functionality to the grids=grid1,grid2,grid3,grid4 logic