GlacioHack / xdem

Analysis of digital elevation models (DEMs)
https://xdem.readthedocs.io/
MIT License

Way forward for co-registration module #435

Open rhugonnet opened 1 year ago

rhugonnet commented 1 year ago

We currently have 25 issues open on co-registration :scream_cat:, and a lot of notes and plans for the way forward dispersed everywhere. I initially tried to organize this in a project: https://github.com/orgs/GlacioHack/projects/3/views/2, but I think that the level of detail of the issues is too inconsistent for this, so I'm writing this summary instead!

Architecture: We already did a lot in https://github.com/GlacioHack/xdem/pull/158 and https://github.com/GlacioHack/xdem/pull/329 to re-organize the coding structure to be more consistent and robust, but some points are left:

Features: For a while now, we've needed consistent:

Most of them are not far off, since we introduced a consistent structure for optimizers and binning in https://github.com/GlacioHack/xdem/pull/158. It makes weights almost directly supported (though we'll need to raise proper warnings for methods that currently ignore them, such as ICP), and will make plotting easier by consistently treating the type of method (binning, fit, or both) and the dimensionality of the variables (1D, 2D, ND), which can be re-used for the Tilt or NuthKaab fits in the affine functions.
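To illustrate the weights point above, here is a minimal numpy-only sketch of what weighted 1D binning of elevation differences could look like; the function name and the slope/confidence variables are purely illustrative, not xdem's actual API.

```python
import numpy as np

def weighted_binned_mean(x, values, weights, bins):
    """Weighted mean of `values` per bin of `x` (illustrative helper).

    Bins with zero total weight yield NaN.
    """
    # Sum of weights, and sum of weighted values, in each bin
    wsum, edges = np.histogram(x, bins=bins, weights=weights)
    vsum, _ = np.histogram(x, bins=bins, weights=weights * values)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = vsum / wsum
    return means, edges

# Example: bin elevation differences (dh) by terrain slope, weighting each
# pixel by an (illustrative) per-pixel confidence.
rng = np.random.default_rng(42)
slope = rng.uniform(0, 45, 1000)
dh = 0.1 * slope + rng.normal(0, 1, 1000)
confidence = rng.uniform(0.5, 1.0, 1000)
means, edges = weighted_binned_mean(slope, dh, confidence, bins=np.linspace(0, 45, 10))
```

Because the binning step is just two weighted histograms, supporting weights consistently across binning methods mostly amounts to threading one extra array through the existing structure.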

Tests: Some tests are still a bit old or slow, several related issues could be solved all at once:

Performance: :warning: We really need to think ahead for a structure that will allow memory-efficient computations using Dask:

Bugs: Here there's a lot, but they might solve themselves (or become irrelevant) after changing architecture + tests:

#423, #422, #404, #326, #232, #193

Idea of plan moving forward:

  1. Think ahead on logic for Performance: already done a bit in https://github.com/GlacioHack/xdem/issues/428#issuecomment-1707538808. I think we can use the structure proposed there which should work eventually :crossed_fingers:! For apply, this should be adaptable down the line...
  2. Focus first on things all related to Architecture: to avoid the effort of adapting all the rest if we did it first (new features, bug fixes and tests). A lot of lessons learned from https://github.com/GlacioHack/xdem/pull/158 and https://github.com/GlacioHack/xdem/pull/329.
  3. Add basic architectural Tests to ensure 2. is working as intended with what we already have, and add new data for consistent (and quick!) tests (this specific point might require its own thread of discussion).
  4. Add support for all new Features: should be made easier by the consistent architecture!
  5. Add new Tests in parallel of each feature in 4.
  6. See if the Bugs are still relevant, and fix them if they are! :smile:
  7. Celebrate, for we will have come quite a way! :champagne:

Any thoughts @adehecq @erikmannerfelt? :smile:

rhugonnet commented 1 year ago

To clarify on the Coreg.apply for chunks:

Is that something we can do currently @erikmannerfelt?
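The key property that makes a chunked Coreg.apply feasible for translation-like corrections is that each pixel's correction is independent of its neighbours, so blocks can be processed one at a time. A numpy-only sketch of that idea (not xdem's actual implementation, and a real version would delegate the looping to Dask):

```python
import numpy as np

def apply_vshift_chunked(dem, vshift, chunk_size=256):
    """Apply a vertical shift block-by-block (illustrative).

    Because the correction is purely translational, each block can be
    corrected independently: the full DEM never needs to take part in a
    single array operation, which is what allows out-of-memory
    computation with Dask-style chunking.
    """
    out = np.empty_like(dem)
    nrows, ncols = dem.shape
    for i in range(0, nrows, chunk_size):
        for j in range(0, ncols, chunk_size):
            block = dem[i:i + chunk_size, j:j + chunk_size]
            out[i:i + chunk_size, j:j + chunk_size] = block + vshift
    return out

dem = np.arange(600 * 500, dtype="float64").reshape(600, 500)
shifted = apply_vshift_chunked(dem, vshift=2.5)
```

Horizontal shifts and resampling would additionally need some overlap between neighbouring blocks (a halo), but the per-block structure stays the same.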

adehecq commented 1 year ago

Any thoughts @adehecq @erikmannerfelt? 😄

Wow, I'm a bit overwhelmed by the list of things left to do, so I don't have much useful feedback for the moment. It's great that you made an exhaustive list of everything to be considered for coreg. Now we need two things:

What was your idea regarding this project? Did you want to work on it in the coming days/weeks? We should try to set aside some blocks of time to work on it together. We can coordinate over Slack or face to face.

rhugonnet commented 1 year ago

Yes it is a bit overwhelming like this haha, but thankfully some of these points are fairly small! :slightly_smiling_face:

We'll need more detailed lists, but I think those can live within each PR that addresses a new step/feature listed above. The architecture work will have to be one big PR, as everything is interdependent through the structure.

With all the discussions and the work already done in #158 to make the structure generic and robust, I have a clear vision of the technical steps needed to reach every point. And I don't want to wait too long and lose it! I'd be happy to write a more detailed plan, discuss it, and combine efforts if you both can block some days for it, but I can otherwise push on my own in the frame of the coreg study! :wink:

My idea was to start on this full time end-of-September (for the co-registration study to reach a good stage before AGU).

rhugonnet commented 1 week ago

One year later, and we've addressed most of the points of this (long) issue and the core coregistration module is now close to being finalized! :partying_face: :balloon:

Now that the structure, input support and modular arguments are fully consistent across all coregistration methods, we can more easily address the three remaining aspects (which are either about maintenance or adding new features):

  1. Improve tests further, which is mostly about the variety of test data, and the way it is pre-processed and passed to tests (see list in first post above), which joins #563 started by @adebardo,
  2. Add plots and statistics, which joins https://github.com/GlacioHack/geoutils/issues/602 and #569 by @adebardo, and some other small convenience routines such as #437,
  3. Finalize scaling abilities with out-of-memory coregistration, greatly moved forward in #525 by @ameliefroessl (which can be implicitly supported once the Xarray/GeoPandas accessors are implemented!).

Another point that comes to mind would be to add CoregFilter classes that are easy to use within a CoregPipeline, but for now users can always build their input inlier_mask separately to remedy that.
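For context, the kind of filtering logic that a future CoregFilter class could encapsulate can today be expressed as a hand-built inlier mask. A numpy-only sketch, where the thresholds and variable names are purely illustrative (in practice the resulting boolean array would be passed to the coregistration fit as its inlier mask):

```python
import numpy as np

def build_inlier_mask(dh, slope, unstable_mask, dh_max=50.0, slope_max=40.0):
    """Illustrative stable-terrain filtering: keep pixels outside unstable
    terrain (e.g. glaciers), with moderate slopes and a plausible
    elevation difference."""
    return (
        ~unstable_mask
        & np.isfinite(dh)
        & (np.abs(dh) < dh_max)
        & (slope < slope_max)
    )

# Synthetic example inputs
rng = np.random.default_rng(0)
dh = rng.normal(0, 5, (100, 100))          # elevation differences
slope = rng.uniform(0, 60, (100, 100))     # terrain slope in degrees
unstable = np.zeros((100, 100), dtype=bool)
unstable[40:60, 40:60] = True              # e.g. a glacier outline

mask = build_inlier_mask(dh, slope, unstable)
```

Wrapping steps like these in filter classes would mainly buy composability inside a pipeline, rather than new capability.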