GlacioHack / xdem

Analysis of digital elevation models (DEMs)
https://xdem.readthedocs.io/
MIT License

Way forward for co-registration module #435

Open rhugonnet opened 1 year ago

rhugonnet commented 1 year ago

We currently have 25 issues open on co-registration :scream_cat:, and a lot of notes and plans for the way forward dispersed everywhere. I initially tried to organize this in a project: https://github.com/orgs/GlacioHack/projects/3/views/2, but I think that the level of detail of the issues is too inconsistent for this, so I'm writing this summary instead!

Architecture: We already did a lot in https://github.com/GlacioHack/xdem/pull/158 and https://github.com/GlacioHack/xdem/pull/329 to re-organize the coding structure to be more consistent and robust, but some points are left:

Features: For a while now, we've needed consistent:

Most of them are not far off, since we introduced a consistent structure for optimizers and binning in https://github.com/GlacioHack/xdem/pull/158. It makes weights almost directly supported (though we'll need to raise proper warnings for methods that currently ignore them, such as ICP), and will make plotting easier by consistently treating the type of method (binning, fit, or both) and the dimensionality of the variables (1D, 2D, ND), which can be re-used for the Tilt or NuthKaab fits in the affine functions.
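To illustrate the weights point above, here is a minimal numpy-only sketch of what weighted 1D binning of elevation differences could look like; the function name and the slope/confidence variables are purely illustrative, not xdem's actual API.

```python
import numpy as np

def weighted_binned_mean(x, values, weights, bins):
    """Weighted mean of `values` per bin of `x` (illustrative helper).

    Bins with zero total weight yield NaN.
    """
    # Sum of weights, and sum of weighted values, in each bin
    wsum, edges = np.histogram(x, bins=bins, weights=weights)
    vsum, _ = np.histogram(x, bins=bins, weights=weights * values)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = vsum / wsum
    return means, edges

# Example: bin elevation differences (dh) by terrain slope, weighting each
# pixel by an (illustrative) per-pixel confidence.
rng = np.random.default_rng(42)
slope = rng.uniform(0, 45, 1000)
dh = 0.1 * slope + rng.normal(0, 1, 1000)
confidence = rng.uniform(0.5, 1.0, 1000)
means, edges = weighted_binned_mean(slope, dh, confidence, bins=np.linspace(0, 45, 10))
```

Because the binning step is just two weighted histograms, supporting weights consistently across binning methods mostly amounts to threading one extra array through the existing structure.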

Tests: Some tests are still a bit old or slow, several related issues could be solved all at once:

Performance: :warning: We really need to think ahead for a structure that will allow memory-efficient computations using Dask:

Bugs: Here there's a lot, but they might solve themselves (or become irrelevant) after changing architecture + tests:

#423, #422, #404, #326, #232, #193

Idea of plan moving forward:

  1. Think ahead on logic for Performance: already done a bit in https://github.com/GlacioHack/xdem/issues/428#issuecomment-1707538808. I think we can use the structure proposed there which should work eventually :crossed_fingers:! For apply, this should be adaptable down the line...
  2. Focus first on things all related to Architecture: to avoid the effort of adapting all the rest if we did it first (new features, bug fixes and tests). A lot of lessons learned from https://github.com/GlacioHack/xdem/pull/158 and https://github.com/GlacioHack/xdem/pull/329.
  3. Add basic architectural Tests to ensure 2. is working as intended with what we already have, and add new data for consistent (and quick!) tests (this specific point might require its own thread of discussion).
  4. Add support for all new Features: should be made easier by the consistent architecture!
  5. Add new Tests in parallel of each feature in 4.
  6. See if the Bugs are still relevant, and fix them if they are! :smile:
  7. Celebrate, for we will have come quite a way! :champagne:

Any thoughts @adehecq @erikmannerfelt? :smile:

rhugonnet commented 1 year ago

To clarify on the Coreg.apply for chunks:

Is that something we can do currently @erikmannerfelt?
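The key property that makes a chunked Coreg.apply feasible for translation-like corrections is that each pixel's correction is independent of its neighbours, so blocks can be processed one at a time. A numpy-only sketch of that idea (not xdem's actual implementation, and a real version would delegate the looping to Dask):

```python
import numpy as np

def apply_vshift_chunked(dem, vshift, chunk_size=256):
    """Apply a vertical shift block-by-block (illustrative).

    Because the correction is purely translational, each block can be
    corrected independently: the full DEM never needs to take part in a
    single array operation, which is what allows out-of-memory
    computation with Dask-style chunking.
    """
    out = np.empty_like(dem)
    nrows, ncols = dem.shape
    for i in range(0, nrows, chunk_size):
        for j in range(0, ncols, chunk_size):
            block = dem[i:i + chunk_size, j:j + chunk_size]
            out[i:i + chunk_size, j:j + chunk_size] = block + vshift
    return out

dem = np.arange(600 * 500, dtype="float64").reshape(600, 500)
shifted = apply_vshift_chunked(dem, vshift=2.5)
```

Horizontal shifts and resampling would additionally need some overlap between neighbouring blocks (a halo), but the per-block structure stays the same.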

adehecq commented 1 year ago

Any thoughts @adehecq @erikmannerfelt? 😄

Wow, I'm a bit overwhelmed by the list of things left to do, so I don't have much useful feedback for the moment. It's great that you made an exhaustive list of everything to be considered for coreg. Now we need two things:

What was your idea regarding this project? Did you want to work on it in the coming days/weeks? We should try to set aside some blocks of time to work on it together. We can coordinate over Slack or face to face.

rhugonnet commented 1 year ago

Yes it is a bit overwhelming like this haha, but thankfully some of these points are fairly small! :slightly_smiling_face:

We'll need more detailed lists, but I think those can live within each PR that addresses a new step/feature listed above. The architecture work will have to be one big PR, as everything is interdependent through the structure.

With all the discussions and the work already done in #158 to make the structure generic and robust, I have a clear vision of the technical steps needed to reach every point. And I don't want to wait too long and lose it! I'd be happy to write a more detailed plan, discuss it, and combine efforts if you both can block some days for it, but I can otherwise push on my own in the frame of the coreg study! :wink:

My idea was to start on this full time end-of-September (for the co-registration study to reach a good stage before AGU).

rhugonnet commented 1 week ago

One year later, and we've addressed most of the points of this (long) issue and the core coregistration module is now close to being finalized! :partying_face: :balloon:

Now that the structure, input support and modular arguments are fully consistent across all coregistration methods, we can more easily address the three remaining aspects (which are either about maintenance or adding new features):

  1. Improve tests further, which is mostly about the variety of test data, and the way it is pre-processed and passed to tests (see list in first post above), which joins #563 started by @adebardo,
  2. Add plots and statistics, which joins https://github.com/GlacioHack/geoutils/issues/602 and #569 by @adebardo, and some other small convenience routines such as #437,
  3. Finalize scaling abilities with out-of-memory coregistration, greatly moved forward in #525 by @ameliefroessl (which can be implicitly supported once the Xarray/GeoPandas accessors are implemented!).

Another point that comes to mind would be to add CoregFilter classes that are easy to use within a CoregPipeline, but for now users can always build their input inlier_mask separately to remedy that.
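For context, the kind of filtering logic that a future CoregFilter class could encapsulate can today be expressed as a hand-built inlier mask. A numpy-only sketch, where the thresholds and variable names are purely illustrative (in practice the resulting boolean array would be passed to the coregistration fit as its inlier mask):

```python
import numpy as np

def build_inlier_mask(dh, slope, unstable_mask, dh_max=50.0, slope_max=40.0):
    """Illustrative stable-terrain filtering: keep pixels outside unstable
    terrain (e.g. glaciers), with moderate slopes and a plausible
    elevation difference."""
    return (
        ~unstable_mask
        & np.isfinite(dh)
        & (np.abs(dh) < dh_max)
        & (slope < slope_max)
    )

# Synthetic example inputs
rng = np.random.default_rng(0)
dh = rng.normal(0, 5, (100, 100))          # elevation differences
slope = rng.uniform(0, 60, (100, 100))     # terrain slope in degrees
unstable = np.zeros((100, 100), dtype=bool)
unstable[40:60, 40:60] = True              # e.g. a glacier outline

mask = build_inlier_mask(dh, slope, unstable)
```

Wrapping steps like these in filter classes would mainly buy composability inside a pipeline, rather than new capability.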