GlacioHack / xdem

Analysis of digital elevation models (DEMs)
https://xdem.readthedocs.io/
MIT License

Re-structure coregistration affine methods, point-raster pre-processing, binning and fitting, and add pixel interpretation support #530

Closed rhugonnet closed 4 weeks ago

rhugonnet commented 4 months ago

This PR restructures affine coregistration methods and many generic subfunctions to be consistent and modular. Primarily, it makes 1/ "point-raster" subsampling, 2/ any binning and/or fitting, and 3/ iteration over methods the same across all Affine and BiasCorr methods, with extensive modularity. Additionally, it updates horizontal shift reprojection from a different implementation in every method (e.g., Nuth and Kääb used scipy.RectBivariateSpline) to a consistent use of interp from GeoUtils, which supports any resampling method (whose details can be chosen in global config parameters), comes with consistent nodata propagation built on top, and preserves computational efficiency by allowing a single interpolator object to be returned and re-used (https://github.com/GlacioHack/geoutils/pull/560). This also avoids the sub-pixel errors of Rasterio reprojection.

The PR also restructures the code of affine methods, defining core functions outside the classes' _fit_rst_rst or _fit_rst_pts, which allows consistent pre-processing steps and clear inputs/outputs. Finally, it adds support for pixel interpretation in coregistration methods (important for point-raster operations).

Resolves #234 Resolves #574 Resolves #561 Resolves #546 Resolves #573 Resolves #571 Resolves #547 Resolves #327 (except plot but those are mentioned in https://github.com/GlacioHack/xdem/issues/139) Big step forward for #435

Opens #572 Opens #575 Opens #579

Summary of changes

1/ New core processing functions

For consistent binning/fitting: Restructures the coreg/affine module to support the fit_or_bin logic previously added in coreg/biascorr. However, as the fit function of a coregistration method is fixed (for instance, dh/tan(slope) as a function of aspect for Nuth and Kääb), the "bin"-only option is impossible, and only "fit" or "bin_and_fit" are supported. The mirrored structure therefore simply allows passing a boolean argument bin_before_fit to perform binning before fitting, as well as any fit optimizer, bin statistic, and bin sizes. To do this, this PR merges the bin and fit steps of BiasCorr into a single _bin_or_and_fit_nd function in coreg.base, used by all Affine and BiasCorr methods.
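For illustration, a hedged usage sketch (the argument names come from this PR's description; the chosen values and defaults are assumptions):

```python
import numpy as np
import scipy.optimize

import xdem

# Sketch only: bin_before_fit, fit_optimizer, bin_statistic and bin_sizes are
# the arguments described in this PR; the values below are illustrative.
nk = xdem.coreg.NuthKaab(
    bin_before_fit=True,                     # bin the data before running the fit
    fit_optimizer=scipy.optimize.curve_fit,  # any fit optimizer
    bin_statistic=np.nanmedian,              # any bin statistic
    bin_sizes=72,                            # e.g., 5° aspect bins
)
```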

For consistent point-raster/raster-raster subsampling: This pre-processing step is different from the one running directly in Coreg.fit(), which simply extracts the transform, CRS, and other metadata. This step consists of subsampling the two elevation datasets (raster-raster or point-raster) at the same points, for a valid subsample (no NaNs) of a certain size. This cannot be done consistently in fit() before _fit_xxx, because auxiliary variables (such as slope or aspect) might need to be derived by the method on the full arrays, and these propagate NaNs. The tricky case is point-raster, where we first need to interpolate the raster at the coordinates of the points to estimate the available valid subsample (which also depends on NaN propagation during interpolation). To do this, this PR merges the subsampling that existed in BiasCorr into a single _preprocess_pts_rst_subsample function in coreg.base, relying on two steps: _get_subsample_mask_pts_rst and _subsample_on_mask. To allow returning a dh interpolator that can be re-used iteratively by some affine methods, another function, _preprocess_pts_rst_subsample_with_dhinterpolator in coreg.affine, relies on a different second step, _subsample_on_mask_with_dhinterpolator.
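A minimal NumPy sketch of the two-step logic, assuming the inputs were already interpolated to common points (the real _get_subsample_mask_pts_rst/_subsample_on_mask also handle the point-raster interpolation and georeferencing):

```python
import numpy as np

def get_subsample_mask(arrays: list, subsample: int, rng=None) -> np.ndarray:
    """Boolean mask of pixels valid (finite) in all inputs, randomly subsampled."""
    rng = np.random.default_rng(rng)
    # Valid only where no input propagates a NaN.
    valid = np.logical_and.reduce([np.isfinite(arr) for arr in arrays])
    idx_valid = np.flatnonzero(valid)
    chosen = rng.choice(idx_valid, size=min(subsample, idx_valid.size), replace=False)
    mask = np.zeros(valid.shape, dtype=bool)
    mask.flat[chosen] = True
    return mask

def subsample_on_mask(arrays: list, mask: np.ndarray) -> tuple:
    """Extract 1D samples of each array at the masked locations."""
    return tuple(arr[mask] for arr in arrays)
```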

For consistent iterative methods: Methods such as ICP and Nuth and Kääb all work by iteration. To treat inputs/outputs and iterative arguments (tolerance to reach, maximum number of iterations) consistently, these are now grouped through an _iterate_method function.
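Schematically (the signature below is an assumption, not the exact xDEM one):

```python
def iterate_method(method, iterating_input, constant_inputs, tolerance, max_iterations):
    """Iterate a step function until its convergence statistic passes the tolerance."""
    for _ in range(max_iterations):
        # Each step returns the updated input and a convergence statistic
        # (e.g., the change in horizontal offset).
        iterating_input, statistic = method(iterating_input, *constant_inputs)
        if statistic < tolerance:
            break
    return iterating_input
```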

Ensuring consistent arguments across these core functions and other parameters: To ensure the arguments passed to these three core functions (subsampling, binning/fitting, and iteration) are consistent, new dictionary types are defined, each of which fixes the key names and value types: InFitOrBinDict ("In" for input), InRandomDict and InIterativeDict, plus InAffineDict for affine-related parameters and InSpecificDict for any method-specific parameters that do not fit the other categories. The same is done for outputs: OutRandomDict, etc. See #573 for details on the new organization.
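A sketch of the pattern (the keys shown here are a subset and partly assumptions; see #573 for the real definitions):

```python
from typing import Any, Callable, TypedDict

# Typed dictionaries fix the key names and value types of each input category.
class InRandomDict(TypedDict, total=False):
    subsample: int | float
    random_state: int | None

class InIterativeDict(TypedDict, total=False):
    max_iterations: int
    tolerance: float

class InFitOrBinDict(TypedDict, total=False):
    bin_before_fit: bool
    fit_optimizer: Callable[..., Any]
    bin_statistic: Callable[..., Any]
    bin_sizes: int | dict[str, Any]
```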

2/ Robust and consistent horizontal shift reprojection

The core of the _fit_xxx functions of horizontal methods such as NuthKaab and GradientDescending has been modified to rely on the _preprocess_pts_rst_subsample_with_dhinterpolator function mentioned above. This function relies on geoutils' interp_points (directly for point-raster, and through a new function _reproject_horizontal_shift_samecrs for raster-raster) to perform a consistent 2D grid interpolation with nodata propagation, supporting any SciPy method; see details in https://github.com/GlacioHack/geoutils/pull/560. This PR adds a test checking that the result of this functionality matches GDAL reprojection, which ensures the Rasterio sub-pixel reprojection errors are gone.
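The interpolator re-use idea, sketched with plain SciPy in pixel units (the real code goes through GeoUtils' interp_points and handles georeferencing and nodata propagation; names below are illustrative):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def build_dh_interpolator(ref: np.ndarray, tba: np.ndarray):
    """Return a function evaluating dh = ref - tba(shifted) for any sub-pixel shift."""
    rows, cols = np.arange(tba.shape[0]), np.arange(tba.shape[1])
    # Built once, then re-used at every iteration of the horizontal shift.
    interp_tba = RegularGridInterpolator(
        (rows, cols), tba, method="linear", bounds_error=False, fill_value=np.nan
    )
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    pts = np.column_stack([rr.ravel(), cc.ravel()])

    def dh(shift_row: float, shift_col: float) -> np.ndarray:
        # Only the evaluation points change between iterations.
        return ref.ravel() - interp_tba(pts + [shift_row, shift_col])

    return dh
```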

3/ Restructuring of affine classes

All code from affine classes is moved out of the _fit_xxx functions into functions that rely on the same steps where possible. For instance, the NuthKaab method is divided into a _nuthkaab_fit_func that defines the function to optimize, a _nuthkaab_fit that wraps _bin_or_and_fit_nd with an initial guess, a _nuthkaab_aux_vars that derives the auxiliary variables required for the fit (slope, aspect), and a _nuthkaab_iteration_step that is passed to _iterate_method.
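For reference, the fixed relation optimized by _nuthkaab_fit_func is the one from Nuth & Kääb (2011); a sketch (the exact signature is an assumption):

```python
import numpy as np

# Nuth & Kääb (2011): dh / tan(slope) = a * cos(b - aspect) + c, where (a, b)
# encode the horizontal shift vector and c a vertical bias term.
def nuthkaab_fit_func(aspect: np.ndarray, a: float, b: float, c: float) -> np.ndarray:
    """Predicted dh/tan(slope) as a function of terrain aspect (radians)."""
    return a * np.cos(b - aspect) + c
```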

All the _apply_xxx affine functions are removed to avoid unnecessary redundancies/inconsistencies, as apply_matrix is the same across all affine methods, and since #531 supports resample=False, which modifies only the geotransform for horizontal shifts.

4/ Add support for pixel interpretation

Dealing with area_or_point was previously done arbitrarily in each subclass. Using the above interpolation/reprojection (through geoutils' interp_points), we can now simply pass the argument down to GeoUtils, which accounts for it when comparing point and raster data, or raises a warning for raster-raster inputs having a different interpretation. It is only required for the fit part of coregistration, and is thus added to all related functions.
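For context, pixel interpretation boils down to a half-pixel question; a conceptual sketch (not the GeoUtils implementation):

```python
import numpy as np

def pixel_center_coords(x0: float, res: float, n: int, area_or_point: str) -> np.ndarray:
    """With "Area", grid coordinates refer to pixel corners, so centers are offset
    by half a pixel; with "Point", coordinates already match pixel centers."""
    offset = 0.5 if area_or_point == "Area" else 0.0
    return x0 + (np.arange(n) + offset) * res
```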

5/ Removal of Tilt and other subfunctions

With the above changes, the affine Tilt class corresponds exactly to Deramp(order=1), instead of having its own, separate implementation. Additionally, it is not truly affine (it cannot be modelled by an affine matrix), and thus has no place among affine methods. We can simply remove it, and users can rely on Deramp instead.

All sub-functions that were created to perform processing steps for a specific affine method in the coregistration modules (often duplicated from another method, or from a GeoUtils functionality) are now removed. This is largely thanks to https://github.com/GlacioHack/geoutils/pull/560, which restructures GeoUtils raster functionalities so they can be called without having to build a Raster object (simply passing an array, transform, CRS, etc.). Until now, we were deconstructing/reconstructing Raster objects everywhere to call simple functions (interp_points, coords, etc.), which cost performance. This is now largely solved too! A couple of exceptions remain: warp_dem, used only by BlockwiseCoreg, and _mask_as_array, used only by the dDEM class; their use seems more specific and I'm not sure it is covered by anything in GeoUtils right now.
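The user-side replacement is a one-liner (the keyword name below is assumed from the BiasCorr-based Deramp; the text above writes it as order=1):

```python
import xdem

# Instead of the removed Tilt(), use a first-order polynomial deramp:
deramp = xdem.coreg.Deramp(poly_order=1)
```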

Tests

On purpose, I did not write new tests in this PR, so that I could check that none of these changes significantly affect our current methods. Improving tests (in particular for affine methods) will be addressed in a next PR.

And indeed, all tests are passing with minor modifications! :partying_face:

Additionally, I increased the speed of the tests following #574 (the ICP tests were extremely slow and accumulated to more than 5 minutes, i.e., half of xDEM's total test time). Now nearly all tests in test_coreg/ run on a cropped version of the reference/tba example DEMs, and all of xDEM's tests run in 5 minutes instead of 12+. :smiley:
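The cropping trick looks roughly like this (the crop bounds are illustrative, and this assumes crop returns the cropped DEM, as in recent GeoUtils):

```python
import xdem

ref = xdem.DEM(xdem.examples.get_path("longyearbyen_ref_dem"))
tba = xdem.DEM(xdem.examples.get_path("longyearbyen_tba_dem"))
# Keep a small corner of the example DEMs to speed up coregistration tests.
crop_geom = (ref.bounds.left, ref.bounds.bottom,
             ref.bounds.left + 2000, ref.bounds.bottom + 2000)
ref_small, tba_small = ref.crop(crop_geom), tba.crop(crop_geom)
```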

To-do list

Will do in separate PRs

rhugonnet commented 1 month ago

@adehecq @erikmannerfelt This PR is ready for your review! All tests passing locally with the GeoUtils branch https://github.com/GlacioHack/geoutils/pull/560. :partying_face:

There are two main aspects on which I'm hesitating about how to move forward, and on which I would especially like your opinion:

  1. Currently, I defined a nuth_kaab main function called directly by NuthKaab._fit_xxx (and the same for all methods), which is itself made of some of the new core subfunctions (listed in the main description, point 1/) and a method-specific one. This ensures things are consistent, but makes it awkward to save variables to the Coreg metadata that are only available at a given step. For instance, an easy one: what is the final subsample size on valid values? Not available right now. A harder one: what are the binning parameters at a given iteration step? (This might not be necessary, but could become a need in the future for plotting or other purposes.) I see two solutions to make this easier:
     a) Either we move the content of nuth_kaab to NuthKaab._fit_xxx, and wrap the subfunctions called in nuth_kaab as methods of Coreg which always pass/save the metadata consistently when they run (you can see I already did this with _bin_or_and_fit_nd for BiasCorr: it is duplicated as a Coreg method in order to pass/write the metadata). It would make it impossible to call nuth_kaab separately from the Coreg class (but I'm not sure that's a need?), but would make those new class methods consistent.
     b) Or we define new function inputs/outputs (dictionaries?) for each of these subfunctions, to pass metadata of interest through the functions back to the class (could become a hassle very fast).

  2. The CoregDict is getting busy and all over the place. It seems like it could become a combination of the new dicts created in this PR: RandomDict, FitOrBinDict, IterativeDict (and maybe an output dict for the results?). How should we structure this? A nested dictionary (as sketched below) would keep it well organized, but maybe harder for users to access a specific key easily, even though it's all there. If we did this, we could add an info() method that prints the metadata nicely, to partially remedy that. Any other ideas?
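For instance, a nested layout could look like this (keys illustrative):

```python
# One sub-dictionary per category of parameters:
meta = {
    "random": {"subsample": 5000, "random_state": 42},
    "fitorbin": {"bin_before_fit": True, "bin_sizes": 72},
    "iterative": {"max_iterations": 10, "tolerance": 0.01},
}
# Well organized, but one level deeper to access a single key:
tol = meta["iterative"]["tolerance"]
```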

rhugonnet commented 1 month ago

@adehecq @erikmannerfelt After quite a bit of thinking, I've addressed the above points to finalize this PR completely.

For 2: I implemented the structure proposed in #573; it is quite satisfactory for extracting/writing back arguments to the Coreg class. See details in that issue. The nested dictionaries are a tiny bit long to access, though; I propose an easy solution further below when discussing the renaming of Coreg.meta.

For 1: I left the new structure as is. For affine functions, subsample_final is now passed from the functions back to the class. We can discuss in more detail how to further refine this structure when we need to add stats/plots. It shouldn't be much work in any case, now that the base functions are consistent: it would simply be a matter of copy/pasting into a subclass method if we chose the class-based way, or of adding some more function outputs if we stayed fully function-based. In the second case, we could group the outputs into the typed dictionary defined for "outputs" in point 2 above, which would keep things nicely organized and checked, and wouldn't be too much effort (which is also why I decided not to change the current structure).

The last big question is: what name should we use for the Coreg dictionary (currently meta)? See #547 for previous discussions about renaming it. My latest idea would be to have two dictionaries, Coreg.inputs and Coreg.outputs! This would remove one nesting layer from the current Coreg.meta, which splits into inputs and outputs in the structure introduced in this PR, and would make it more practical to access keys. And I think they'd be the clearest names for both developers and users! :smile:
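The access pattern I have in mind would be something like this (hypothetical: this is the proposal under discussion, not the current API):

```python
import xdem

ref_dem = xdem.DEM(xdem.examples.get_path("longyearbyen_ref_dem"))
tba_dem = xdem.DEM(xdem.examples.get_path("longyearbyen_tba_dem"))

nk = xdem.coreg.NuthKaab()
nk.fit(ref_dem, tba_dem)
print(nk.inputs["random"]["subsample"])         # user-passed parameters
print(nk.outputs["random"]["subsample_final"])  # results written during fit
```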

Ready to merge! Need a review ASAP to use this to finalize the documentation of #502 :wink:

adehecq commented 1 month ago

Trying to quickly react to your questions before the end of the day:

adehecq commented 4 weeks ago
> a) Either we move the content of nuth_kaab to NuthKaab._fit_xxx, and wrap the subfunctions called in nuth_kaab as methods of Coreg which always pass/save the metadata consistently when they run (you can see I already did this with _bin_or_and_fit_nd for BiasCorr: it is duplicated as a Coreg method in order to pass/write the metadata). It would make it impossible to call nuth_kaab separately from the Coreg class (but I'm not sure that's a need?), but would make those new class methods consistent.

I don't know if this point is still needed, but now that I have a better sense of the question, I think I would be in favor of option a), i.e., the metadata dictionary gets updated at each iteration if needed.

rhugonnet commented 4 weeks ago

Agreed on all, except maybe for "it is certainly more difficult to understand when reading the code, maintain and debug". I think that in time (getting used to the functions), and maybe with a Wiki page, it will be much easier to maintain and debug a couple of consistent core functions than each Coreg subclass doing its own thing :wink:.

For computation times: they are similar for raster-raster, and faster for point-raster (thanks to avoiding rebuilding a Raster to call interp_points; this is now done from the array/transform directly :smile:).

Good idea, I will try to write a short Wiki page while this is fresh!

rhugonnet commented 4 weeks ago

Merging like this! I want to know @erikmannerfelt's opinion before changing Coreg.meta into Coreg.inputs and Coreg.outputs. I actually don't dislike meta too much (data about other data: here, about the coregistration algorithm), as long as it's organized inside.

rhugonnet commented 3 weeks ago

FYI: I've added a Wiki page here on the Coreg structure and steps to implement a new method: https://github.com/GlacioHack/xdem/wiki/Things-to-know-on-coregistration-class-structure

adehecq commented 3 weeks ago

> FYI: I've added a Wiki page here on the Coreg structure and steps to implement a new method: https://github.com/GlacioHack/xdem/wiki/Things-to-know-on-coregistration-class-structure

Perfect, thanks a lot! That's super useful!