rhugonnet closed this 4 weeks ago
@adehecq @erikmannerfelt This PR is ready for your review! All tests passing locally with the GeoUtils branch https://github.com/GlacioHack/geoutils/pull/560. :partying_face:
I'm hesitating on how to move forward on two main aspects, and would especially like your opinion on them:
Currently I defined a `nuth_kaab` main function called directly by `NuthKaab._fit_xxx` (and this for all methods), which is itself made of some of the new core subfunctions (listed in the main description, point 1/) and a method-specific one. This ensures things are consistent, but makes it annoying to save variables to the `Coreg` metadata that are only available at a given step. For instance, an easy one: what is the final subsample size on valid values? Not available right now. A harder one: what are the binning parameters at a given iteration step? (Might not be necessary, but could become a need in the future for plotting or other uses.)
I see two solutions here to make that easier:
a) Either we move the content of `nuth_kaab` into `NuthKaab._fit_xxx`, and wrap the subfunctions called in `nuth_kaab` as methods of `Coreg` which always pass/save the metadata consistently when they run (you can see I already did this with `_bin_or_and_fit_nd` for `BiasCorr`: it is duplicated as a `Coreg` method in order to pass/write the metadata). It would make it impossible to call `nuth_kaab` separately from the `Coreg` class (but I'm not sure that's a need?), but it would make those new class methods consistent.
b) Or we define new function inputs/outputs (dictionaries?) for each of these subfunctions, to pass metadata of interest through the functions back to the class (could become a hassle very fast).
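To make the trade-off concrete, here is a minimal sketch of option a (all names here are hypothetical stand-ins, not the actual xDEM API): the core function stays a plain function, and a thin `Coreg`-style method wraps it so metadata is always written back in one place.

```python
from typing import Any


def core_bin_or_fit(values: list[float]) -> tuple[float, dict[str, Any]]:
    """Hypothetical core function: 'fits' a mean and reports step metadata."""
    result = sum(values) / len(values)
    return result, {"n_values": len(values)}


class FakeCoreg:
    """Hypothetical class wrapper (option a): calling the core function
    only through this method guarantees metadata is saved consistently."""

    def __init__(self) -> None:
        self.meta: dict[str, Any] = {}

    def _bin_or_fit(self, values: list[float]) -> float:
        result, out_meta = core_bin_or_fit(values)
        self.meta.update(out_meta)  # always written here, never forgotten
        return result


coreg = FakeCoreg()
fitted = coreg._bin_or_fit([1.0, 2.0, 3.0])
print(fitted, coreg.meta)  # 2.0 {'n_values': 3}
```

Option b would instead have every caller receive and forward the `out_meta` dictionary itself, which is what can "become a hassle very fast" as the call chain deepens.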
The `CoregDict` is getting busy and all over the place. It seems like it could become a combination of the new dicts created in this PR: `RandomDict`, `FitOrBinDict`, `IterativeDict` (and maybe an output dict for the results?)...
How should we structure this? Having a nested dictionary would make it well organized, but maybe harder for users to access a specific key easily, even though it's all there. If we did this, we could add an `info()` method that prints the metadata nicely to partially remedy that. Any other ideas?
@adehecq @erikmannerfelt After quite a bit of thinking, I've addressed the above points to finalize this PR completely.
For the second point: I implemented the structure proposed in #573; it is quite satisfactory for extracting/writing arguments back to the `Coreg` class. See details in that issue. But the nested dictionaries are a tiny bit long; I propose an easy solution further below when discussing the renaming of `Coreg.meta`.
For the first point: I left the new structure as is. For affine functions, it now passes the `subsample_final` from the functions back to the class. We can discuss in more detail how to further refine this structure when we need to add stats/plots. It shouldn't be much work in any case now that the base functions are consistent: it would simply be copy/pasting into a subclass method if we chose the class way, or adding some more function outputs if we stayed fully function-based. In the second case, we could simply group the outputs into the typed dictionary defined for "outputs" in point 2 above, which would keep things nicely organized and checked, and wouldn't be too much effort (which is also why I decided not to change the current structure).
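The "function outputs grouped into a typed dictionary" idea could look roughly like this (the `OutRandomDict` name comes from the PR description below; the keys and the function itself are illustrative only):

```python
from typing import TypedDict

import numpy as np


class OutRandomDict(TypedDict):
    # Final number of valid samples actually used (illustrative key,
    # matching the `subsample_final` mentioned above).
    subsample_final: int


def random_subsample(
    values: np.ndarray, subsample: int, random_state: int = 42
) -> tuple[np.ndarray, OutRandomDict]:
    """Subsample valid (finite) values; return samples plus a typed output dict."""
    valid = values[np.isfinite(values)]
    n = min(subsample, valid.size)
    rng = np.random.default_rng(random_state)
    chosen = rng.choice(valid, size=n, replace=False)
    return chosen, OutRandomDict(subsample_final=n)


arr = np.array([1.0, np.nan, 2.0, 3.0, np.nan, 4.0])
sub, out = random_subsample(arr, subsample=10)
print(out["subsample_final"])  # 4
```

The typed dict keeps the extra outputs checked and self-documenting while the core function remains a plain function, callable outside any class.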
The last big question is: what name should we use for the `Coreg` dictionary (currently `meta`)? See #547 for previous discussions about renaming it.
My latest idea would be to have two dictionaries, `Coreg.inputs` and `Coreg.outputs`! This would remove one nested layer from the current dictionary `Coreg.meta`, which splits into `inputs` and `outputs` in the structure introduced in this PR, and would be more practical for accessing keys. And I think they'd be the clearest names for both developers and users! :smile:
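The two-dictionary idea could be prototyped as simple properties over the existing nested dict (a sketch only; whether these become real attributes is exactly the open question here):

```python
from typing import Any


class FakeCoreg:
    """Hypothetical sketch: expose the two halves of a nested `meta`
    dictionary as `inputs` and `outputs`, removing one access layer."""

    def __init__(self) -> None:
        self._meta: dict[str, dict[str, Any]] = {"inputs": {}, "outputs": {}}

    @property
    def inputs(self) -> dict[str, Any]:
        return self._meta["inputs"]

    @property
    def outputs(self) -> dict[str, Any]:
        return self._meta["outputs"]


coreg = FakeCoreg()
coreg.inputs["subsample"] = 5000
coreg.outputs["subsample_final"] = 4999
print(coreg.inputs, coreg.outputs)
```

Users would then write `coreg.outputs["subsample_final"]` instead of `coreg.meta["outputs"]["subsample_final"]`, while the nested dictionary stays intact internally.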
Ready to merge! Need a review ASAP to use this to finalize the documentation of #502 :wink:
Trying to quickly react to your questions before the end of the day:
`Coreg.inputs` and `Coreg.outputs`, as these are clear (although a little bit confusing with the input and output DEM, but that's acceptable I think).
> a) Either we move the content of `nuth_kaab` to `NuthKaab._fit_xxx`, and wrap the subfunctions called in `nuth_kaab` as methods of `Coreg` which always pass/save the metadata consistently when they run (you can see I already did this with `_bin_or_and_fit_nd` for `BiasCorr`, it is duplicated as a `Coreg` method in order to pass/write the metadata). It would make it impossible to call the same `nuth_kaab` separately than through the `Coreg` class (but I'm not sure that's a need?), but make those new class methods consistent.
And I don't know if this point is still needed, but now that I get a better sense of the question, I would be in favor of option a I think, i.e. the metadata dictionary gets updated at each iteration if needed.
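A generic iteration driver that also records per-iteration metadata (option a applied to iterations) could look roughly like this sketch; the names and metadata keys are illustrative, and the real `_iterate_method` signature may differ:

```python
from typing import Any, Callable


def iterate_method(
    step: Callable[[float], float],
    x0: float,
    tolerance: float,
    max_iterations: int,
    meta: dict[str, Any],
) -> float:
    """Run `step` until the update falls below `tolerance` or `max_iterations`
    is reached, appending per-iteration statistics to `meta`."""
    meta["iterations"] = []
    x = x0
    for i in range(max_iterations):
        x_new = step(x)
        change = abs(x_new - x)
        meta["iterations"].append({"i": i, "offset": x_new, "change": change})
        x = x_new
        if change < tolerance:
            break
    return x


# Toy step converging to 1.0 by halving the remaining offset.
meta: dict[str, Any] = {}
final = iterate_method(
    lambda x: x + (1.0 - x) / 2, 0.0, tolerance=0.01, max_iterations=20, meta=meta
)
print(round(final, 3), len(meta["iterations"]))  # 0.992 7
```

Because the driver owns the loop, every method iterated through it gets the same per-step records in its metadata for free, which is what would later enable plotting or debugging the convergence.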
Agreed on all.
Except maybe for "it is certainly more difficult to understand when reading the code, maintain and debug". I think in time (getting used to the functions), and maybe with a Wiki page, it will be much easier to maintain and debug a couple of consistent core functions, rather than each `Coreg` subclass doing its own thing :wink:.
For computation times: similar for raster-raster, and faster for point-raster (due to avoiding rebuilding a `Raster` to call `interp_points`; now this is done from the array/transform directly :smile:).
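The performance point can be illustrated with plain SciPy (this is only an analogy; GeoUtils' actual `interp_points` implementation differs, and the "transform" below is a hypothetical origin/resolution pair): build one interpolator from the array and its geotransform, then query it repeatedly instead of reconstructing a raster object per call.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Toy "DEM": a 5x5 plane z = row + col, with a geotransform-like origin/resolution.
dem = np.add.outer(np.arange(5.0), np.arange(5.0))
res, x0, y0 = 1.0, 0.0, 0.0  # hypothetical pixel size and upper-left coordinates

# Grid coordinates derived from the transform (pixel centers).
xs = x0 + (np.arange(5) + 0.5) * res
ys = y0 + (np.arange(5) + 0.5) * res

# Built once, re-used for every query (e.g. at each iteration of a shift search).
interpolator = RegularGridInterpolator(
    (ys, xs), dem, method="linear", bounds_error=False, fill_value=np.nan
)

pts = np.array([[1.0, 1.0], [2.5, 3.5]])  # (y, x) query points
print(interpolator(pts))  # [1. 5.]
```

Re-using the interpolator object across iterations avoids both the object-construction overhead and the repeated setup of the interpolation grid.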
Good idea, I will try to write a short Wiki page while this is fresh!
Merging like this!
Want to know @erikmannerfelt's opinion before changing `Coreg.meta` into `Coreg.inputs` and `Coreg.outputs`. I actually don't dislike `meta` too much (data about other data: the coregistration algorithm in this case), as long as it's organized inside.
FYI: I've added a Wiki page here on the `Coreg` structure and steps to implement a new method: https://github.com/GlacioHack/xdem/wiki/Things-to-know-on-coregistration-class-structure
> FYI: I've added a Wiki page here on the `Coreg` structure and steps to implement a new method: https://github.com/GlacioHack/xdem/wiki/Things-to-know-on-coregistration-class-structure
Perfect, thanks a lot! That's super useful!
This PR restructures affine coregistration methods and many generic subfunctions to be consistent and modular. Primarily, it makes 1/ "point-raster" subsampling, 2/ any binning and/or fitting, and 3/ iteration over methods, the same across all `Affine` and `BiasCorr` methods with extensive modularity.

Additionally, this PR updates horizontal shift reprojection from using a different implementation in every method (e.g. Nuth and Kääb used `scipy.RectBivariateSpline`) to consistently using `interp` from GeoUtils, which supports any resampling method (for which details can be chosen in global `config` parameters), with consistent nodata propagation built on top, and preserves computational efficiency by allowing a single interpolator object to be returned and re-used (https://github.com/GlacioHack/geoutils/pull/560). This also avoids the sub-pixel errors of Rasterio reprojection.

It also restructures the code of affine methods, defining core functions outside the classes' `_fit_rst_rst` or `_fit_rst_pts`, which allows consistent pre-processing steps and clear inputs/outputs. Finally, this PR adds support for pixel interpretation for coregistration methods (important for point-raster operations).

Resolves #234
Resolves #574
Resolves #561
Resolves #546
Resolves #573
Resolves #571
Resolves #547
Resolves #327 (except plots, but those are mentioned in https://github.com/GlacioHack/xdem/issues/139)
Big step forward for #435

Opens #572
Opens #575
Opens #579
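As background on the "fixed fit function" of the method mentioned in the summary below, the Nuth and Kääb (2011) relation fits dh/tan(slope) against aspect as a cosine: dh/tan(slope) = a·cos(b − aspect) + c. A self-contained sketch with `scipy.optimize.curve_fit` (this is an illustration, not xDEM's `_nuthkaab_fit_func`):

```python
import numpy as np
from scipy.optimize import curve_fit


def nuthkaab_func(aspect, a, b, c):
    """Nuth & Kaab (2011) relation: dh/tan(slope) = a*cos(b - aspect) + c,
    where a is the horizontal shift magnitude, b its direction, and c a
    vertical bias term."""
    return a * np.cos(b - aspect) + c


# Synthetic data: a 3 m shift at 45 degrees, 0.5 m vertical bias, plus noise.
rng = np.random.default_rng(0)
aspect = rng.uniform(0, 2 * np.pi, 500)
dh_slopetan = nuthkaab_func(aspect, 3.0, np.pi / 4, 0.5) + rng.normal(0, 0.05, 500)

(a, b, c), _ = curve_fit(nuthkaab_func, aspect, dh_slopetan, p0=[1.0, 0.5, 0.0])
print(round(a, 1), round(b, 2), round(c, 1))
```

Iterating this fit (shift the DEM by the estimated a, b, re-fit, repeat until the shift is below tolerance) is exactly the pattern the `_iterate_method` function below factors out.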
## Summary of changes

### 1/ New core processing functions
**For consistent binning/fitting:** Restructures the `coreg/affine` module to support the `fit_or_bin` logic previously added in `coreg/biascorr`. However, as in the case of coregistration the fit function is fixed (that of the method; for instance dh/tan(slope) as a function of aspect for Nuth and Kääb), the "bin" option alone is impossible and only "fit" or "bin_and_fit" are supported. Therefore, the mirrored structure simply allows passing a boolean argument `bin_before_fit` to perform binning before fitting, as well as any fit optimizer, bin statistic, and bin sizes. To do this, this PR merges the bin and fit of `BiasCorr` into a single `_bin_or_and_fit_nd` function in `coreg.base` used by all `Affine` and `BiasCorr` methods.

**For consistent point-raster/raster-raster subsampling:** This pre-processing step is different from the one running directly in `Coreg.fit()`, which simply extracts the `transform`, `crs`, and others. This step revolves around subsampling the data (raster-raster, or point-raster) at the same points for a valid subsample (no NaNs) of a certain size. This cannot be done consistently in `fit()` before `_fit_xxx`, because auxiliary variables might need to be derived by the method on the full arrays (such as slope, aspect, etc.), which propagate NaNs. In particular, the tricky aspect is the case of point-raster, where we need to first interpolate the raster at the coordinates of the points to get an idea of the available valid subsample (depending here also on NaN propagation). To do this, this PR merges the subsampling that existed in `BiasCorr` into a single `_preprocess_pts_rst_subsample` function in `coreg.base` relying on two steps: `_get_subsample_mask_pts_rst` and `_subsample_on_mask`. In order to allow returning a dh interpolator that can be re-used iteratively for some affine methods, another function `_preprocess_pts_rst_subsample_with_dhinterpolator` in `coreg.affine` relies on a different second step, `_subsample_on_mask_with_dhinterpolator`.

**For consistent iterative methods:** Methods such as ICP, Nuth and Kääb, etc., all work based on iterations. To have consistent treatment of inputs/outputs and iterative arguments (tolerance to reach, maximum number of iterations), those are now grouped through an `_iterate_method` function.

**Ensuring consistent arguments across these core functions and other parameters:** To ensure the arguments passed to these three core functions (subsampling, binning/fitting, and iteration) are consistent, new dictionary types are defined for each, which define their key names and value types: `InFitOrBinDict` ("In" for input), `InRandomDict` and `InIterativeDict`. Additionally, `InAffineDict` for affine-related parameters, and `InSpecificDict` for any method-specific parameters that do not fit the other categories. The same is done for outputs: `OutRandomDict`, etc. See #573 for details on the new organization.

### 2/ Robust and consistent horizontal shift reprojection
The core of the `fit_` functions of horizontal methods such as `NuthKaab` and `GradientDescending` has been modified to rely on the `_preprocess_pts_rst_subsample_with_dhinterpolator` mentioned above. This function relies on `geoutils.interp_points` (directly for point-raster, and through a new function `_reproject_horizontal_shift_samecrs` for raster-raster) to perform a consistent 2D grid interpolation with nodata propagation supporting any SciPy method; see details here: https://github.com/GlacioHack/geoutils/pull/560. This PR adds a test checking that the result of this functionality matches GDAL reprojection, which ensures Rasterio sub-pixel reprojection errors are gone.

### 3/ Restructuring of affine classes
All code from affine classes is moved out of the `_fit_xxx` functions into functions that rely on the same steps where possible. For instance, the `NuthKaab` method is divided into a `_nuthkaab_fit_func` to define the function to optimize, a `_nuthkaab_fit` which wraps `_bin_or_and_fit_nd` with some initial guess, a `_nuthkaab_aux_vars` that derives the auxiliary variables required for the fit (slope, aspect), and a `_nuthkaab_iteration_step` that is passed to `_iterate_method`.

All the `_apply_xxx` affine functions are removed to avoid unnecessary redundancies/inconsistencies, as `apply_matrix` is the same across all affine methods, and since #531 supports `resample=False`, which modifies only the geotransform for horizontal shifts.

### 4/ Add support for pixel interpretation
Dealing with `area_or_point` was previously done arbitrarily in each subclass. Using the above interpolation/reprojection (through `geoutils.interp_points`), we can now simply pass the argument to GeoUtils, which accounts for it when comparing point-raster data, or raises a warning for raster-raster data having a different interpretation. It is only required for the `fit` part of coregistration, and thus added to all related functions.

### 5/ Removal of `Tilt`, and other subfunctions

With the above changes, the affine `Tilt` function corresponds exactly to `Deramp(order=1)` (instead of having its own, separate implementation). Additionally, it is not truly affine (it cannot be modelled by an affine matrix), and thus has no place in affine methods. We can simply remove it, and users can rely on `Deramp` instead.

All sub-functions that were created to perform some processing steps in coregistration modules (many times duplicated from another method, or from a GeoUtils functionality) for a specific affine method are now removed. This is largely thanks to https://github.com/GlacioHack/geoutils/pull/560, which restructures GeoUtils raster functionalities to be callable without having to build a `Raster` object (simply passing array, transform, CRS, etc.). Until now, we were deconstructing/reconstructing `Raster` objects everywhere to call simple functions (`interp_points`, `coords`, etc.), which made us lose performance. This is now largely solved too! A couple of exceptions remain: `warp_dem` used only by `BlockwiseCoreg`, and `_mask_as_array` used only by the `dDEM` class, but their use seems more specific and I'm not sure it is covered by anything in GeoUtils right now.

### Tests
I did not write new tests on purpose in this PR, to ensure I could check that all these changes do not affect our current methods significantly. Improving tests (in particular for affine methods) will be addressed in a subsequent PR.
And indeed, all tests are passing with minor modifications! :partying_face:
Additionally, I increased the speed of the tests following #574 (ICP tests were extremely slow and cumulated to 5 min+, or half of xDEM's test length). Now nearly all the tests of `test_coreg/` run on a cropped version of the reference/tba example DEMs, and all of xDEM's tests run in 5 min instead of 12+ min. :smiley:

### To-do-list

- `NuthKaab` of #546,
- `BiasCorr` subclasses),
- `Affine` functions out of the classes, and adapt them to work with the new generic core functions,
- `is_translation` property to trigger an error for unsupported `resample=False`,
- `Coreg.info()` to summarize coregistration information, inputs and outputs,
- `_check_bin_or_fit` function for affine classes (opened #575).

### Will do in separate PRs