GlacioHack / xdem

Analysis of digital elevation models (DEMs)
https://xdem.readthedocs.io/
MIT License

Revamp and simplify the raster-point/P-R/P-P functionality #402

Closed: erikmannerfelt closed this issue 5 months ago

erikmannerfelt commented 1 year ago

ICP is inherently a point-to-point registration method. Therefore, by logical extension, it should have a `fit_pts_func`! Currently it does not, meaning point-to-raster co-registration does not work.

EDIT: The original title and topic of this issue was "ICP doesn't have a fit_pts_func", but it evolved into a more interesting conversation.

rhugonnet commented 1 year ago

I'm wondering: is there any coregistration method that relies on the 2D array shape?

If not, maybe `Coreg.fit_pts()` should NOT be backed by a separate `_fit_pts_func` in each subclass, but should simply pass two vectors to the fitting method that every class already has? This would simplify things a lot: essentially, `fit_pts()` would only be a convenience wrapper that handles the different input shape of the data.

erikmannerfelt commented 1 year ago

Hmm, could you please elaborate on that, @rhugonnet? I'm all for simplification, but I'm not sure I understand what you mean.

So to answer your first question (I'll call a 2D array a raster): some methods require at least one raster to sidestep the topology problem. Any 2D (or 2.5D, if we want to call it that) approach requires that at least one surface is given, or else there is no easy way to compare points. If we want to make all methods fully 3D (for point-point to work), we need to implement a fast and accurate topology approximation. It could be as simple as a Delaunay triangulation of the points, but extreme outliers may be a nuisance; the best would be some kind of robust averaging (though perhaps that belongs in a preprocessing filter). My point is that making everything fully 3D can increase complexity.
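As a rough illustration of the triangulation idea (a sketch using scipy's Delaunay-based interpolator, not anything xdem currently implements): one cloud can be triangulated and sampled at the other cloud's horizontal coordinates to get comparable elevations.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator  # Delaunay-based internally

def point_to_point_dh(ref_points: np.ndarray, tba_points: np.ndarray) -> np.ndarray:
    """Elevation differences between two (N, 3) clouds via a triangulated surface.

    The reference cloud is Delaunay-triangulated (internally, by
    LinearNDInterpolator) and sampled at the other cloud's x/y coordinates.
    Points outside the reference hull become NaN. Note: no outlier handling,
    which is exactly the caveat discussed above.
    """
    surface = LinearNDInterpolator(ref_points[:, :2], ref_points[:, 2])
    return tba_points[:, 2] - surface(tba_points[:, :2])

# Synthetic check: a tilted plane vs. the same plane raised by 1 m.
rng = np.random.default_rng(42)
xy = rng.uniform(0, 100, size=(200, 2))
ref = np.column_stack([xy, 0.1 * xy[:, 0] + 0.05 * xy[:, 1]])
tba = ref + np.array([0.0, 0.0, 1.0])
dh = point_to_point_dh(ref, tba)  # ~1.0 everywhere inside the hull
```

A single extreme outlier in `ref` would visibly warp the triangulated surface around it, which is why a robust filter would likely be needed first.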

One simple (and safe) workaround is to add some default fallback behaviour unless overridden, as suggested below. Honestly, it feels like we should have four different functions, not the two we have at the moment, since there are four potential cases of co-registration:

Raster-raster

  1. Use a raster-raster function if available
  2. Use a raster-point function if available (by converting one raster to points)
  3. Convert to points and use point-point

Raster-point

  1. Use a raster-point function if available
  2. Fallback to point-point

Point-raster

  1. Use a point-raster function if available
  2. Fallback to raster-point if available and invert the result
  3. Fallback to point-point

Point-point

  1. Use point-point
  2. Fail :(
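A minimal sketch of that fallback hierarchy (all class and method names here are hypothetical, not xdem's actual API), where a toy method implementing only point-point still serves every pairing through conversions:

```python
import numpy as np

class FakeRaster:
    """Toy stand-in for a raster: a 2D elevation array on a unit grid."""

    def __init__(self, data: np.ndarray):
        self.data = data

    def to_points(self) -> np.ndarray:
        rows, cols = np.indices(self.data.shape)
        return np.column_stack([cols.ravel(), rows.ravel(), self.data.ravel()])


class CoregDispatch:
    """Illustrative dispatcher for the four cases above (hypothetical names).

    Subclasses override any subset of the three _fit_* hooks; missing hooks
    trigger the fallbacks described in the lists above.
    """

    _fit_rst_rst = None
    _fit_rst_pts = None
    _fit_pts_pts = None

    def fit(self, ref, tba):
        ref_rst = isinstance(ref, FakeRaster)
        tba_rst = isinstance(tba, FakeRaster)
        if ref_rst and tba_rst:  # raster-raster
            if self._fit_rst_rst is not None:
                return self._fit_rst_rst(ref, tba)
            if self._fit_rst_pts is not None:
                return self._fit_rst_pts(ref, tba.to_points())
            return self._pts_fallback(ref.to_points(), tba.to_points())
        if ref_rst:  # raster-point
            if self._fit_rst_pts is not None:
                return self._fit_rst_pts(ref, tba)
            return self._pts_fallback(ref.to_points(), tba)
        if tba_rst:  # point-raster: swap the inputs and invert the result
            if self._fit_rst_pts is not None:
                return self._invert(self._fit_rst_pts(tba, ref))
            return self._pts_fallback(ref, tba.to_points())
        return self._pts_fallback(ref, tba)  # point-point

    def _pts_fallback(self, ref_pts, tba_pts):
        if self._fit_pts_pts is None:
            raise NotImplementedError("No point-point implementation: fail :(")
        return self._fit_pts_pts(ref_pts, tba_pts)

    @staticmethod
    def _invert(shift):
        # Inverting an additive (dx, dy, dz) shift is just negation.
        return tuple(-s for s in shift)


class MedianShift(CoregDispatch):
    """Toy method implementing only point-point: the median (dx, dy, dz) shift.

    Assumes matching point order in both clouds (true for the toy grids here).
    """

    def _fit_pts_pts(self, ref_pts, tba_pts):
        return tuple(np.median(ref_pts - tba_pts, axis=0))


ref = FakeRaster(np.full((4, 4), 10.0))
tba = FakeRaster(np.full((4, 4), 7.0))
shift = MedianShift().fit(ref, tba)  # raster-raster falls back to point-point
```

`MedianShift` only implements point-point, so every other pairing reaches it through conversions, mirroring the "ICP would be the smallest" case, while the base `CoregDispatch` with no hooks hits the "Fail :(" branch.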

In this scheme, the only essential function is point-point. All others can fall back to it, so the theoretical minimum implementation of an approach is tiny. ICP would be the smallest, as it only works point-point, so all other cases would be fallbacks. GradientDescending would have a raster-point function, would work on raster-raster and point-raster through fallbacks, and would (currently) fail on point-point. NuthKaab would have one raster-raster and one point-raster function, would have raster-point as a fallback, and would (currently) fail on point-point.

What are your thoughts on this, @rhugonnet and @adehecq? I could take on the restructuring. It would mean much less code in the end for each specific co-registration class, and I think it would make new co-registration approaches easier to implement, as only one function (point-point) is technically required.

rhugonnet commented 1 year ago

Very good point(-point)! :p Indeed it would improve things a lot; to be honest, I hadn't fully thought all of this through yet.

I agree with the method structure! 🙂 A couple of thoughts:

* I'm not sure we need both "point-raster" and "raster-point": the affine transform should be fully invertible, so we could consistently invert the result by switching the reference, as you said! This would simplify the structure to three cases.

* More of a question: I know some ICP flavours are "point-to-point" and others "point-to-plane", e.g. [Test libpointmatcher ICP algorithm #11](https://github.com/GlacioHack/xdem/issues/11), as used in ASP (`--alignment-method` in https://stereopipeline.readthedocs.io/en/latest/tools/pc_align.html). It sounds to me like this only concerns the internal minimization algorithm, with no relation to the "point-point" versus "point-raster" sampling discussed here?

* I think we need to be careful with point-point support; the tools are really different, and PDAL already has so much available. What were your thoughts there, wrapping some of its routines?

In terms of coding structure within `coreg`, is the following what you were thinking? A single `Coreg.fit()` function (we remove `fit_pts()` and keep a single function, since it's fairly easy to detect the input type; this is what I meant in my previous comment) that calls:

1. `Coreg._fit_rst_rst()` if two rasters are passed (and the function exists in the subclass, otherwise it falls through; same logic below),

2. or, alternatively, `Coreg._fit_rst_pts()` if (2a) one raster and one point dataset are passed, or (2b) one raster and the other converted to points **if 1. failed**,

3. or, alternatively, `Coreg._fit_pts_pts()` if (3a) two point datasets are passed, (3b) one raster converted to points and one point dataset **if 2. failed**, or (3c) two rasters converted to points **if 1. and 2. failed**,

4. and raises an error on the input type if **1., 2. and 3. all failed**.

It's amazing to see so much activity in xdem again!! Things converge so much better (and faster) when we're all there 😉

erikmannerfelt commented 1 year ago

Thank you for your point-points!

On point-raster and raster-point

I agree that raster-point and point-raster would very rarely both need to be implemented. My only thought was for, e.g., a point-raster NuthKaab. Right now, we derive slope/aspect from the TBA raster, which may or may not be a good idea depending on the raster. It would be better to derive slope/aspect from the reference, but that is in turn difficult with point data that is missing normals. One solution could be to give P-R and R-P slightly different behaviour:

P-R

  1. Use reference point normals or slope/aspect if available
  2. Fallback to TBA raster slope/aspect

R-P

  1. Use reference raster slope/aspect

Of course, this could also just be solved inside some other function, but I just wanted to explain my reason for where P-R and R-P might be useful.
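On the "invert the result by switching the reference" idea: since an affine fit is an invertible 4x4 matrix, turning a raster-point result into a point-raster one is a one-liner. A minimal numpy sketch with made-up numbers:

```python
import numpy as np

# A hypothetical fitted transform: a 10-degree rotation about z plus a translation,
# as a 4x4 homogeneous matrix (values are illustrative only).
theta = np.deg2rad(10.0)
matrix = np.array([
    [np.cos(theta), -np.sin(theta), 0.0,  5.0],
    [np.sin(theta),  np.cos(theta), 0.0, -2.0],
    [0.0,            0.0,           1.0,  1.5],
    [0.0,            0.0,           0.0,  1.0],
])

# Swapping reference and to-be-aligned inputs corresponds to the matrix inverse.
inverse = np.linalg.inv(matrix)

# Round trip: transforming a point and back recovers it exactly (to float precision).
point = np.array([12.0, 34.0, 56.0, 1.0])
round_trip = inverse @ (matrix @ point)
```

This only holds for affine methods, which is presumably why the non-affine corrections would still want the P-R/R-P distinction above.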

Common fit() method and point clouds

I think you read my mind with the common `fit()` method! I started thinking about it as I tried to fall asleep yesterday, and it makes perfect sense. A DataFrame or a numpy array with certain shape constraints could be parsed as a point cloud, while `RasterType`s and numpy arrays with other constraints are rasters. Then the appropriate `_fit_*` function is called, depending on the inputs.

We might have to properly define a point cloud shape, however. The shape will be (N, F) where N is the number of points, and F is the number of fields. For a 3D cloud without additional information, it would be (N, 3), but if normals and/or weights are included, it'll be (N, 4), (N, 6) or (N, 7). Then we're almost getting into tiny-raster territory. One solution to divide them could be:

  1. If it's (N, 3), it's a point cloud without extra information.
  2. If it's a structured array with "x", "y" and "z" fields (and potentially others), it's a point cloud.
  3. If it's a DataFrame, it's a point cloud.
  4. If it's (B, N, X) or (N, >3), where B is parsed as bands, it's a raster.
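Those four rules can be written down directly; a hypothetical sketch (xdem's real input parsing may differ):

```python
import numpy as np
import pandas as pd

def classify_input(data) -> str:
    """Classify an input as "pointcloud" or "raster" following the rules above.

    Hypothetical helper for illustration; not xdem's actual API.
    """
    if isinstance(data, pd.DataFrame):
        return "pointcloud"
    arr = np.asarray(data)
    names = arr.dtype.names
    if names is not None and {"x", "y", "z"} <= set(names):
        return "pointcloud"
    if arr.ndim == 2 and arr.shape[1] == 3:
        return "pointcloud"
    # (B, N, X), (N, >3) and plain 2D grids all end up as rasters.
    return "raster"
```

Note the ambiguity the rules leave open: a genuine 3-column raster would be misclassified as a point cloud by rule 1, which is the "tiny-raster territory" problem.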

On point-point methods

I realize now that this might not be perfectly in line with xdem's aim, and as you say, there are already tools for it. My thought was just that we're so close to actually being able to support it (especially with ICP being inherently P-P-based) that we might as well have it too. Since it wouldn't be too hard to implement, maybe we could support it for the methods where it's easy (right now only ICP), and simply raise an error for the rest. Thinking ahead, your BiasCorr methods and perhaps some filters could be really useful for point data, and to my knowledge no other library leverages that. But I agree, P-P is the lowest-priority functionality to get working.

A side note: point-based co-registration and bias correction could potentially be better, since any transformation of a point cloud is lossless. For each (horizontal) DEM transformation, we need to interpolate and thus potentially degrade the DEM. This doesn't happen with point clouds, apart from float-precision loss, so that's an argument for supporting more point-based data in the future.
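The lossless-versus-lossy contrast is easy to demonstrate (a sketch with numpy/scipy, using bilinear resampling as a stand-in for a DEM transformation):

```python
import numpy as np
from scipy.ndimage import shift as raster_shift

# Shifting a point cloud only touches the coordinates: the elevation values
# survive bit-for-bit.
points = np.column_stack([np.arange(5.0), np.arange(5.0), np.arange(5.0) ** 2])
shifted_points = points + np.array([0.5, 0.5, 0.0])  # half-pixel horizontal shift

# Applying the same sub-pixel shift to a raster requires resampling, which
# alters the values (bilinear interpolation here).
dem = (np.arange(25.0) ** 2).reshape(5, 5)
shifted_dem = raster_shift(dem, (0.5, 0.5), order=1, mode="nearest")
```

The raster's values change because each output cell is an interpolated mix of its neighbours, while the point cloud's elevations are untouched.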

Also, a small correction to my previous statements: I checked our ICP call, and OpenCV actually accepts normals (which I've implemented a calculation for). I don't know what effect that has on the results, but if it actually improves things, it means ICP is also topology-dependent. Basically, two point clouds where neither has normals could be problematic.
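For reference, the usual way to compute such normals is PCA over each point's nearest neighbours; the sketch below uses numpy/scipy and is not OpenCV's (or xdem's) implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals(points: np.ndarray, k: int = 8) -> np.ndarray:
    """Estimate unit normals of an (N, 3) cloud by PCA over k nearest neighbours.

    The normal at each point is the eigenvector of the local covariance with
    the smallest eigenvalue. The sign is left ambiguous (no viewpoint
    orientation), as in most basic implementations.
    """
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    normals = np.empty_like(points)
    for i, neighbours in enumerate(idx):
        local = points[neighbours] - points[neighbours].mean(axis=0)
        # Eigenvectors of the 3x3 scatter matrix, eigenvalues ascending.
        _, vecs = np.linalg.eigh(local.T @ local)
        normals[i] = vecs[:, 0]
    return normals

# Sanity check: points on the plane z = 0 should all get normals of ±(0, 0, 1).
rng = np.random.default_rng(0)
flat = np.column_stack([rng.uniform(0, 10, (50, 2)), np.zeros(50)])
normals = estimate_normals(flat)
```

The "two clouds without normals" concern above maps to the degenerate case here: with too few or collinear neighbours, the smallest eigenvalue is not well separated and the estimated normal becomes unstable.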

And I agree that it's great to see xdem gaining steam again! I unfortunately get most of my drive from having to fix bugs for a project that uses it haha, but I'll do my best to keep up the momentum even when these things are solved.

rhugonnet commented 1 year ago

Great thoughts! Agree on all :smile: I'll keep you posted on when I start working on coreg/ again (likely most of September), so that we don't branch off too far at the same time!