adebardo opened 1 month ago
@adehecq, @rhugonnet Currently, internal work is being done to test the blockwise method (and its current bug), but we would like to add this specific functionality. What do you think? @duboise-cnes :)
This is a great idea! Being only able to specify a "number of blocks" is definitely a limitation. Having the option of specifying a "number of blocks" could be retained just for the specific use case (immediately translating it into a size).
For the subdivision of the array: to avoid edge effects during transformations of the chunks, it is usually necessary to use an overlap. I have some old code that does this here (which does not have proper tests written, but was used a lot and checked qualitatively), if it can be useful: https://github.com/iamdonovan/pyddem/blob/main/pyddem/fit_tools.py#L1390 (the splitter/stitcher are for subdividing without overlap, and patchify/unpatchify with overlap; all work for a 3D data cube, but should also be applicable to 2D).
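For illustration, a minimal sketch of a 2D split with overlap (the split_with_overlap helper and its default sizes are hypothetical, not the pyddem implementation linked above):

```python
import numpy as np

# Hypothetical helper (not the pyddem code): split a 2D array into tiles of
# at most `tile_size`, with `overlap` pixels shared between neighbouring
# tiles, returning the tiles and their slices for restitching.
def split_with_overlap(arr, tile_size=(256, 256), overlap=32):
    step_r, step_c = tile_size[0] - overlap, tile_size[1] - overlap
    tiles, slices = [], []
    for r in range(0, arr.shape[0], step_r):
        for c in range(0, arr.shape[1], step_c):
            sl = (slice(r, min(r + tile_size[0], arr.shape[0])),
                  slice(c, min(c + tile_size[1], arr.shape[1])))
            tiles.append(arr[sl])
            slices.append(sl)
    return tiles, slices

dem = np.random.default_rng(0).random((985, 1332))
tiles, slices = split_with_overlap(dem)
```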
The problem here is slightly different from a classic split/stitch, as we don't know in advance the horizontal shift that the coregistration will find for each chunk. So we'd need some kind of splitter (for the fit()) followed by a patchify/unpatchify (for the apply()) that adjusts the size of the overlap based on the transformation, to avoid introducing voids. (This could be a way of running apply() that would solve the current bug.)
@erikmannerfelt could have more ideas/insights as he implemented the original functionality :slightly_smiling_face:.
After more thinking, I'm wondering if directly using https://docs.dask.org/en/latest/generated/dask.array.overlap.map_overlap.html would make more sense, even if for now our input arrays are not chunked in-memory.
It would:
- tie in with the dem Xarray accessor,
- avoid implementing and testing our own patchify/unpatchify (as map_overlap is already extensively tested in Dask).

The only downside is that Dask is more complex. For instance, we likely won't be able to use map_overlap directly, because the fit() function is quite a specific operation and spits out metadata instead of another array. We'll probably have to map the chunks with overlap in a custom delayed function. But we should be able to re-use existing code for that; for instance, we managed a delayed chunking with overlap and atypical output in interp_points here (which is fully tested): https://github.com/GlacioHack/geoutils/blob/e7538d1cd9bec79508fe1452991cbef8c5078bb9/geoutils/raster/delayed.py#L348. And several of us have accumulated enough experience with it to help!
Could be worth it, what do you think? :smile:
After a quick read, I am not sure I completely understand everything. Maybe in two phases:
I have discussed with @guillaumeeb from CNES, who has more knowledge of Dask usage than me. Do you have some insights on the Dask approach proposed by @rhugonnet? A good way to be compatible with Dask would benefit the project, I agree.
An additional point I just thought of: if resolving the bug involves removing the warp_dem function, that's for the best. It is likely responsible for the artefacts seen in #584, and it would allow us to remove scikit-image as a dependency altogether (I opened #654).
Agree on the two phases.
My main point was that we maybe shouldn't lose too much time implementing and testing a custom patchify function like the one exemplified in the PR description above, when we can use the one existing in Dask (independently of actually having Dask support in terms of memory usage/parallelization for now). So in that sense, Dask would also be slightly part of the first phase, but simply as a replacement for the patchify function (which also runs on normal arrays).
Then, in a second phase, we could refine and test out-of-memory behaviour, parallelization, and other Dask parameters (for which we would already have the right code structure).
Context
The CNES QI team is interested in the possibility of performing block-based coregistration, which is why studies are currently being conducted to test this functionality, referred to as blockwise. At the moment, users must specify the number of subdivisions they wish to apply. However, the QI team would prefer to specify block sizes instead. The goal of this ticket is to propose an implementation of this capability.
We will also start handling the size of the overlap between tiles. In this ticket it will have a default value; another ticket will be opened later to manage this overlap more thoroughly.
Proposed Code
Currently, subdivision is handled by the subdivide_array function, so we propose managing block sizes at this level.
Extend Subdivision: currently, users input a number; instead, we could allow them to input a dictionary or list containing two integers for rows and columns, and then calculate the number of subdivisions based on these values.
For example, for an image of size 985x1332:
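A minimal sketch of that translation (the blocks_from_size helper and the 256x256 block size are assumptions used only for illustration; the real change would live in subdivide_array or its caller):

```python
import math

# Hypothetical helper: derive the number of subdivisions from a requested
# block size instead of asking the user for a block count directly.
def blocks_from_size(shape, block_size):
    n_rows = math.ceil(shape[0] / block_size[0])
    n_cols = math.ceil(shape[1] / block_size[1])
    return n_rows * n_cols

# For a 985x1332 image and blocks of [256, 256]:
# ceil(985/256) * ceil(1332/256) = 4 * 6 = 24 subdivisions.
print(blocks_from_size((985, 1332), [256, 256]))  # 24
```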
Update the Following:
[ ] Update https://github.com/GlacioHack/xdem/blob/main/xdem/coreg/base.py#L3197 so that, based on row and column information, the function calculates the number of subdivisions when the user provides a list [row, col] (see the example above).
[ ] Update the associated docstring.
[ ] Copy a function that allows blocks to overlap.
Tests
[ ] Test the patchify function
Documentation
/ estimate 5d