ICESAT-2HackWeek / CloudMask

Fetch, classify, and label ICESat-2 data
BSD 2-Clause "Simplified" License

New algorithm for assimilation of datasets on grids #5

Open facusapienza21 opened 3 years ago

facusapienza21 commented 3 years ago

The current assimilation of ICESat-2 with VIIRS is done with the Ball Tree method. The function 'associate' in 'utils_viirs.py' simply calls the Ball Tree implementation in scikit-learn:

[Image: screenshot taken 2020-09-30 at 12:51:31 AM]
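For readers who cannot see the image, a nearest-neighbour association with a Ball Tree can be sketched roughly as below. This is not the repository's 'associate' function: the name `associate_balltree`, its arguments, and the choice of the haversine metric are assumptions made for illustration only.

```python
import numpy as np
from sklearn.neighbors import BallTree


def associate_balltree(ref_lat, ref_lon, query_lat, query_lon):
    """Hypothetical helper: nearest reference point for each query point.

    All inputs are 1-D arrays of coordinates in degrees.
    Returns (index into the reference arrays, great-circle distance in radians).
    """
    # The haversine metric in scikit-learn expects [lat, lon] pairs in radians.
    ref = np.deg2rad(np.column_stack([ref_lat, ref_lon]))
    query = np.deg2rad(np.column_stack([query_lat, query_lon]))

    tree = BallTree(ref, metric="haversine")
    dist, idx = tree.query(query, k=1)  # both have shape (n_query, 1)
    return idx[:, 0], dist[:, 0]
```

Multiplying the returned distances by the Earth's radius converts them to a length. Building the tree over every pixel of a large image is the expensive step in an approach like this one.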

In the particular case where at least one of the datasets consists of a grid of points (e.g., satellite images), the assimilation can be done more efficiently by exploiting the structure of the grid.
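As a rough illustration of what exploiting the grid structure could look like, the sketch below handles the simplest case: a grid that is regular in some coordinate system, so the nearest cell follows from arithmetic alone and no tree is needed. The function name `associate_regular_grid` and all of its parameters are hypothetical, and the regularity assumption is a strong one: a raw satellite swath is generally not regular in latitude and longitude, so real data may first need reprojecting onto such a grid.

```python
import numpy as np


def associate_regular_grid(x, y, x0, y0, dx, dy, n_rows, n_cols):
    """Hypothetical helper: nearest cell of a regular grid for each point.

    Assumes cell centres lie at (x0 + col * dx, y0 + row * dy).
    dy may be negative, as in north-up images where rows run southward.
    Returns (rows, cols, valid), where valid marks points inside the grid.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Rounding the fractional grid coordinate gives the nearest cell centre.
    cols = np.rint((x - x0) / dx).astype(np.int64)
    rows = np.rint((y - y0) / dy).astype(np.int64)

    # Points that fall outside the image have no matching cell.
    valid = (rows >= 0) & (rows < n_rows) & (cols >= 0) & (cols < n_cols)
    return rows, cols, valid
```

With an image stored as a 2-D array `image`, the matched values would then be `image[rows[valid], cols[valid]]`. The cost is linear in the number of points and does not depend on the image size.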

This algorithm is probably useful in more general applications, and the method could be an important contribution for other teams too.

facusapienza21 commented 3 years ago

@espg, how much priority do you think we should give to this? Right now, for a single VIIRS image (3200 x 3200), this takes several minutes, but it works. However, I believe this is useful even outside the scope of this project, and just writing a more efficient algorithm could be an important contribution.

espg commented 3 years ago

Mixed feelings on this... I think that having a function or method specifically for raster data makes sense. Doing lookups between two raster datasets just doesn't make much sense when you can do it differently in grid space. I think that the existing method should be kept, since it's good for matching vector to vector points, and can be a fallback if the raster method fails for some reason (e.g., weirdness in calculating grids that cross over the poles). The Ball Tree method itself can also be sped up using GPU acceleration, which is probably fast enough to erase any noticeable pause (if the user has a GPU).

I guess I'd say that we should start with making a method for when both datasets are grids to use instead, and then after that's in place, revisit if it should get expanded to point-raster matching... I expect that we'll always use this method or something close to it for matching point-to-point datasets.