fatiando / harmonica

Forward modeling, inversion, and processing gravity and magnetic data
https://www.fatiando.org/harmonica
BSD 3-Clause "New" or "Revised" License

Better default for window_size in EquivalentSourcesGB #425

Closed leouieda closed 2 months ago

leouieda commented 1 year ago

Description of the desired feature:

The window_size in gradient-boosted equivalent sources currently defaults to 5 km. This would completely break for problems that cover very large or very small areas. We used it because we needed a default, but this is not ideal.

A better default would be to estimate a square window that contains about 5k data points on average. 5k data points fit in the RAM of most computers, so it seems like a sensible default. Being conservative here means that we won't get memory errors from numpy in the majority of cases. The default would then be window_size=None, and in .fit we would estimate a value with:

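if self.window_size is None:
    # Area of the data region (region_ assumed to be west, east, south, north)
    area = (self.region_[1] - self.region_[0]) * (self.region_[3] - self.region_[2])
    ndata = data.size
    # Average data density over the region
    points_per_m2 = ndata / area
    # Area of a window expected to hold about 5k points
    window_area = 5e3 / points_per_m2
    # Side length of the square window
    self.window_size_ = np.sqrt(window_area)
else:
    self.window_size_ = self.window_size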

And we use self.window_size_ internally.
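
To make the arithmetic concrete, here is a minimal standalone sketch of the same estimate. The function name estimate_window_size and its points_per_window argument are hypothetical and are not part of harmonica. It assumes the region is given as (west, east, south, north) in meters and that the data are spread roughly evenly over it.

```python
import math


def estimate_window_size(region, ndata, points_per_window=5e3):
    """
    Estimate the side of a square window holding ~points_per_window data.

    Hypothetical helper for illustration only. Assumes region is
    (west, east, south, north) in meters and a roughly uniform data density.
    """
    west, east, south, north = region
    area = (east - west) * (north - south)
    if area <= 0:
        raise ValueError("The region must have a positive area.")
    if ndata <= 0:
        raise ValueError("The number of data points must be positive.")
    # Average number of data points per square meter
    points_per_m2 = ndata / area
    # Area of a window expected to contain points_per_window data points
    window_area = points_per_window / points_per_m2
    # Side length of the square window, in meters
    return math.sqrt(window_area)


# Example: 1 million points over a 100 km x 100 km region
window_size = estimate_window_size((0, 100e3, 0, 100e3), ndata=1_000_000)
print(f"window_size = {window_size:.1f} m")
```

For this example the window comes out a little over 7 km on a side. Note that if the survey has fewer than 5k points in total, the estimated window would be larger than the region itself, which may be worth handling explicitly.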

As with #424, I also think it is OK to break compatibility here without going through the hassle of a warning/deprecation cycle. But I will do it if others think it's needed.

Are you willing to help implement and maintain this feature?

Yes, but happy to let others do it since my time is limited.