fatiando / harmonica

Forward modeling, inversion, and processing gravity and magnetic data
https://www.fatiando.org/harmonica
BSD 3-Clause "New" or "Revised" License
205 stars 66 forks source link

Change default for window size in EquivalentSourcesGB #487

Closed indiauppal closed 1 month ago

indiauppal commented 3 months ago

Change the default value for the window_size in the EquivalentSourcesGB constructor to "default": window size will be estimated to locate approximately 5000 data points in each window. If window_size="default" and less than 5000 data points are used to fit the sources, a single window for all data and sources will be used. Check if the passed value of window_size is valid. Add a new window_size_ attribute to the class that is set after fitting the sources. Update docstrings and extend tests.

Relevant issues/PRs: Fixes #425

indiauppal commented 3 months ago
indiauppal commented 2 months ago

Completed with @santisoler

santisoler commented 2 months ago

This is starting to look great @indiauppal!

I'm leaving a few ideas after the meeting we had today:

  1. We should probably extend the test function for less than 5000 points and test if the outputs match the ones for a single window. Something like:
    data_windows, source_windows = eqs._create_windows(grid_coords)
    # The output lists should have a single element each (corresponding to the single window)
    assert len(data_windows) == 1
    assert len(source_windows) == 1
    # Check if all sources and data points are inside the window
    for coord in grid_coords:
        npt.assert_allclose(coord, coord[data_windows[0]])
    for coord in eqs.points_:
        npt.assert_allclose(coord, coord[source_windows[0]])
  2. We could explore replacing those np.aranges with slices. We only need those indices to slice the sources and coordinates arrays in the self._gradient_boosting method, so anything we could use to slice them that are more memory efficient should work. This is just a minor optimization, so it's not that critical for this PR.
  3. Since now we have another attribute the fit method assigns (the window_size_), we should add it to the list of Attributes in the class docstring. Something like the think I wrote in #491 for the depth_ argument: https://github.com/fatiando/harmonica/blob/55309c781e0059f78ced658c80d049ea222f31e4/harmonica/_equivalent_sources/cartesian.py#L142-L145
  4. It would be nice to add a check for the window_size argument in the __init__ method. Basically, we should raise an error if the passed value is a str and is not "default". That's the only string value that should be valid. Check this out for inspiration: https://github.com/fatiando/harmonica/blob/55309c781e0059f78ced658c80d049ea222f31e4/harmonica/_equivalent_sources/cartesian.py#L176-L180

This is somewhat my personal wishlist for this PR, so feel free to assign me a few of these tasks if you want. As always, feel free to ask for help if you need it 🙂

Looking forward to see this merged!

santisoler commented 2 months ago

@indiauppal, I'm updating this branch after the fix I made for the failing Mac testing. Remember to run a git pull to sync your local repo with the latest change in this branch.