British-Oceanographic-Data-Centre / COAsT

A Coastal Ocean Assessment Tool built as an extendable python framework for nemo models
https://british-oceanographic-data-centre.github.io/COAsT/
MIT License
24 stars 11 forks source link

subset_indices() would benefit from requiring the input variable names to be stated #436

Open jpolton opened 2 years ago

jpolton commented 2 years ago

In Gridded.py:

def subset_indices(self, start: tuple, end: tuple) -> tuple:
...

Usage: sci_t.subset_indices([miny, minx], [maxy, maxx]) could be enforced as sci_t.subset_indices(start=[miny, minx], end=[maxy, maxx]) To increase readability (and potential to avoid mix up of inputs).

I recall that there is a setting that can be used that requires the variable names to be stated.

thogar-computer commented 2 years ago

Also need to update all places where method is used and update the documentation as well

lukegorman commented 2 years ago

Rob made an example. @jpolton is this what you want:

class Test:
    def test_method(self, *, arg1: str, arg2: str):
        print(arg1)
        print(arg2)

t1 = Test()
t1.test_method("test1", "test2") # This fails due to unexpected number of arguments.
t1.test_method(arg1 = "test1", arg2 = "test2") # It passes when using named arguments.
roje-bodc commented 2 years ago

Note that this throws an exception if the user tries to call the method without using named arguments. Which doesn't seem like nice user experience.

jpolton commented 2 years ago

Yes that is the right idea. To avoid people getting inputting the variables in the wrong order (e.g. switching latitude and longitude) and getting odd effects. If you are forced to name the variable, then you wouldn't make this mistake.

Actually the next two methods in gridded.py make more sense for this requirement:

    def find_j_i(self, lat: float, lon: float):
        """
        A routine to find the nearest y x coordinates for a given latitude and longitude
        Usage: [y,x] = find_j_i(49, -12)
        :param lat: latitude
        :param lon: longitude
        :return: the y and x coordinates for the NEMO object's grid_ref, i.e. t,u,v,f,w.
        """
        debug(f"Finding j,i for {lat},{lon} from {get_slug(self)}")
        dist2 = np.square(self.dataset.latitude - lat) + np.square(self.dataset.longitude - lon)
        [y, x] = np.unravel_index(dist2.argmin(), dist2.shape)
        return [y, x]

    def find_j_i_domain(self, lat: float, lon: float, dataset_domain: xr.DataArray):
        # TODO add dataset_domain to docstring and remove nonexistent grid_ref
        """
        A routine to find the nearest y x coordinates for a given latitude and longitude
        Usage: [y,x] = find_j_i(49, -12, dataset_domain)
        :param lat: latitude
        :param lon: longitude
        :param grid_ref: the gphi/glam version a user wishes to search over
        :return: the y and x coordinates for the given grid_ref variable within the domain file
        """
        debug(f"Finding j,i domain for {lat},{lon} from {get_slug(self)} using {get_slug(dataset_domain)}")
        internal_lat = dataset_domain[self.grid_vars[1]]  # [f"gphi{self.grid_ref.replace('-grid','')}"]
        internal_lon = dataset_domain[self.grid_vars[0]]  # [f"glam{self.grid_ref.replace('-grid','')}"]
        dist2 = np.square(internal_lat - lat) + np.square(internal_lon - lon)
        [_, y, x] = np.unravel_index(dist2.argmin(), dist2.shape)
        return [y, x]

You can see that if you had to specify which was lat and which was lon you would never get them muddled.

@roje-bodc makes a good point about user experience.

lukegorman commented 2 years ago

@jpolton Please imagine this scenario:

lukegorman commented 2 years ago

Current state:

lukegorman commented 2 years ago

Requested functionality:

jpolton commented 2 years ago

I feel Bob's pain. However, at in reality COAsT is at the pre-viral stage and there are not many candidate Bob's out there.

But the principle is worth practising. And there maybe secret Bob's out there. We can refine this ticket to impose named parameters only on methods where user error is likely to mess stuff up and be hard to detect.

If we can define a work flow for the above examples: find_j_i() and find_j_i_domain() where the risk to reward is worth it, then we can improve coding practise in the future whilst minimising Bob's distress.

@lukegorman Does this sound reasonable?

lukegorman commented 2 years ago

@jpolton yes thats reasonable

roje-bodc commented 2 years ago

@jpolton I have created and linked a pull request to this ticket which has changes for the find_j_i methods, and the subsetting indices method. This will enforce that users use keyword arguments when they call the method. Any existing user code that doesn't use keyword arguments will fail though.