SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License
501 stars 185 forks source link

Best way to handle `generate.py` mutable default kwargs. #3226

Open JoeZiminski opened 2 months ago

JoeZiminski commented 2 months ago

Further to the discussion in #2342, I believe we are aiming to remove mutable default arguments where possible. One place I'd like to handle these is in the generateion modules, for example generate_ground_truth_recording() and generate_drifting_recording(). Before opening a PR I thought I'd open an issue to discuss the best approach as it's not the easiest problem to solve neatly.

I believe in general, the easiest way to handle mutable defaults is to take None as default argument, and have

if some_kwargs is None:
    some_kwargs = dict(default_kwargs...)

Where these if ... checks are always abstracted to a new function. The reason to extract to a new function is that it is also useful for changing only one for the functions default kwargs (more on this below). At the moment, it is not entirely clear what the expected behaviour when passing a subset of the mutable defaults.

For example, say the call chain is: generate_ground_truth_recording -> generate_sorting where generate_ground_truth_recording has a default kwargs dict generate_sorting_kwargs for generate_sorting, and generate_sorting has its own default arguments which may not match generate_ground_truth_recording's generate_sorting_kwargs When passing only a subset of generate_sorting_kwargs, they will be passed directly to generate_sorting and the default arguments of generate_sorting will be used for the non-passed parameters. I half-expected that instead, the generate_sorting_kwargs set on generate_ground_truth_recording would be updated with my subset.

I think in this case it is better that the existing behaviour is maintained, i,e. whatever is passed to generate_sorting_kwargs will be passed directly to generate_sorting. However, this is a pain if you want to change one generate_sorting_kwargs but keep the others. You have to copy the entire dictionary to your code and then change one entry. In this case, having a function discussed above is very handy, for example:

def get_generate_ground_truth_recording_defaults(
    generate_probe_kwargs=None,  
    generate_sorting_kwargs=None, 
    noise_kwargs=None, 
    generate_unit_locations_kwargs=None
):

   if generate_probe_kwargs is None:
       generate_probe_kwargs  = ...

   ...

   return generate_probe_kwargs , ...

This both neatens up the function body for generate_ground_truth_recording but also can be called as the main pattern to selectively update some kwargs when calling generate_ground_truth_recording e.g.

generate_probe_kwargs, _, _, ... = get_generate_ground_truth_recording_defaults() 
generate_probe_kwargs["some_key"] = some_value

generate_ground_truth_recording(generate_probe_kwargs=generate_probe_kwargs, ...)

So the generate pattern for dealing with mutable defaults (basically dicts, as lists can be turned to tuples):

1) set all dict defaults to None 2) create a defaults function which, that if these kwargs are None, returns the default for the function (otherwise, passes them back unchanged) 3) call this defaults function in the main function body to process kwargs neatly. This function can also be used to conveniently selecftively update default kwargs.

I wonder what people think about this?

JoeZiminski commented 2 months ago

Some notes from a discussion with @samuelgarcia:

1) For drifting_generation.py, which are particularly complex, we can make presets, similar to correct_motion. 2) list -> tuple for default arguments is not a problem, just need to check any isinstance(x, list) downstream. 3) For other cases where dict is a default mutable but are not as extensive as for example drifting_generation.py, and where the kwargs are only used for a single function signature, a copy of kwargs could be made at the top of the function body to address pythons strange behaviour with mutable.

JoeZiminski commented 3 weeks ago

@samuelgarcia just to continue this discussion, I think the vote went to the copy approach, so in this example we can make the engine_kwargs=None to engine_kwargs=__default_engine_kwargs and add a copy instead of

if engine_kwargs is None:
    engine_kwargs = dict()

The only remaining point is how to structure these default configs. I think we talked about this in terms of the preset idea, it seems more flexible to have these always above the function as in this example, instead of the signature, so they can be used elsewhere without having to copy and paste the entire dict to change one value. WDYT?