flatironinstitute / CaImAn

Computational toolbox for large scale Calcium Imaging Analysis, including movie handling, motion correction, source extraction, spike deconvolution and result visualization.
https://caiman.readthedocs.io
GNU General Public License v2.0
640 stars 370 forks source link

WIP: CNMFParams overhaul #1239

Closed pgunn closed 9 months ago

pgunn commented 1 year ago

This is a breaking change that fixes crucial bugs with CNMFParams, and it also adds json loading/saving functions for the same.

1) CNMFParams stores a lot of params with the same names (and even one parameter shares a name with a top level dictionary in the object), causing really weird bugs (including #1153 ). The parser in the codebase (before this diff lands) didn't treat subkeys as being pathed, so whenever a new params dictionary was passed in (including during initialisation) it would be copied to every subkey with its name, not allowing them to be independently set and sometimes causing the code to crash when a bizarre value landed over what was supposed to be a simple scalar. This entirely reworks the change_params() method to instead take a dict with paths inside it that matches the internal structure of CNMFParams 2) Tests are updated to match 3) It adds json functions to CNMFParams to make it possible to provide JSON files as canned configurations (intended for demos) 4) The to_dict() method now uses the same names for its export as the internal data structures in CNMFParams, making it possible to save and then load a set of parameters and have it actually work (this did not work correctly before) 5) change_params() no longer returns self (this was not used by any code I could find)

Before this lands: A) All the demos need to be adjusted too B) We need to do some outreach because this is a breaking change that will be painful for users C) I need to look into whether the set() method needs similar adjustments D) Consider making change_params() and set() if we change it validate its inputs and complain if it is passed parameters that it doesn't know how to set (right now these are just ignored). Ideally gate this behind a validate:bool argument or something like that

EricThomson commented 12 months ago

It seems like a good idea generally: I'm not sure what is breaking about this exactly? How would the parameter object be initialized differently? How would change_params() behavior be altered (I see these as the two main things to worry about in terms of breaking changes). (I know you said the latter no longer returns self, but as you said that is not typically used).

pgunn commented 12 months ago

Right now (pre-this-patch) there are three inputs into CNMFParams: A) You can initialise it with certain named parameters as keyword arguments (this is horrible design and maybe should be deprecated). B) You can initialise it with a dictionary but in a weird format where the general structure of the object is ignored and you provide only the end keywords you want populated (including multiple places, wherever found) in the params. Anything not in the dictionary gets default values. C) Post-init you can call set or change_params() on the object, and this follows that path-searching-replace-them-all behaviour

Example of B:

    params_dict = {'max_shifts': (4, 4),   # maximum allowed rigid shifts (in pixels)
                   'pw_rigid': False,      # flag for performing non-rigid motion correction
                   'border_nan': True,
                   'is3D': D == 3}
    opts = cm.source_extraction.cnmf.params.CNMFParams(params_dict=params_dict)

This diff, relating to those three inputs: A) Does currently nothing at all to this path, although I am considering adding a deprecation warning B) Requires the dictionary initialisation to mirror the structure of the object, allowing fields in different subobjects that accidentally had the same name to have different values (and fixing your bug #1153), preserving the use of defaults for things not in that dict C) Does the same thing with change_params()

Example of B, post-patch:

    params_dict = {
                   'motion': {
                             'border_nan': True,
                             'is3D': D == 3,
                             'max_shifts': (4, 4), # maximum allowed rigid shifts (in pixels)
                             'pw_rigid': False,    # flag for performing non-rigid motion correction
                             } 
                  }
    opts = cm.source_extraction.cnmf.params.CNMFParams(params_dict=params_dict)

It's a breaking change because existing code that doesn't go purely through route A will need to be updated to build/update a params object differently. Post-patch, the pre-patch way of doing B will not work anymore.

j-friedrich commented 12 months ago

Nice work Pat! Am all for for deprecating the initialization using named parameters, same holds when initializing the CNMF object #1085. I think preprocessing.p is never used in practice (if it was it should indeed be the same as temporal.p), the other 4 'duplicated' parameters already differ by name (nb_patch, p_patch, p_ssub, p_tsub) albeit with inconsistent naming convention (*_patch vs p_*). Hope I didn't forget one. I like the organized hierarchical structure of the params object, but am not sure whether it is worth a breaking change. Well, could also support both versions of params_dict for a while but update the demos to the new hierarchical one to nudge users towards the new one. Note that nb has to be the same in init.nb temporal.nb and spatial.nb (but can differ in patch.nb_patch), i.e. change_params should change every subkey. Thus if you go with the hierarchical params_dict there would need to be a consistency check, there is even a check_consistency method for that already :).

pgunn commented 12 months ago

@j-friedrich If it needs to be consistent between the three, we should just have it be a single parameter somewhere? No sense even having 3 separate parameters if there's the possibility of them getting out of sync

EricThomson commented 12 months ago

I like the organized hierarchical structure of the params object, but am not sure whether it is worth a breaking change. Well, could also support both versions of params_dict for a while but update the demos to the new hierarchical one to nudge users towards the new one.

Yes I'd be wary of a breaking change especially initially. Some of these categories are sort of confusing and I like the current method of just sending in a bunch of parameters without having to categorize (or think too much about) why some are in init vs preprocess vs spatial.

But definitely the concern of having the same parameter in different categories should be addressed.

pgunn commented 12 months ago

Based on discussion: A) I'll revise the code to fallback to the old behaviour when unscoped parameters are passed in, with a loud deprecation warning B) Later on we'll eliminate the fallback and make people provide the categories, which will fix the bug that led me down this rabbit hole C) I'll do the conversions of the demos in this diff to the new-style full params D) I'll stick in a loud deprecation warning for the non-dict based parameter setting as well because that mode really ties our hands if we don't get people off of it

We should look into revising the parameters themselves when they don't make sense (particularly the nb parameter), but that'll be separate from this diff

Hoping that sounds reasonable.

EricThomson commented 12 months ago

I'd be wary of even adding warnings etc. Let people use the legacy api for as long as they want as long as it doesn't introduce bugs (and if we get rid of duplicate names etc that should be doable right?).

In the meantime, in our demos and docs etc we'll use the new interface (mention the legacy api in the docs but not the updated demos). I actually sort of like the "flat" interface as it is simple, convenient, and mimics the change_params() api, and like I said before -- some of these categories are not exactly clear-cut it is easier to just throw parameters in all together and not think about what category they belong to.

pgunn commented 12 months ago

We want to deprecate the old way of doing things; the warnings are needed to give people notice so we can eventually have better, simpler code. Leaving old ways of doing things in forever just makes code more bulky and harder to maintain. The legacy api needs to go eventually (both of them, really).

j-friedrich commented 12 months ago

Hm, I'd also be fine with keeping the flat params_dict. What was the bug that led Pat down that rabbit hole in the first place? We have the hierarchical params object mainly to conveniently pass the relevant subset of parameters to update_spatial_components, update_temporal_components, and initialize_components, see cnmf.py. nb is required by all 3 functions. We opted for duplicates instead of introducing, say, a shared group. nb might be the only shared one, otherwise they group rather nicely. An alternative would be to also have the flat structure in the params object and redefine get_group(self, group) to return the subset {key: params[key] for key in self.keys[group]} with self.keys = {'spatial': [...], 'temporal': [...], ... }

Not sure what's best, keep the flat params_dict and the grouped params object, or make them both flat, or both grouped. Agree with not maintaining too many options for too long.

pgunn commented 12 months ago

The thing that led me down this route was that online.ring_CNN gets replaced by the entire self.ring_CNN subobject with the current implementation.

(after that, I spotted the inability to set some parameters separately, the reuse of parameter names, and the general messiness of the setting code)

Having a subset for shared paramters makes sense to me. Going all-in on groups (API and internally) will help us reason about the code, particularly once we get past the transition period; it'll act like variable scopes (flattening things would be the equivalent to making everything global, and is in my view very undesirable)

EricThomson commented 12 months ago

Then I'd just leave it as is then. Flat is better than nested. I don't see a need to make a breaking change like this if the bugs can be ironed out without it. The groupings are convenient for passing things around, but doesn't need to be forced on users. As it is now, we can expose it to them as needed, but when initializing it seems like an unnecessary complexity.

pgunn commented 12 months ago

I think the choice comes down to flattening it internally and externally, or going whole-hog on the structure. The status quo of having it being structured internally but flat externally makes for ugly code and bugs. Right now I'm intending to go with the plan I outlined above unless a good reason turns up to change that.

EricThomson commented 11 months ago

Having marinated with this for a while, it seems the flat structure is a good feature for a few reasons -- primarily zen of Python 'flat is better than nested type reasons' -- flat is easier to read and understand and type. More specifically, I don't like the idea of forcing users to adhere to a structured dictionary of parameters when creating params and change_params() for a few reasons:

  1. The current design is user-friendly, simple, easy to understand, and doesn't force users to grapple with a back end that is better left hidden.
  2. Many pipelines rely on this api, and breaking it will be a pain for users: we shouldn't force people to change because we think it is better for "software engineering" reasons.
  3. The categories are not clear-cut in the parameters object, so imposing this often arbitrary structure on the front-end users seems heavy-handed and will confuse users.
  4. The api for change_params() in particular should remain simple and flat because it should stay easy and transparent to change a couple of parameters.

The reasons for wanting the change (e.g., the issue I raised #1153 ) can be solved by simply adhering to the intent of the parameters object, and not having name clashes. This should be easy enough to fix.