[ENH] - Define new object organization (Data / Results / Base / Algorithm)

This is an exploratory PR for re-organizing the objects in specparam.

New Objects

`Data`, `Results`, and `Base` objects

This PR defines new sets of objects Data and Results, which define the core properties of managing data and managing model results respectively. These establish the object layouts for spectral models (fitting function and algorithm agnostic). This includes no information about what model to fit or how to fit it - but means any model built on top of this base should always have the same basic API for common and data elements. This also allows for a conceptual distinction between 'Data' and 'Results' elements. There are separate objects for 1D / 2D / 3D data & associated results.

Here, we also add a set of Base objects, which combine Data and Results objects, as well as adding any elements that require the combination of data & results. At this point, the Base object defines the core API / interface for the data & results components, without having an algorithm or any fit functions to fit - establishing a common platform for fitting.

Note that none of these objects are really expected to be used independently or accessed / used by the user, and even if defining a new algorithm, etc, should only need to be inherited from without really interacting with. Related, the Data objects can be used to manage / check data independently, but the Results and Base objects don't actually offer the capacity to do anything without having a defined algorithm.

Algorithm Objects

With the Base objects, we now need to define an algorithm, with the idea being that a separate object defines the algorithm - and only the algorithm. In doing so, an Algorithm object should be aware of Base (meaning, it should use and expected the methods and attributes that Base define (as a combination of Data and Results). If so, then by combining a Base and Algorithm object we get a functional spectral parameterization object, with a common core for managing data & results, that operates with a specified algorithm.

In this current PR, the SpectralFitAlgorithm object (name could definitely be changed) defines the current algorithm. If this object is used as the Algorithm object, everything should end up the same as 1.0. The idea here is if someone wants to define a new fitting algorithm, then they can define a new Algorithm object, without having to implement any of the underlying basics, and getting the same overall API out at the end.

Note that in separating out the algorithm, this PR currently exposes the same set of 'public' settings as before, but now also does allow for passing through updates to 'private' settings (see the SpectralFitAlgorithm definition).

Model Objects

At this point, we have objects that define the basic data, results, and algorithm features - and just need to combine these into a user facing object, that also defines the user facing API (doing things like printing, plotting, getting results). This is what is now in the SpectralModel, SpectralGroupModel, etc objects (now in model and group files), which, from the user perspectives, should act exactly the same as they did prior to this update.

Note - as it stands, the SpectralModel object inherits directly from the current Algorithm object, and defines the API on top of that. To put in a new fitting algorithm, one has to redefine the model objects, updated to use a new algorithm object (see outline below) - by using the same Base (& additional model methods), the API should end up identical, with a different internal fit procedure. Further updates should be able to add helpers / shortcuts to redefine model objects with new fit functions, without too much work on the user end to reconstitute the objects.

Overall Recap

From the user perspective, nothing should change, objects such as SpectralModel should work the same.

From a developer / internal perspective, we now have the following object organization:

Data objects: structured object for managing data (with versions for 1D, 2D, 3D, etc)
Results objects: structured object for managing model fit results (with versions for 1D, 2D, 3D, etc)
Base objects: combinations of Data & Results, plus any additional methods that cover both aspects
Algorithm objects: defines a fitting procedure
Model objects: combines base & algorithm objs to create a model object (plus some additional user facing methods)

Code Examples

Importing New Base Objects

With the caveat that there is no real use case to using the Base objects directly - there objects can be imported and defined:

# Define and add data to a BaseData object
fd = BaseData()
fd.add_data(freqs, powers)

# Define a base results object, and check modes
fr = BaseResults('fixed', 'gaussian')
fr.aperiodic_mode, fr.periodic_mode

# Check can define common base objects
base = CommonBase()
base1d = BaseObject()
base2d = BaseObject2D()

Algorithm Objects: Opening Up Access to "private" parameters

One of the original goals of trying some of the updates here was to open up explicit access to the "private" settings, for particular use cases that might want to tweak them, without complicating the Model object for standard usage. This works for that - as before, we can pass through "private" settings into the standard model object, using **model_kwargs :

fm = SpectralModel(cf_bound=1.)
fm.fit(freqs, powers)

Where this is going: defining new fit algorithms

A key goal of this update is to support the generalization of fitting, and while I think this will help with supporting new / different fit functions (updates to come), the more obvious support this adds is that it should support adding a fit procedure relatively easily. At least, as compared to previous organization, I do not think it would be very easy / obvious to define a new fit function while maintaining other more general functionality, whereas as now an interested party should be able to define a brand new Algorithm object, and plug it in (inherit from Base / put into API object) and it should work. We can add a bit of guidance of what needs to be followed for this to work - but I think it should be quite tractable.

As an outline of what it would look like to add a new fit function, see this (functional, but schematic) example:

# Define a customized algorithm object, that updates the fit procedure
class CustomAlgorithm():
    def _fit(*args, **kwargs):
        print("what if... a new fit procedure model!")

# Create new model object
class CustomSpectralModel():

    def __init__(self, *args, **kwargs):
        """Initialize model object."""

        BaseObject.__init__(self, aperiodic_mode='custom', periodic_mode='custom',
                            debug_mode=model_kwargs.pop('debug_mode', False), verbose=verbose)

        CustomAlgorithm.__init__(self)

# Initialize and run fitting from custom algorithm
fm = CustomModel()
fm.fit(freqs, powers)

Additional notes and examples

Below are some additional notes and examples that are somewhat tangential (the code examples are partly from older explorations, and all still work - but they are more like cool / interesting extras than anything that is actually important in relation to this update.

Having a minimal Model object

In the current version, the Base objects can be joined with an Algorithm object to form an operational minimal fit object, without the additional methods that are defined on the full model objects (basically, without having the public facing API). It is now fairly easy to create a functional 'minimal fit object' - having a functionally complete object for fitting the model, with user facing methods on top of that (reports, plots, fancy parameter access, etc) - though, again, this might have no clear use case.

This code shows how to make a minimal fit object:

# Define a minimal model object, by adding the algorithm to the base
class MinimalSpectralModel(SpectralFitAlgorithm, BaseObject):

    def __init__(self, **model_kwargs):

        BaseObject.__init__(self, aperiodic_mode=model_kwargs.pop('aperiodic_mode', 'fixed'),
                            periodic_mode='gaussian', debug_mode=model_kwargs.pop('debug_mode', False),
                            verbose=model_kwargs.pop('verbose', True))
        SpectralFitAlgorithm.__init__(self, **model_kwargs)

# Define and run model fitting on the minimal model object
fb = MinimalSpectralModel()
fb.fit(freqs, powers)

Note: although more minimal of an object, the Base object does the exact same as the previous object for fitting, so is no faster. We did chat briefly on a "minimum / speedy" version of the object / fit function that reduces the number of checks, etc that are done. I don't think there's much to do on that - we are pretty efficient on that stuff, and our time is spent fitting the actual model.

Editing a fit procedure

More nuanced than redefining a brand new function, is what was included in the previous version about editing fit approaches (including redefining Base objects). I think the current layout suggests slightly different mechanics to do so, while supporting all the same possibilities.

For example, we can customize an existing Algorithm object to modify a fit:

# Define a custom algorithm object that aliases the robust AP fit to re-use the simple fit
class NewAlgoNoRobustAP(SpectralFitAlgorithm, BaseObject):
    def _robust_ap_fit(self, *args, **kwargs):
        return self._simple_ap_fit(*args, **kwargs)
# Use some type magic to update the object
CustomModelNoRobustAP = type('CustomModelNoRobustAP', (NewAlgoNoRobustAP,), dict(SpectralModel.__dict__))

Interesting result - with basic simulated data, it's less catastrophic a change than I anticipated 🤷: Screen Shot 2023-08-07 at 1 59 50 PM

Before, there was an example showing something similar by retrofitting an existing object with a new Base (now Algorithm) object. If we go in the current direction, the recommended procedure for a brand new algorithm would probably be different - but FWIW, this does still work:

# Define a customized algorithm object, that updates the fit procedure
class CustomAlgorithm(SpectralFitAlgorithm, BaseObject):
    def fit(*args, **kwargs):
        print("what if... a new fit procedure model!")

# Register new CustomModel class that redefine the algorithm object for SpectralModel
CustomModel = type('CustomModel', (CustomAlgorithm,), dict(SpectralModel.__dict__)) 

# Initialize and run fitting from custom algorithm
fm = CustomModel()
fm.fit(freqs, powers)

Relevant previous text (for posterity): The above object has an overloaded fit function, and so prints out the message ("what if... a new fit procedure model!"), but otherwise has everything defined on the normal SpectralModel object! This means that be filling out the new fit function with a new procedure, one could built out new algorithm approaches, using the toolbox, but without needing to edit inside the module.

A benefit of moving in this direction (as well as being a bit more modular) is that the explicit isolation of the actual fitting from all the "management" code opens up the possibility for plugging in different fit functions to the same overlying object to interact with the model.

A heads up though - I'm not sure what is shown (relating to redefining underlying objects) is something we want to lean into exposing / promoting for people to try custom fit approaches - but I find it cool, and also it could be useful as a development feature to explore how different fit procedures work.

Thoughts & Conclusions

This update opens up our set of objects, and re-organizes the 'do it all together' approach, into a set of distinct objects, that can be combined, that should allow for developing different fit algorithms, and more easily using different fit functions. The argument from modularity is that this is a much more organized layout, even if the increased number of entities may seem (on the face of it) more complicated. In so far as specparam is a framework for spectral parameterization (more than any specific algorithm or function for doing so), I think something like this is probably what we want - but there's definitely room for tweaks and different ideas. As it stands, I think this current change is useful in terms of doing what we want it to do (opening private setting access), and has potentially useful benefits for going forward (opens up more access to defining custom fit procedures).

Reviewing

This is still at a pretty high-level stage of deciding on key strategy - so in terms of reviewing, I'm not really looking for detailed review, but rather strategy discussions - so any conceptual / strategic check-ins are more than welcome!

This looks really cool/useful! In the last case you gave, could the super class be SpectralModel? Or what is the benefit of using the base class + the type magic when defining a new class/procedure? For example, the bit where type is being used to register the new base to SpectralModel could be replaced with inheriting SpectralModel directly, I think:

class CustomModelNoRobustA(SpectralModel):    
    def _robust_ap_fit(self, *args, **kwargs):
        return self._simple_ap_fit(*args, **kwargs)

cfm = CustomModelNoRobustAP()
cfm.fit(freqs, powers)

@ryanhammonds - yeh, the more I think about this, the more I think I over-engineered the last part about redefining the base class lol. What you posted works, and I don't think that redefining base classes as I did adds any real benefit for this kind of changes. Overall, I think simpler approaches to define new objects and overload specific methods are possible, even with version 1.0 (in general, I think). I was having fun digging into redefining base classes and using type - I didn't know either of these things worked before digging in here and got excited.

What I find useful about this change is a) I think it makes the kind of procedure editing we're talking about more obvious in some ways (even if it doesn't necessarily clearly change what is possible), but also I'm thinking about this more prospectively - if we move this direction, SpectralModel now defines an API, BaseModel defines a particular algorithm (call it the 'fooof' algo for convenience now), but defining whole new fitting approaches would I think be much easier in this framework (even if still possible otherwise). For example - I can imagine defining a new BaseModel that fits exponential functions in linear-linear space (to test & compare), whereas that feels like it would be more annoying to try and do by editing the existing model object. There is still some work to be done across these objects to fully support this kind of generalization (passing in new underlying fit-objects), that's part of what I want to further explore along these lines (might work, might not lol).

Cool! I agree with you that this is useful and in the direction of being more flexible. The distinction between the base and model/api is nice. I think it would make sense to standardize or have requirements for new/custom models. Like a new class should implement a .fit and set an _ap_fit attribute, etc, to ensure some level of compatibility with plotting/reporting tools.

@ryanhammonds - yeh, fully agreed, on having some kind of standardization / template for new models. I've started to explore in this direction - recent commits look at the idea of a BaseData class that could standardize data input / management, regardless of the fit procedure, and we could consider a FitTemplate class that structures the expected / required aspects of a Fit procedure! I'll keep poking around at this and try some things out

@ryanhammonds & @voytek - I'm tagging you back in here, since while the theme of this PR is about the same as before, the details of the changes and implementations have changed a lot as I explore. Main details are in the now heavily edited first comment - let me know of any thoughts about this: I'm looking for general reads on whether we like this direction.

Also: sorry it's so long - I've been somewhat over-writing the PR descriptions to try and explore the ideas (and hopefully the text here can / will be adapted into docs & release notes as we choose what to include).

For the new base classes, it mentions that these classes “includes no information about what model to fit or how to fit it”. But then is called with BaseFit('fixed', 'gaussian'), so it seems that the BaseFit class is specific to model type ~~/ how to fit~~? I need to look into the code more to see what happens with these arguments. Edit: I think maybe this is just to way to predefine the model/unknown parameters that are to be optimized for?

I was also unsure about the "SpectralModel inherits from (something like (APIObject, SpectralFitAlgorithm, BaseObject)" part, particularly about the specifics of the distinctions between these classes. Are (APIObject, SpectralFitAlgorithm, BaseObject) optional names for the same thing or a multiple inheritance scheme? If the former, I think that makes sense and follows something related to the scikit-learn api.

Maybe the fit method could somehow be wrapped (kinda like a decorator maybe, this could be tricky figuring out the specifics) by the super class to call optional checks to ensure the custom classifier is compatible following the .fit call (e.g. if validate: assert self._ap_fit is not None...):

Once I review or try implementing custom fitting procedure, I'll get a better feel for things here.

Responses to the specific questions:

For the new base classes, it mentions that these classes “includes no information about what model to fit or how to fit it”. But then is called with BaseFit('fixed', 'gaussian'), so it seems that the BaseFit class is specific to model type / how to fit? I need to look into the code more to see what happens with these arguments. Edit: I think maybe this is just to way to predefine the model/unknown parameters that are to be optimized for?

Yeh, sorry, that example was a bit unclear - the BaseFit defines the expectation that there will be some value for aperiodic mode and periodic mode. As it stands, passing in 'fixed', 'gaussian' does nothing more than accept these strings as the "modes". Perhaps clearer would be to say this initializes: fb = BaseFit(aperiodic_mode=None, periodic_mode=None), showing that these modes are required inputs, but at the level of the BaseFit object itself it doesn't actually do anything with this information. Ultimately, defining this here means any higher-level options can expect these attributes, but what to actually do with them is only defined at a higher level.

I was also unsure about the "SpectralModel inherits from (something like (APIObject, SpectralFitAlgorithm, BaseObject)" part, particularly about the specifics of the distinctions between these classes. Are (APIObject, SpectralFitAlgorithm, BaseObject) optional names for the same thing or a multiple inheritance scheme? If the former, I think that makes sense and follows something related to the scikit-learn api.

Short answer is that this is a multi-inheritance scheme, which each of these objects being distinct, and implementing a different "part" of what used to all be defined in the same object. This strategy does make the implementation a bit more "fractured" in some ways, and requires more inheritance to get the full functionality in the end-product object (that should match the object as defined in the old scheme).

Maybe the fit method could somehow be wrapped (kinda like a decorator maybe, this could be tricky figuring out the specifics) by the super class to call optional checks to ensure the custom classifier is compatible following the .fit call (e.g. if validate: assert self._ap_fit is not None...):

Yeh, good idea / agreed! This PR is by no means finished, and in my mind interacts with other potential updates I'm trying out now, and ultimately the new customizability will require some kind of additional checks that any custom elements are well defined, etc.

This is the tension, right? Giving users more options and not constraining fit options versus simplicity that does constrain fit options within "often good enough" bounds. I can't see a way of doing this without " exploding the number of objects," as long as the exploding is not required. Meaning, can I still run simple specparam as before without having to be this explicit? As long as I can, then I think what you're proposing is as good of a path to start going down as any.

This is the tension, right? Giving users more options and not constraining fit options versus simplicity that does constrain fit options within "often good enough" bounds. I can't see a way of doing this without " exploding the number of objects," as long as the exploding is not required. Meaning, can I still run simple specparam as before without having to be this explicit? As long as I can, then I think what you're proposing is as good of a path to start going down as any.

Direct answers to @voytek - agreed that this is something of a necessary tradeoff, and to make explicit some things here:

I agree that in so far as we are generalizing, we need to "unpack" away from the "do-it-all" objects we currently have, so while I expect we need to end up with more objects in general, the more specific question is whether there is some reason / pressure to find a more middle ground solution that might work with fewer objects than this current implementation
As far as the user is concerned, if everything works properly (not 100% gauranteed currently - but meaning, if we make sure this does come together properly), the average user can interact with the public objects exactly as before, without even needing to know about any of these "behind-the-scenes" changes
Related to the above points, since the user need not know, the question and tradeoffs mentioned here is really about developers (and not users). On the face of it, I think it might well feel like this layout is harder to interact with (as a potential developer), since it now means looking at at least 3-4 objects over a few files to understand the overall SpectralModel object. I'm not sure if it's true that this is actually harder, since I think we gain something in organization that we lose in conciseness - and (coming back to the first point) there necessarily going to be at least some added complexity to support the broader use cases we want to support.
Overall, I don't really have an alternative to suggest (this is currently my best and sorta only idea about moving towards a framework for generalized spectral fitting), but I still want to get some feedback early here in case any does have alternate strategies and/or worries about continuing down this path

Picking up on this!

Overall, I updated the first comment with a now up to date overview / description of the current changes.

Recent updates:

now that Time / Event models are merged into main, I merged that here, so these objects now use the new organization
cleaned up the object organization (notably, 'Fit' -> 'Results'), such that I think the new organization is cleaner / clearer

For @ryanhammonds & @voytek - I'm now pretty confident that this is the the overall way to go, as I can see a way (that I'm pretty confident works) to finalize the new organization to support variable fit functions and fit algorithms. I haven't merged this yet, pending some broader user tests that nothing goes weird (given how broad the changes here). In terms of review - let me know anything you notice if you try it out / take a quick look - but since this is going to be further updated (mainly - #298), I think if the overall strategy looks good, and doesn't seem to be breaking, we can move on this, and do an overall deep-dive review for the end results after more things are merged in.

fooof-tools / fooof