Open TomDonoghue opened 1 year ago
This looks really cool/useful! In the last case you gave, could the super class be SpectralModel? Or what is the benefit of using the base class + the type magic when defining a new class/procedure? For example, the bit where type
is being used to register the new base to SpectralModel could be replaced with inheriting SpectralModel directly, I think:
class CustomModelNoRobustA(SpectralModel):
def _robust_ap_fit(self, *args, **kwargs):
return self._simple_ap_fit(*args, **kwargs)
cfm = CustomModelNoRobustAP()
cfm.fit(freqs, powers)
@ryanhammonds - yeh, the more I think about this, the more I think I over-engineered the last part about redefining the base class lol. What you posted works, and I don't think that redefining base classes as I did adds any real benefit for this kind of changes. Overall, I think simpler approaches to define new objects and overload specific methods are possible, even with version 1.0 (in general, I think). I was having fun digging into redefining base classes and using type
- I didn't know either of these things worked before digging in here and got excited.
What I find useful about this change is a) I think it makes the kind of procedure editing we're talking about more obvious in some ways (even if it doesn't necessarily clearly change what is possible), but also I'm thinking about this more prospectively - if we move this direction, SpectralModel
now defines an API, BaseModel
defines a particular algorithm (call it the 'fooof' algo for convenience now), but defining whole new fitting approaches would I think be much easier in this framework (even if still possible otherwise). For example - I can imagine defining a new BaseModel
that fits exponential functions in linear-linear space (to test & compare), whereas that feels like it would be more annoying to try and do by editing the existing model object. There is still some work to be done across these objects to fully support this kind of generalization (passing in new underlying fit-objects), that's part of what I want to further explore along these lines (might work, might not lol).
Cool! I agree with you that this is useful and in the direction of being more flexible. The distinction between the base and model/api is nice. I think it would make sense to standardize or have requirements for new/custom models. Like a new class should implement a .fit
and set an _ap_fit
attribute, etc, to ensure some level of compatibility with plotting/reporting tools.
@ryanhammonds - yeh, fully agreed, on having some kind of standardization / template for new models. I've started to explore in this direction - recent commits look at the idea of a BaseData
class that could standardize data input / management, regardless of the fit procedure, and we could consider a FitTemplate
class that structures the expected / required aspects of a Fit procedure! I'll keep poking around at this and try some things out
@ryanhammonds & @voytek - I'm tagging you back in here, since while the theme of this PR is about the same as before, the details of the changes and implementations have changed a lot as I explore. Main details are in the now heavily edited first comment - let me know of any thoughts about this: I'm looking for general reads on whether we like this direction.
Also: sorry it's so long - I've been somewhat over-writing the PR descriptions to try and explore the ideas (and hopefully the text here can / will be adapted into docs & release notes as we choose what to include).
For the new base classes, it mentions that these classes “includes no information about what model to fit or how to fit it”. But then is called with BaseFit('fixed', 'gaussian')
, so it seems that the BaseFit class is specific to model type / how to fit? I need to look into the code more to see what happens with these arguments. Edit: I think maybe this is just to way to predefine the model/unknown parameters that are to be optimized for?
I was also unsure about the "SpectralModel inherits from (something like (APIObject, SpectralFitAlgorithm, BaseObject)" part, particularly about the specifics of the distinctions between these classes. Are (APIObject, SpectralFitAlgorithm, BaseObject) optional names for the same thing or a multiple inheritance scheme? If the former, I think that makes sense and follows something related to the scikit-learn api.
Maybe the fit method could somehow be wrapped (kinda like a decorator maybe, this could be tricky figuring out the specifics) by the super class to call optional checks to ensure the custom classifier is compatible following the .fit call (e.g. if validate: assert self._ap_fit is not None...
):
Once I review or try implementing custom fitting procedure, I'll get a better feel for things here.
Responses to the specific questions:
For the new base classes, it mentions that these classes “includes no information about what model to fit or how to fit it”. But then is called with BaseFit('fixed', 'gaussian'), so it seems that the BaseFit class is specific to model type / how to fit? I need to look into the code more to see what happens with these arguments. Edit: I think maybe this is just to way to predefine the model/unknown parameters that are to be optimized for?
Yeh, sorry, that example was a bit unclear - the BaseFit
defines the expectation that there will be some value for aperiodic mode and periodic mode. As it stands, passing in 'fixed', 'gaussian' does nothing more than accept these strings as the "modes". Perhaps clearer would be to say this initializes: fb = BaseFit(aperiodic_mode=None, periodic_mode=None)
, showing that these modes are required inputs, but at the level of the BaseFit object itself it doesn't actually do anything with this information. Ultimately, defining this here means any higher-level options can expect these attributes, but what to actually do with them is only defined at a higher level.
I was also unsure about the "SpectralModel inherits from (something like (APIObject, SpectralFitAlgorithm, BaseObject)" part, particularly about the specifics of the distinctions between these classes. Are (APIObject, SpectralFitAlgorithm, BaseObject) optional names for the same thing or a multiple inheritance scheme? If the former, I think that makes sense and follows something related to the scikit-learn api.
Short answer is that this is a multi-inheritance scheme, which each of these objects being distinct, and implementing a different "part" of what used to all be defined in the same object. This strategy does make the implementation a bit more "fractured" in some ways, and requires more inheritance to get the full functionality in the end-product object (that should match the object as defined in the old scheme).
Maybe the fit method could somehow be wrapped (kinda like a decorator maybe, this could be tricky figuring out the specifics) by the super class to call optional checks to ensure the custom classifier is compatible following the .fit call (e.g. if validate: assert self._ap_fit is not None...):
Yeh, good idea / agreed! This PR is by no means finished, and in my mind interacts with other potential updates I'm trying out now, and ultimately the new customizability will require some kind of additional checks that any custom elements are well defined, etc.
This is the tension, right? Giving users more options and not constraining fit options versus simplicity that does constrain fit options within "often good enough" bounds. I can't see a way of doing this without " exploding the number of objects," as long as the exploding is not required. Meaning, can I still run simple specparam as before without having to be this explicit? As long as I can, then I think what you're proposing is as good of a path to start going down as any.
This is the tension, right? Giving users more options and not constraining fit options versus simplicity that does constrain fit options within "often good enough" bounds. I can't see a way of doing this without " exploding the number of objects," as long as the exploding is not required. Meaning, can I still run simple specparam as before without having to be this explicit? As long as I can, then I think what you're proposing is as good of a path to start going down as any.
Direct answers to @voytek - agreed that this is something of a necessary tradeoff, and to make explicit some things here:
Picking up on this!
Overall, I updated the first comment with a now up to date overview / description of the current changes.
Recent updates:
For @ryanhammonds & @voytek - I'm now pretty confident that this is the the overall way to go, as I can see a way (that I'm pretty confident works) to finalize the new organization to support variable fit functions and fit algorithms. I haven't merged this yet, pending some broader user tests that nothing goes weird (given how broad the changes here). In terms of review - let me know anything you notice if you try it out / take a quick look - but since this is going to be further updated (mainly - #298), I think if the overall strategy looks good, and doesn't seem to be breaking, we can move on this, and do an overall deep-dive review for the end results after more things are merged in.
This is an exploratory PR for re-organizing the objects in
specparam
.New Objects
Data
,Results
, andBase
objectsThis PR defines new sets of objects
Data
andResults
, which define the core properties of managing data and managing model results respectively. These establish the object layouts for spectral models (fitting function and algorithm agnostic). This includes no information about what model to fit or how to fit it - but means any model built on top of this base should always have the same basic API for common and data elements. This also allows for a conceptual distinction between 'Data' and 'Results' elements. There are separate objects for 1D / 2D / 3D data & associated results.Here, we also add a set of
Base
objects, which combineData
andResults
objects, as well as adding any elements that require the combination of data & results. At this point, theBase
object defines the core API / interface for the data & results components, without having an algorithm or any fit functions to fit - establishing a common platform for fitting.Note that none of these objects are really expected to be used independently or accessed / used by the user, and even if defining a new algorithm, etc, should only need to be inherited from without really interacting with. Related, the
Data
objects can be used to manage / check data independently, but theResults
andBase
objects don't actually offer the capacity to do anything without having a defined algorithm.Algorithm Objects
With the
Base
objects, we now need to define an algorithm, with the idea being that a separate object defines the algorithm - and only the algorithm. In doing so, anAlgorithm
object should be aware ofBase
(meaning, it should use and expected the methods and attributes thatBase
define (as a combination ofData
andResults
). If so, then by combining aBase
andAlgorithm
object we get a functional spectral parameterization object, with a common core for managing data & results, that operates with a specified algorithm.In this current PR, the
SpectralFitAlgorithm
object (name could definitely be changed) defines the current algorithm. If this object is used as the Algorithm object, everything should end up the same as 1.0. The idea here is if someone wants to define a new fitting algorithm, then they can define a new Algorithm object, without having to implement any of the underlying basics, and getting the same overall API out at the end.Note that in separating out the algorithm, this PR currently exposes the same set of 'public' settings as before, but now also does allow for passing through updates to 'private' settings (see the
SpectralFitAlgorithm
definition).Model Objects
At this point, we have objects that define the basic data, results, and algorithm features - and just need to combine these into a user facing object, that also defines the user facing API (doing things like printing, plotting, getting results). This is what is now in the
SpectralModel
,SpectralGroupModel
, etc objects (now inmodel
andgroup
files), which, from the user perspectives, should act exactly the same as they did prior to this update.Note - as it stands, the
SpectralModel
object inherits directly from the current Algorithm object, and defines the API on top of that. To put in a new fitting algorithm, one has to redefine the model objects, updated to use a new algorithm object (see outline below) - by using the same Base (& additional model methods), the API should end up identical, with a different internal fit procedure. Further updates should be able to add helpers / shortcuts to redefine model objects with new fit functions, without too much work on the user end to reconstitute the objects.Overall Recap
From the user perspective, nothing should change, objects such as
SpectralModel
should work the same.From a developer / internal perspective, we now have the following object organization:
Data
objects: structured object for managing data (with versions for 1D, 2D, 3D, etc)Results
objects: structured object for managing model fit results (with versions for 1D, 2D, 3D, etc)Base
objects: combinations of Data & Results, plus any additional methods that cover both aspectsAlgorithm
objects: defines a fitting procedureModel
objects: combines base & algorithm objs to create a model object (plus some additional user facing methods)Code Examples
Importing New Base Objects
With the caveat that there is no real use case to using the Base objects directly - there objects can be imported and defined:
Algorithm Objects: Opening Up Access to "private" parameters
One of the original goals of trying some of the updates here was to open up explicit access to the "private" settings, for particular use cases that might want to tweak them, without complicating the Model object for standard usage. This works for that - as before, we can pass through "private" settings into the standard model object, using
**model_kwargs
:Where this is going: defining new fit algorithms
A key goal of this update is to support the generalization of fitting, and while I think this will help with supporting new / different fit functions (updates to come), the more obvious support this adds is that it should support adding a fit procedure relatively easily. At least, as compared to previous organization, I do not think it would be very easy / obvious to define a new fit function while maintaining other more general functionality, whereas as now an interested party should be able to define a brand new
Algorithm
object, and plug it in (inherit from Base / put into API object) and it should work. We can add a bit of guidance of what needs to be followed for this to work - but I think it should be quite tractable.As an outline of what it would look like to add a new fit function, see this (functional, but schematic) example:
Additional notes and examples
Below are some additional notes and examples that are somewhat tangential (the code examples are partly from older explorations, and all still work - but they are more like cool / interesting extras than anything that is actually important in relation to this update.
Having a minimal Model object
In the current version, the Base objects can be joined with an Algorithm object to form an operational minimal fit object, without the additional methods that are defined on the full model objects (basically, without having the public facing API). It is now fairly easy to create a functional 'minimal fit object' - having a functionally complete object for fitting the model, with user facing methods on top of that (reports, plots, fancy parameter access, etc) - though, again, this might have no clear use case.
This code shows how to make a minimal fit object:
Note: although more minimal of an object, the Base object does the exact same as the previous object for fitting, so is no faster. We did chat briefly on a "minimum / speedy" version of the object / fit function that reduces the number of checks, etc that are done. I don't think there's much to do on that - we are pretty efficient on that stuff, and our time is spent fitting the actual model.
Editing a fit procedure
More nuanced than redefining a brand new function, is what was included in the previous version about editing fit approaches (including redefining Base objects). I think the current layout suggests slightly different mechanics to do so, while supporting all the same possibilities.
For example, we can customize an existing Algorithm object to modify a fit:
Interesting result - with basic simulated data, it's less catastrophic a change than I anticipated 🤷:
Before, there was an example showing something similar by retrofitting an existing object with a new Base (now Algorithm) object. If we go in the current direction, the recommended procedure for a brand new algorithm would probably be different - but FWIW, this does still work:
Relevant previous text (for posterity): The above object has an overloaded fit function, and so prints out the message ("what if... a new fit procedure model!"), but otherwise has everything defined on the normal
SpectralModel
object! This means that be filling out the newfit
function with a new procedure, one could built out new algorithm approaches, using the toolbox, but without needing to edit inside the module.A benefit of moving in this direction (as well as being a bit more modular) is that the explicit isolation of the actual fitting from all the "management" code opens up the possibility for plugging in different fit functions to the same overlying object to interact with the model.
A heads up though - I'm not sure what is shown (relating to redefining underlying objects) is something we want to lean into exposing / promoting for people to try custom fit approaches - but I find it cool, and also it could be useful as a development feature to explore how different fit procedures work.
Thoughts & Conclusions
This update opens up our set of objects, and re-organizes the 'do it all together' approach, into a set of distinct objects, that can be combined, that should allow for developing different fit algorithms, and more easily using different fit functions. The argument from modularity is that this is a much more organized layout, even if the increased number of entities may seem (on the face of it) more complicated. In so far as
specparam
is a framework for spectral parameterization (more than any specific algorithm or function for doing so), I think something like this is probably what we want - but there's definitely room for tweaks and different ideas. As it stands, I think this current change is useful in terms of doing what we want it to do (opening private setting access), and has potentially useful benefits for going forward (opens up more access to defining custom fit procedures).Reviewing
This is still at a pretty high-level stage of deciding on key strategy - so in terms of reviewing, I'm not really looking for detailed review, but rather strategy discussions - so any conceptual / strategic check-ins are more than welcome!