Open Jacob-Stevens-Haas opened 1 year ago
Totally agree with the general sentiment that the code could use a major reorganization because there are a lot of historical hacks. On the trapping SINDy side of things, it is a weird subset of SINDy because it requires (or at least, doesn't make sense without) a (1) quadratically polynomial library and (2) the trapping SINDy optimizer, since it is optimizing additional nonconvex terms AND (3) a specific set of constraints in the coefficients (although we are removing these constraints in followup work on the trapping_extended branch by @MPeng5). It really is a very specific optimization for fluid and plasma type models. There is also some confusion because it inherits from the ConstrainedSR3 optimizer, so you could (but shouldn't) use it just like ConstrainedSR3 (although I think it may complain about non quadratically polynomial libraries). However, the trapping SINDy stuff I believe is compatible with the weak form & other differentiation methods. So not sure what the best plan is for this.
Couple other things:
Sorry, I meant Parallel - Implicit (SINDy-PI). I only have a basic understanding of SINDy-PI, which is that, (a) differentiation $\dot x$ is added as a variable input to the feature library, and one term at a time is removed from the function library and regressed against the others. This is currently implemented in a subclass of SR3 AND adds code to the PDELibrary
, but that's spooky action at a distance and breaks the single responsibility of the PDELibrary
. If the significance of PI is modifying what gets sent to the feature library and setting up a series of regression problems, each dispatched to an optimizer, it feels like PI might be better as a separate SINDy
subclass (which would then make it compatible with any feature library and optimizer). I'm not sure whether PI is compatible with weak or trapping, which might be another reason to promote them to subclasses of SINDy
if they are all compatible with everything else but not each other.
If the weak problem were promoted to a subclass of SINDy
, it could simply not offer a differentiation_method
argument. It would still be able to take any optimizer. It could also be compatible with every function library, because the essence of it happens after function evaluation:
library_functions[k] = np.tensordot(
self.fullweights0[k],
funcs[np.ix_(*self.inds_k[k])],
axes=(
tuple(np.arange(self.grid_ndim)),
tuple(np.arange(self.grid_ndim)),
),
)
Yeah I'm definitely on board with removing legacy ensembling, I can PR that this week. As for the sklearn, I'm not sure what you mean. That BaseEstimator subclasses need fit()
and transform()
(and have a few restrictions on __init__
) seems like small baggage.
There's two problems I'm trying to get at in this GH issue (1) weak and PI problem setup shedding responsibility for function evaluation and optimizers, and (2) hey, wouldn't it be nice if the ontology of classes/types that we define in pysindy itself prohibited mathematically incoherent solutions. You said:
I think it [
TrappingSR3
]may complain about non quadratically polynomial libraries
How does it know what function libraries were used in another step of the problem? This would work:
class TrappingSINDy(SINDy):
def __init__(self, feature_library: TrappingLibrary, optimizer: TrappingSR3, **kwargs):
super(feature_library=feature_library, optimizer=optimizer, **kwargs)
TrappingLibrary = type("TrappingLibrary", (abc.ABC,), {})
class QuadraticLibrary(PolynomialLibrary, TrappingLibrary):
# def __init__(self, **kwargs):
super().__init__(**kwargs, degree=2)
This way, the IDE's background type checker would detect using the wrong library and flag it as an error before running the code. Obviously if QuadraticLibrary
were the only compatible library, the type hint could be feature_library: QuadraticLibrary
, but the abstract base class exists to enable this behavior in case there are multiple possible compatible libraries.
the trapping SINDy stuff I believe is compatible with the weak form
Then we could add:
class WeakQuadraticLibrary(WeakLibrary, TrappingLibrary):
# implement
There is a conflict though if both trapping and the weak form becomes it's own subclass of SINDy
. I'm not really sure of the solution. However, it feels like every problem type needing to be either a subclass of BaseOptimizer
or BaseFeatureLibrary
is causing us to repeat a lot of code. There's definitely some other abstractions at play here and it feels like it would be easier to identify them than leave their logic spread out or implicit.
Discrete SINDy might also fit better as a subclass of SINDy
. I believe it's incompatible with Trapping and Weak form, if not PI. In the code, most logic is couched within an if discrete: do something differently
. Taking that out would make subclassing a lot easier as well.
Sorry my response here is a little slow! I'll try to put some thought into good structure soon. I broadly agree with @akaptano that a big reorganization would nice if someone wants to take a lead, but its a lot of work and maybe relatively little reward to reap
@znicolaou I'm happy to lead and do this incrementally. It's partially driven by how tests fail in somewhat unpredictable places, partially driven by a goal to simplify the SINDy
class to subclass it in my own research, and partially driven by the growing complexity of ways to do SINDy (e.g. the question about parametrized library)
I'm pretty certain this would lead to simpler, more maintainable code, and so long as your comfortable with me making these kinds of decisions. Currently, I the following is not only workable, but pretty easy and wouldn't restrict compatibility at all:
# Classes that do something unique with the LHS AND RHS of the regression equation,
# and are therefore mutually incompatible:
class SINDy(BaseEstimator):
# same
class WeakSINDy(SINDy):
# IIUC, calculating the weak formulation _after_ function evaluation would allow :
# free simulation of discovered weak dynamics
# removing ParametrizedLibrary, WeakPDELibrary (although most of the latter would go in WeakSINDy)
# removing Guard code around combining weak/nonweak libraries, e.g. has_weak()
# removing the ridiculous multiplicity of tests for weak forms
# removing The `temporal_grid` and `spatial_grid` in non-PDE libraies.
class DiscreteSINDy(SINDy):
# Remove all the guard code around if:discrete
class SingleStep(SINDy):
# My research, estimating derivatives and xi at same time
SINDy
, and can you sign off on me starting that work?SINDy
?
PDELibrary
. However, if it's compatible with Weak or discrete, I wouldn't want to promote it and lose that capability. Is PI compatible with weak/discrete SINDy? Regardless, I'll start with the above classes (weak, Single-step, and discrete)TrappingOptimizer
also requires some spooky action, where it's only compabible with 2nd order polynomial features. It currently somehow checks this during optimization, which feels weird. But Alan says it's compatible with WeakSINDy, and I don't have a good idea of how to remove the spooky action without limiting compatibility, so I'll start with the above classes and return to this afterwards.A reminder to make these changes on a separate branch and then to PR it. Couple other thoughts:
@Jacob-Stevens-Haas, yes, I'm on board with promoting weak form to the SINDy class level! And happy to provide some feedback, but I may be a little slow sometimes. I'm not totally clear what the plan would be--how would we control the features used in the WeakSINDy
class? Would we have to implement weak feature libraries that can only be used by the WeakSINDy
? Also, I'm not sure how the ParameterizedLibrary
fits into the discussion - it should function for either weak or nonweak libraries. Anyway, big point is that a lot of people want to use the weak form for both PDEs and ODEs, so we should maintain its accessibility.
One point worth noting, in the current WeakPDELibrary
, some derivatives are still computed for the "mixed" interaction terms where all the derivatives can't be moved onto the the test functions, so a differentiation_method
probably should still be provided there for those cases. I think it currently uses the method provided in kwargs
option or rather than ignoring it.
@akaptano
A reminder to make these changes on a separate branch and then to PR it.
:100:
By discrete SINDy, you mean fitting equations looking like e.g. x_{k+1} = Ax_k?... By making it a subclass, does it lose some interoperability?
Yep, that's what I mean. It would only lose compatibility with anything else that is promoted to a subclass of SINDy, which in the basic set of changes I proposed, would only be WeakSINDy
SINDy-PI is also compatible with the weak form and discrete form in principle, not sure we actually wrote tests for this though.
That's fine, maybe it's best to leave it as-is for now. It could also be added as a step between the library and optimizer, maybe a decorator like we did with EnsembleOptimizer
, or maybe a new kind of abstraction. I'll return to this as a lower priority to try to decouple the implicit terms from the PDELibrary
.
The TrappingOptimizer merely checks that the size of the library is the correct size for a quadratic library from a N-dimensional state space... I think it is compatible with the weak form, but I need to think about this more.
That's fine, none of the other changes depend upon trapping (the changes wouldn't reduce interoperability) there's a lot of things to do before then, and I think trapping would be the hardest to decouple. A few other (not necessarily good) options to consider: TrappingOptimizer
-> _TrappingOptimizer
, and create a factory function for trapping problems that verifies the library is compatible. Or a global "check_compatibility()function added to
SINDy.init()` to assess compatibility of differentiation, feature library, and optimizer.
@znicolaou, you bring up some really good points, and its forcing me to get into the code and understand the weak form better. I suppose there's more value to be gained from refactoring within the WeakPDELibrary
than promoting to subclass of SINDy
, though it may belong there eventually.
The main benefit that would accrue to promoting the weak problem to a subclass of SINDy
is separation of concerns. (a) calc_trajectory
is weird in that SINDy
asks the feature library "should I calculate this derivative" and the feature library says yes/no, but that's really not the feature library's business. (b) Likewise, GeneralizedLibrary[WeakPDELibrary]
vs GeneralizedLibrary[BaseFeatureLibrary]
is an odd duck in how it adapts to having weak libraries. Similar to the python idea that list[int]
is not a subtype of list[float]
.
How would we control the features used in the
WeakSINDy
class? Would we have to implement weak feature libraries that can only be used by theWeakSINDy
?
I'm imagining a weak feature library that decorates another function library, similar to how EnsembleOptimizer
currently functions. As someone only casually reading the weak paper, I am always surprised that WeakPDELibrary
allows its own custom, non-derivative functions instead of accepting other function libraries as inputs, and then puts a lot of work into keeping track of their arguments and names. Especially because the basic function libraries do that gratis.
The other thing that bugs me is that we have expensive, unspecific, and probably unnecessary integration tests, but WeakPDELibrary is not currently factored to allow a unit test of "Does the integral of a simple function result in a known value (e.g. $\int_0^1 x dx$=.5)?" Which means it's tricky to refactor. It would be great if we could refactor it to separate bookkeeping from "evaluate the integral of a simple/mixed/non-derivative term".
One point worth noting, in the current
WeakPDELibrary
, some derivatives are still computed for the "mixed" interaction terms where all the derivatives can't be moved onto the the test functions, so adifferentiation_method
probably should still be provided there for those cases. I think it currently uses the method provided inkwargs
option or rather than ignoring it.
It uses the differentiation_method
arg passed to the library. A differentiation_method
arg passed to the SINDy
object is a no-op. Promoting the weak form would remove the no-op arg and leave the useful arg.
Also, I'm not sure how the
ParameterizedLibrary
fits into the discussion - it should function for either weak or nonweak libraries.
If the weak form was a SINDy
subclass that consumed a feature library and could create weak terms from any feature, ParameterizedLibrary
would no longer needed to override calc_trajectories()
with the case if hasattr(self.libraries_[0], "K")
. It could still live on as a smaller convenience wrapper for the case of tensoring two different libraries. (also noticed it does something with GeneralizedLibrary(..., exclude_libraries,...)
but there isn't a docstring for that argument)
Anyway, big point is that a lot of people want to use the weak form for both PDEs and ODEs, so we should maintain its accessibility.
Agree :100:
@akaptano , @znicolaou
When I proposed to combine the derivative and sparse regression step into a single optimization problem, it was clear that I couldn't use the existing SINDy class without some weird hacks (i.e. creating a
SingleStepOptimizer
that omitted $\dot x$ from the differentiation, expanded the optimization variable to $x$ and $\dot x$, modified the feature library to create identity features for all $x$ and $\dot x$ fields, etc). This reminded me of a lot of what appear, to me, to be idiosyncracies and hacks in the SINDy code:SINDyPIOptimizer
andSINDyPILibrary
FeatureLibrary.calc_trajectory
TensoredLibrary
orGeneralizedLibrary
with aWeakLibrary
and theisinstance()
hacks to detect this.It became clear to me that if I made
SingleStepSINDy
a subclass ofSINDy
, I could do things much more clearly. By not taking a differentiation method, I would prevent users from thinking it made a difference. I could accept any existing Feature Library (other than weak features). And I could add a virtual base class for the optimizers that were compatible with my method (that only applied sparsity to a subset of the optimization variable, when I build this).And so I ask - are weak, parallel, and trapping problems truly a question of feature library/optimizer (and therefore compatible with other choices of differentiation method/feature library/optimizer) or are they mutually incompatible problem setups? If the latter, I think it makes a lot more sense to promote them to sublcasses of
SINDy
. This would allow them to keep their own logic, but probably reduce the SLOCs that you exclusively manage and perhaps the multiplicity of tests.I'm not as familiar with these methods, and knowing how to PR these would involve a more deep reading and understanding the papers and the code (except for PI, which is pretty straightforward). But removing ensembling from the SINDy class (among other things) was the first step, and #319 and #338 would also slim down the
SINDy
class substantially, making subclassing easier.Thoughts?