dynamicslab / pysindy

A package for the sparse identification of nonlinear dynamical systems from data
https://pysindy.readthedocs.io/en/latest/

Bayesian SINDy #440

Closed mikkelbue closed 4 months ago

mikkelbue commented 6 months ago

Hi all!

This is a (possible) implementation of Bayesian SINDy. It uses numpyro under the hood to drive NUTS sampling.
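For orientation, a minimal usage sketch of the optimizer this PR adds (the name SBR and the num_warmup/num_samples keywords follow the discussion below; the toy data is illustrative):

```python
import numpy as np
import pysindy as ps
from pysindy.optimizers import SBR

# Toy data: a harmonic oscillator trajectory.
t = np.linspace(0, 10, 500)
x = np.stack([np.sin(t), np.cos(t)], axis=1)

# NUTS sampling (via numpyro) happens inside the optimizer's fit.
model = ps.SINDy(optimizer=SBR(num_warmup=100, num_samples=200))
model.fit(x, t=t)
model.print()
```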

This PR is a work in progress in that tests and docs are missing. But I thought it would be better to put it up early in the process, in case anyone has questions, ideas, or comments regarding the implementation. Please let me know if this is a bad idea.

Thanks for considering this PR.

TODO:

Jacob-Stevens-Haas commented 6 months ago

Thank you so much for this - I've been interested in getting Hirsh et al.'s spike-and-slab priors into pysindy for some time! And selfishly, I've wanted a reason to start interacting with jax. We implemented a bit of kernel smoothing from a paper of @jfelipeosorio in derivative, but we shied away from anything that required jax.

Giving this a quick look now (but not yet a full review), here are some considerations that are more engineering than mathematical:

Import

Example

Code

Ok I lied, some math

mikkelbue commented 6 months ago

Hi @Jacob-Stevens-Haas

Thanks for your comments and suggestions. I'm happy if this is useful!

I have made a few changes:

I'll have a look at your other points when I have more time.

mikkelbue commented 6 months ago

Also, just as a point of order:

This implementation uses the regularised horseshoe prior, not (yet) the spike and slab prior. The regularised horseshoe is computationally more convenient because it is differentiable, so you can just plug the whole shebang into a NUTS sampler.

The spike and slab prior AFAIK requires a compound step sampler or some other clever way of making categorical moves. It's something I can look into, but I am not currently sure what is the best way to achieve this in numpyro.
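For illustration, a minimal numpyro sketch of a regularised-horseshoe regression of the kind described here (hyperparameters and site names are illustrative, not the PR's exact code):

```python
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS


def reg_horseshoe_model(theta, y=None):
    # theta: (n_samples, n_features) feature matrix, y: (n_samples,) targets
    n_features = theta.shape[1]
    # Global and local shrinkage scales, both half-Cauchy.
    tau = numpyro.sample("tau", dist.HalfCauchy(1.0))
    lam = numpyro.sample("lam", dist.HalfCauchy(jnp.ones(n_features)).to_event(1))
    # The "regularised" part: a finite slab width c^2 tames the heavy Cauchy
    # tails while keeping the density smooth everywhere.
    c_sq = numpyro.sample("c_sq", dist.InverseGamma(2.0, 2.0))
    lam_tilde = jnp.sqrt(c_sq) * lam / jnp.sqrt(c_sq + tau**2 * lam**2)
    beta = numpyro.sample(
        "beta", dist.Normal(jnp.zeros(n_features), tau * lam_tilde).to_event(1)
    )
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    # Everything above is differentiable, so NUTS can sample the posterior.
    numpyro.sample("obs", dist.Normal(jnp.dot(theta, beta), sigma), obs=y)


mcmc = MCMC(NUTS(reg_horseshoe_model), num_warmup=500, num_samples=1000)
# mcmc.run(jax.random.PRNGKey(0), theta, y) then draws the posterior.
```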

Jacob-Stevens-Haas commented 6 months ago

Oh, sorry, yes I remember now. I meant to say horseshoe prior. I didn't mean to point you in a different direction.

mikkelbue commented 6 months ago

@Jacob-Stevens-Haas quick update. 2 new commits:

  1. Optimized the sampling of coefficients by vectorizing (seems obvious now, haha)
  2. Added docstring with type hints.

Still looking to update the notebook with the stuff you requested.

mikkelbue commented 6 months ago

@Jacob-Stevens-Haas Okay, I think this addresses all of your points. Please let me know if anything can be improved.

I'm not sure what to add in terms of tests. The notebook has a testing mode now so that should cover some of it.

Jacob-Stevens-Haas commented 6 months ago

Thank you! I'll go ahead and review it later this week. Before then, I'll figure out how to get it so your CI runs don't need manual approval. Sorry to keep you waiting.

For tests,

  1. look at test_optimizers for tests that are parametrized to all optimizers
  2. Think about which ValueErrors (bad arguments) you want to check for.
  3. Is there a simple very-low-dimensional (like a length 2, 3, or 4 vector) regression where it's easy to see how the math for horseshoe priors should work out? The three tests starting here are good examples.
  4. Are there any helper functions that you use that have a simple test? Here's one where we pulled out an indexing subroutine, and therefore could test it independently.
Jacob-Stevens-Haas commented 5 months ago

Ok I added you as a contributor with "write" permission to the repo. You may need to click an invite link in an email, but that should enable CI to run automatically on your pushes.

mikkelbue commented 5 months ago

@Jacob-Stevens-Haas Thank you!

The tests are failing when building the docs and I can't seem to figure out why. This is the trace:

sphinx.errors.SphinxWarning: autodoc: failed to import module 'sbr' from module 'pysindy.optimizers'; the following exception was raised:
No module named 'jax'

Warning, treated as error:
autodoc: failed to import module 'sbr' from module 'pysindy.optimizers'; the following exception was raised:
No module named 'jax'

I have already included jax as an optional dependency in the numpyro group, so I don't understand how this is different from the way cvxpy is set up for SINDyPI. But I must have missed something.

Jacob-Stevens-Haas commented 5 months ago

The github workflow file creates a new environment and installs the current package, including all specified optional dependencies. You'd need to change this line to add your optional dependencies to the list.

codecov[bot] commented 5 months ago

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (9c73768) 94.40% compared to head (2060875) 94.40%. Report is 1 commit behind head on master.

| Files | Patch % | Lines |
| --- | --- | --- |
| pysindy/__init__.py | 50.00% | 2 Missing :warning: |
| pysindy/optimizers/__init__.py | 50.00% | 2 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##           master     #440     +/-  ##
=========================================
- Coverage   94.40%   94.40%   -0.01%
=========================================
  Files          37       38       +1
  Lines        3989     4060      +71
=========================================
+ Hits         3766     3833      +67
- Misses        223      227       +4
```


mikkelbue commented 5 months ago

Ok I added you as a contributor with "write" permission to the repo. You may need to click an invite link in an email, but that should enable CI to run automatically on your pushes.

Ah, I just found the invitation. It was on the front page of this repo :thinking: .. thank you!

Jacob-Stevens-Haas commented 5 months ago

Hey sorry for slow responses on my end - I'm responsible for #451 and #459, which are both pretty complex. I'll finish #451 in a few days and then I'll give this a thorough review.

mikkelbue commented 5 months ago

Thanks for all your comments. I responded to one just now, while I had the thought. I'll address the remaining sometime next week.

mikkelbue commented 5 months ago

Hi @Jacob-Stevens-Haas

I'm still working on this, just a little stuck on the next step. I'd like to include an example where the algorithm is used with discrete_time, as this is the case where the Bayesian error model relates directly to the states, as discussed above.

But I am struggling to find a good example. I haven't worked much with discrete-time equations before. It's easy enough to take a differential equation and put it into difference-equation form. But then you are essentially just doing Euler integration, and the examples I have tried so far tend to diverge very fast unless I use very short timesteps. But maybe that is fine?

Please have a look at this example and let me know if it makes sense. It's just a harmonic oscillator, rearranged as a difference equation with first-order Euler integration.
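For concreteness, a minimal sketch of that rearrangement (the timestep, trajectory length, and initial condition are illustrative):

```python
import numpy as np

dt = 0.01  # short timestep; forward Euler diverges quickly otherwise
n_steps = 5000
x = np.empty((n_steps, 2))
x[0] = [2.0, 0.0]  # (position, velocity)
# x'' = -x as a first-order system, discretised with forward Euler:
# p_{k+1} = p_k + dt * v_k,  v_{k+1} = v_k - dt * p_k
step = np.array([[1.0, -dt], [dt, 1.0]])
for k in range(n_steps - 1):
    x[k + 1] = x[k] @ step
# a fit would then use ps.SINDy(discrete_time=True).fit(x)
```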

Thanks!

Jacob-Stevens-Haas commented 5 months ago

Sorry I didn't respond - I think it's a good example. The only other common discrete system I know of is the logistic map from example 1, but either is fine. That said, I just realized that technically discrete_time=True might have the same issue as continuous; wouldn't the integral term for $\hat x_j$ in eq 3.3 be replaced with a sum term across timepoints until $t_i$?

Again though, I still think it's valuable in both cases; it just isn't the "full-Bayesian" form in the paper. The difference between the PR and the random variable in eq 3.3 is that $X$ is now also conditioned on the calculated derivatives from the first step of SINDy. While in reality those derivatives depend on $X$, we just treat them as independent measurements in the second step.

If that makes sense and checks out, I'm not sure we need an example specifically for discrete.

I'll take a look at the tests tomorrow and try to write some tests for SBR/see which parametrized tests it should be added to.

mikkelbue commented 5 months ago

Hi @Jacob-Stevens-Haas

No worries at all. I am aware that I have gone ahead and created extra work for you - without asking first :laughing:

That said, I just realized that technically discrete_time=True might have the same issue as continuous; wouldn't the integral term for $\hat x_j$ in eq 3.3 be replaced with a sum term across timepoints until $t_i$?

If the system is observed at all timesteps, then you shouldn't have to worry about that summation. The terms of the sum are baked into the right-hand side of the difference equation discrete_time formulation. So you march the system forward with $x_{t+1} = f(x_t, t)$, with whichever terms you are adding together already part of $f(x_t, t)$. As far as I understand, anyway.

Apparently, I am a complete masochist, so I have gone ahead and made the Hirsh formulation work with diffrax as an integrator. This is on a separate branch if you want to have a look. It's still pretty sketchy, but it does work. Some challenges:

  1. As discussed, the feature_library classes do not handle jax.numpy.arrays, so I have implemented an unsafe version of the PolynomialFeatures for use with the integrator.
  2. It only handles a single trajectory for the time being. It shouldn't be difficult to generalise to multiple trajectories though.
  3. I don't think it handles control variables yet, but I haven't tested it.
  4. It is far more unstable than the other implementation and does not converge unless given a sensible starting guess. Which is not that surprising.
  5. It is also a lot slower, since it needs to do that integration for each MCMC sample.

This example now uses that formulation. If I can get this new branch polished up a bit, I would consider splitting the optimizer into two, SBR and LiteSBR, to allow using both, since they each have pros and cons.

What are your thoughts? This would take a little more work to get over the finish line, and the solution is a little less clean than the first one, but it might be useful despite its limitations.

mikkelbue commented 5 months ago

Quick note: I have opted to make the existing "approximate" method available through a flag exact=False in the __init__ method of SBR, rather than a separate LiteSBR optimizer. All this is still on a separate branch.

I think this is a pretty robust implementation at this point. The only glaring issues are

  1. The adaptive step size controller PIDController from diffrax does not seem to play well with numpyro.infer.MCMC, and an XlaRuntimeError is raised every time I attempt to use it. I can't figure out the reason for this issue.
  2. This implementation is so far only compatible with PolynomialLibrary and whichever feature library we choose to "port" to the ProxyFeatureLibrary.
Jacob-Stevens-Haas commented 5 months ago

:sweat_smile: Ok this one's a thinker. Long story short, let's focus on this PR first. Long story still too short but a bit longer:

One of the thorniest issues I've dealt with in pysindy is the multiplicity of classes that involve repeating code to make a particular use case or method work, often signaled by if isinstance(... calls. It makes it really hard to know how to combine methods in code or which methods are mathematically compatible, and subtle changes to the internal API cause breakages in unexpected locations. A better approach is to create a type system that works and enforces mathematical sensibility. It's better to start by discussing, in an Issue, what the feature library and optimizer APIs are missing to enable this method (germane: #407, #351).

E.g. are "feature library" and "optimizer" even the right abstractions to use, or is another layer implicit? Off the top of my head, this seems like it could be solved with a mixin class for all feature libraries (except maybe WeakPDELibrary, another example of a class not fitting into the existing abstractions but being forced to). But a mixin may be the wrong solution. Maybe it also requires cooperative multiple inheritance for both optimizers and feature libraries, and the SINDy class determines which feature library implementation to inherit (default or diffrax) at runtime?


Edit: I realize I effed up on the last equation, this has a mistake. My main confusion is why we don't still have to compose k timesteps for the kth term in the discrete case. I've gotta come back to this another day.

As for the discrete stuff, I was wrong about the sum - it's composition! It took a while to figure out, and I'm not 100% sure I have it right. But the key for me was figuring out the subtly confusing notation in eq 3.3: way above it, Hirsh et al. introduce $\mathbf{X} = [y_1, \dots, y_n]^T$ as a measurement. I missed that and had stupidly assumed that a scalar element of $\mathbf{X}$ was $x_{ij}$, but it's actually $y_{ij}$ :sob:

But anyways, to illustrate my point, here's Eq 3.3, the data likelihood (ignoring $j$ and $\phi$ WLOG). I need to repurpose $x$, so I'm going to use $\mathbf{Z}$ and $z_i$ to refer to measurements and $\mathbf{X}$ and $x_i$ to refer to the underlying states. $$p(z_i | \mathbf{\Xi}, x_0) = \frac{1}{...} ||z_i - \hat x_i||^2 = \frac{1}{...}||z_i - x_0 - \int_0^{t_i} \mathbf{\Xi} \Theta(x(t')) \, dt'||^2$$ $$\prod_i p(z_i | \mathbf{\Xi}, x_0) = p(\mathbf{Z} | \mathbf{\Xi}, x_0)$$

For modeling a discrete system, it would be:

$$p(z_i | \mathbf{\Xi}, x_0) = \frac{1}{...}||z_i - \hat x_i||^2 = \frac{1}{...}||z_i - \mathbf{\Xi}\Theta(...\mathbf{\Xi}\Theta(x_0)...)||^2$$

with the same product expression holding. I think we've both seen that the related line in the SBR code, paraphrased here:

numpyro.sample("obs", Normal(jnp.dot(x, beta.T), sigma), obs=y)

instead represents the standard SINDy data likelihood (with beta = $\mathbf{\Xi}$, x = $\Theta(Z) = [\dots, \Theta(z_i), \dots]$, y = $\dot Z = [\dots, \dot z_i, \dots]$), $$p(\dot z_i | \mathbf{\Xi}, x_i = z_i) = \frac{1}{\dots}||\dot z_i - \mathbf{\Xi} \Theta(z_i)||^2$$ $$\prod_i p(\dot z_i | \mathbf{\Xi}, x_i = z_i) = p(\dot{\mathbf{Z}} | \mathbf{\Xi}, \mathbf{X} = \mathbf{Z})$$

where we're treating the $\dot z$ calculated in the differentiation step as measurements (and $z$, smoothed in that same step, as measurements replacing the original measurements). For discrete cases, there's no smoothing/differentiation step, and IDK whether there's a probability expression we can use: $$p(z_i | \mathbf{\Xi}, x_{i-1} = z_{i-1}) = \frac{1}{\dots}||z_i - \mathbf{\Xi} \Theta(z_{i-1})||^2$$ $$\prod_i p(z_i | \mathbf{\Xi}, x_{i-1} = z_{i-1}) = \ ???$$

So... sorry if you've already addressed this in the other branch, or if I've made a mistake here. But I needed to work through it all to understand it.

Jacob-Stevens-Haas commented 5 months ago

So the TL;DR from the previous comment is that I don't think any change is needed here on the PR to address discrete vs continuous. Hirsh et al. only deal with the continuous case, but I don't think the analogous data likelihood for discrete cases is what we use either.

Jacob-Stevens-Haas commented 5 months ago

First testing thoughts:

Add SBR to tests of multiple optimizers

test_supports_multiple_targets
test_fit
test_not_fitted
test_complexity_not_fitted
test_illegal_unbias
test_normalize_columns
test_pickle

SBR-specific

mikkelbue commented 5 months ago

@Jacob-Stevens-Haas thanks for all your comments and thoughts!

One of the thorniest issues I've dealt with in pysindy is the multiplicity of classes that involve repeating code to make a particular use case or method work, often signaled by if isinstance(... calls.

I completely agree that this type of design pattern does not spark joy :smiley:. The branch I have linked to is an attempt to make the integral approach work without changing anything in the core library.

If I were to do some refactoring, I would probably equip the feature libraries with a minimal _transform method that implements the core operation of the transform without any checks, so that _transform could work for both AxesArray and jax.numpy.array inputs, and then call that method directly from the diffrax integrator as well as from the transform method of the feature library. But that requires some careful engineering... I am happy to contribute to the discussion of how pysindy could be refactored to better support e.g. Bayesian SINDy.
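A hypothetical sketch of that split (MiniLibrary is illustrative; pysindy's real feature libraries are richer):

```python
import numpy as np
import jax.numpy as jnp


class MiniLibrary:
    """Hypothetical sketch of the _transform/transform split described above."""

    def _transform(self, x):
        # Core operation only, no validation; dispatch on the array type so
        # the same math serves numpy/AxesArray inputs and jax arrays alike.
        xp = jnp if isinstance(x, jnp.ndarray) else np
        return xp.concatenate([x, x * x], axis=-1)

    def transform(self, x):
        # Stand-in for the real checks (axes bookkeeping, shape validation).
        if x.ndim != 2:
            raise ValueError("expected (n_samples, n_features)")
        return self._transform(x)
```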

But anyways, to illustrate my point, here's Eq 3.3, ... But I needed to work through it all to understand it.

You are absolutely correct! I think I got confused because in the discrete case the left-hand-side $x_i$ are members of the data space, $x_i \in \mathcal D$, while that is not true for $\dot x_i$. So the inferred $\sigma$ has the correct unit for the discrete case, even when not composing multiple steps. If the steps are not composed (as in the current implementation), the underlying assumption is that the steps are independent, while when composing (or integrating), the assumption is that each step must be consistent with all the previous steps. This is a stronger constraint, which is also reflected in the more concentrated posteriors in the example that showcases both approaches.

we're treating the $\dot z$ calculated in the differentiation step as measurements (and $z_i$, smoothed in that same step, as measurements replacing the original measurements).

I wasn't aware that the differentiation method can also smooth the states! This is something to consider. I get why that is useful for "frequentist" SINDy, but it would change the character of the observation noise, which would be reflected in a different posterior distribution. It's not done for the default FiniteDifference, is it?

I agree we should go ahead and get this merged. My concern with the non-discrete case (as mentioned before) is that $\dot x_i \notin \mathcal D$, so the inferred $\sigma$ is not clearly interpretable. But I think that is okay as long as it's clearly explained in the docstring. Having this wrapped up will also make the next steps clearer. I will have a look at writing the appropriate tests some time this week.

Jacob-Stevens-Haas commented 5 months ago

Ah, that's a good point about $\sigma$ and $\mathcal D$. Just goes to show how deep it's possible to go in a paper.

mikkelbue commented 5 months ago

@Jacob-Stevens-Haas

Starting to work on the tests. The GitHub workflow is giving me a strange error though:

sphinx.errors.SphinxWarning: /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pysindy/optimizers/sbr.py:docstring of pysindy.optimizers.sbr.SBR:7:Unknown target name: "mcmc".

Warning, treated as error:
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pysindy/optimizers/sbr.py:docstring of pysindy.optimizers.sbr.SBR:7:Unknown target name: "mcmc".

I think it's because it's interpreting the mcmc_ as a reference but I don't know what to do about it.

mikkelbue commented 5 months ago

I think it's because it's interpreting the mcmc_ as a reference but I don't know what to do about it.

Never mind, I figured it out.

mikkelbue commented 5 months ago

Test writing progress:

Add SBR to tests of multiple optimizers

SBR-specific

mikkelbue commented 5 months ago

test_pickle

This is now added by refactoring the existing test_pickle using @pytest.mark.parametrize.
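For reference, a hypothetical sketch of such a parametrized test (the optimizer list and toy data are illustrative):

```python
import pickle

import numpy as np
import pytest

from pysindy.optimizers import SBR, STLSQ


@pytest.mark.parametrize("opt", [STLSQ(), SBR(num_warmup=5, num_samples=5)])
def test_pickle(opt):
    x = np.random.default_rng(0).normal(size=(10, 2))
    y = x @ np.array([[1.0], [0.1]])
    opt.fit(x, y)
    recovered = pickle.loads(pickle.dumps(opt))
    np.testing.assert_array_equal(opt.coef_, recovered.coef_)
```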

Something that directly tests the horseshoe prior (_sample_reg_horseshoe) in a very simple case

This is slightly problematic: if called outside a model context, the numpyro.sample functions would need to take an additional key argument to produce a sample. I can adapt the function to optionally take this, but that feels a little roundabout just to get a unit test set up. What do you think?
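For what it's worth, numpyro's seed handler is one way to call numpyro.sample outside an inference run without changing the helper's signature; a minimal sketch:

```python
import numpyro
import numpyro.distributions as dist

# The seed handler supplies the PRNG key that numpyro.sample needs outside
# an inference run, so a helper built on numpyro.sample can be called
# directly in a test.
with numpyro.handlers.seed(rng_seed=0):
    draw = numpyro.sample("lam", dist.HalfCauchy(1.0))
```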

Something that tests overall SBR in a very simple case

This is now done as test_sbr_fit. It just asserts whether SBR has an mcmc_ attribute after fitting and whether it contains samples of the correct variables.

Jacob-Stevens-Haas commented 4 months ago

I can adapt the function to optionally take this, but that feels a little roundabout just to get a unit test set up. What do you think?

No, that makes sense. I always wonder what the right way to test things is (and thereby, to understand them), but having read a bit more about the regularized horseshoe, I agree the roundabout work is not necessary here. I'll add some more info back into the docstrings in a suggested edit.

This is now done as test_sbr_fit. It just asserts whether SBR has an mcmc_ attribute after fitting and whether it contains samples of the correct variables.

I meant an even simpler setup, but a more rigorous test, a la test_sparse_subset. Try this one below:

import numpy as np
from pysindy.optimizers import SBR

def test_sbr_fit():
    x = np.eye(2)
    y = np.array([[1], [1e-4]])
    expected = np.array([[1, 0]])
    result = SBR(num_warmup=10, num_samples=10).fit(x, y).coef_
    np.testing.assert_allclose(result, expected, atol=1e-1)

This is a good example of what it takes to get SBR to succeed in the simplest of cases, and I'm looking into why it's failing.

Jacob-Stevens-Haas commented 4 months ago

~Ok, I've got it but I have to run, will post it a bit later. x and y in my code need more samples - even dropping the noise prior near zero doesn't do it.~ There were two problems with my test, and I had only solved one. The first problem was recovering a reasonable answer to a simple regression problem, and that required more samples. The second problem was trying to tune the parameters to demonstrate some shrinkage. But I can't figure out a way to tune this regression to eliminate a parameter (that's why I spent a lot of time on understanding the init parameters this past week). The best I can get is fitting coefficients with no shrinkage:

import numpy as np
from pysindy.optimizers import SBR

def test_sbr_accurate():
    # It's really hard to tune SBR to get desired shrinkage.
    # This just tests that SBR fits "close" to unregularized regression.
    x = np.tile(np.eye(2), 4).reshape((-1, 2))
    y = np.tile([[1], [1e-1]], 4).reshape((-1, 1))
    opt = SBR(num_warmup=50, num_samples=50).fit(x, y)
    result = opt.coef_
    unregularized = np.array([[1, 1e-1]])
    np.testing.assert_allclose(result, unregularized, atol=1e-3)
    assert hasattr(opt, "mcmc_")

That's probably good enough for now. In an ideal world, we could have a test that recreates the canonical "horseshoe" in Figure 2 of Handling Sparsity via the Horseshoe or Figure 1 of Sparsity Information and Regularization in the Horseshoe and Other Shrinkage Priors.

Jacob-Stevens-Haas commented 4 months ago

So that's all for testing. For the notebooks:

  1. Combine them (just one notebook file) as example.ipynb.
  2. Save as a python file.
  3. cd examples, then python publish_notebook.py 19_bayesian_sindy.
  4. Run pytest --durations=10 --external_notebook 19_bayesian_sindy to see if it tests (or you can run pytest --durations=10 -m "notebook" -k test_notebook and see if it gets picked up).
  5. Set constants that determine how long the test takes with if __name__ == "testing": <short value> else: <full value> in order to speed up the test (see the sketch after this list). num_samples and num_warmup are good candidates, as is anything that controls the length of the simulation.
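A sketch of the step-5 pattern (the values are illustrative; the real example picks its own fast/full settings):

```python
# The test harness runs the published example.py with __name__ == "testing",
# so cheap constants can be swapped in for CI without touching the notebook.
if __name__ == "testing":
    num_warmup, num_samples = 10, 10      # keep the CI run fast
else:
    num_warmup, num_samples = 500, 2000   # full values for the published example
```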

mikkelbue commented 4 months ago

@Jacob-Stevens-Haas Thank you for keeping up with this.

The first problem was recovering a reasonable answer to a simple regression problem, and that required more samples.

This is why I avoided comparing to "theoretical" values for the test I originally wrote. MCMC will always require some samples to get a sensible answer, and I suspect that this test with 50 warmup and 50 samples is not yet a converged MCMC. But if this works, that is great.

But I can't figure out a way to tune this regression to eliminate a parameter (that's why I spent a lot of time on understanding the init parameters this past week).

Shrinkage does work as expected when running more elaborate examples, but parameters will hardly ever be exactly 0, since the coef_ values are the means of the MCMC samples. We could set a clipping threshold as in e.g. STLSQ, if we wanted to zap the small coefficients completely.
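A hypothetical sketch of that clipping step (clip_coefficients and the threshold value are illustrative, not part of the PR):

```python
import numpy as np

def clip_coefficients(coef, threshold=1e-3):
    # Zero out posterior-mean coefficients smaller than the threshold.
    coef = np.asarray(coef).copy()
    coef[np.abs(coef) < threshold] = 0.0
    return coef
```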

I will have a look at converting the example to a script.

mikkelbue commented 4 months ago

@Jacob-Stevens-Haas

So that's all for testing. For the notebooks, ...

All done. However pytest --durations=10 --external_notebook 19_bayesian_sindy fails with pytest: error: unrecognized arguments: --external_notebook and I can't find any documentation for that argument.

Jacob-Stevens-Haas commented 4 months ago

LGTM, and thank you for your dedication on this! I'll look into that pytest issue separately - pytest allows you to create command line arguments in conftest.py, which is what I did for --external-notebook. It existed before the split to example.py/example.ipynb, so it's probably not necessary anymore.

Looks like example 19 runs in 8.5 seconds under test which is great :)