[BUG] Ensembling multiple trajectories with PDELibrary and WeakPDELibrary

znicolaou commented 2 years ago

The PDELibrary and WeakPDELibrary libraries need to reshape the data appropriately to take derivatives and integrals along specific axes when calculating the library matrix. When using multiple_trajectories=True in the SINDy.fit function, we need to inform the library that multiple trajectories are present so the reshaping can be done correctly. I implemented a fix by adding a self.num_trajectories variable to the libraries, which is set in the SINDy.fit function when multiple_trajectories=True. This works when ensemble=False in the SINDy.fit function, but I have not yet implemented the ensembling case. The subset selection should probably be done independently on each trajectory in the multiple_trajectories=True case.

akaptano commented 2 years ago

Dang, I thought I had covered this. Could you post an example that produces an error? Thanks for looking into this!

znicolaou commented 2 years ago

No problem! I was playing with some experimental data, but I'll put together a simpler example that we can work off of and post it here in a bit. I think it won't take much to get the ensembling working--there just needs to be some additional things done in the case of multiple_trajectories in there somewhere.

znicolaou commented 2 years ago

Here's a short example for testing.

integrator_keywords = {}
integrator_keywords['rtol'] = 1e-12
integrator_keywords['method'] = 'LSODA'
integrator_keywords['atol'] = 1e-12

def diffuse (t, u, dx, nx):
    u = np.reshape(u, nx)
    du = ps.differentiation.SpectralDerivative(d=2, axis=0)._differentiate(u, dx)
    return np.reshape(u * du, nx)

N = 200
h0 = 1.0
L = 5
T = 1

t = np.linspace(0, T, N)
x = np.arange(0, N) * L / N

ep = 0.5 * h0
y0 = np.reshape(h0 + ep * np.cos(4 * np.pi / L * x) 
                + ep * np.cos(2 * np.pi / L * x), N)
y1 = np.reshape(h0 + ep * np.cos(4 * np.pi / L * x) 
                - ep * np.cos(2 * np.pi / L * x), N)
dx = x[1] - x[0]

sol1 = solve_ivp(diffuse, (t[0], t[-1]), 
                y0=y0, t_eval=t, args=(dx, N),
                **integrator_keywords)
sol2 = solve_ivp(diffuse, (t[0], t[-1]), 
                y0=y1, t_eval=t, args=(dx, N),
                **integrator_keywords)

u = [np.reshape(sol1.y, (N, N, 1)),np.reshape(sol2.y, (N, N, 1))]

library_functions = [lambda x: x]
library_function_names = [lambda x: x]

pde_lib = ps.PDELibrary(
    library_functions=library_functions,
    function_names=library_function_names,
    derivative_order=2,
    spatial_grid=x,
    is_uniform=True,
)

X, T = np.meshgrid(x, t, indexing='ij')
XT = np.transpose([X, T], [1, 2, 0])

weak_lib = ps.WeakPDELibrary(
    library_functions=library_functions,
    function_names=library_function_names,
    derivative_order=2,
    spatiotemporal_grid=XT,
    K=100,
    is_uniform=False,
    num_pts_per_domain=30,
)

np.random.seed(100)

#works fine
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=pde_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=False)
model.print()

#works fine
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=weak_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=False)
model.print()

#Runs poorly
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=pde_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=True, n_subset=len(t))
model.print()

#ValueError: cannot reshape array of size 80000 into shape (200,200,1)
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=weak_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=True, n_subset=len(t))
model.print()

#AttributeError: 'list' object has no attribute 'shape'
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=pde_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=True)
model.print()

#AttributeError: 'list' object has no attribute 'shape'
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=weak_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=True)
model.print()

The fit works for multiple_trajectories=True, ensemble=False but not for multiple_trajectories=True, ensemble=True. I think the fix should not be too complicated--around line 440 of pysindy.py, there should be some if multiple_trajectories statements that loop over trajectories and do the ensembling on each trajectory.

akaptano commented 2 years ago

Cool, I'm happy to take a look at this tomorrow!

akaptano commented 2 years ago

Did you already make changes to the code? Even the first model throws an error for me

znicolaou commented 2 years ago

Oh yeah, sorry that was totally unclear. I did push a couple changes to master yesterday to implement the multiple_trajectories=True, ensemble=False cases. The old code throws the errors because it previously didn't distinguish the spatiotemporal data for multiple trajectories and ran into reshaping errors when trying to take derivatives/integrals. So I added variables self.num_trajectories to the WeakPDELibrary and PDELibrary classes to store that information when pysindy.fit is invoked, and then looped over each trajectory in the transform functions.

akaptano commented 2 years ago

Got this working... I'll push the changes in a moment but my guess is there will be some weird edge cases left to find at some point. Added your example script to the tests in test_pysindy.py.

akaptano commented 2 years ago

Some notes on the changes:

The weak formulation is really tricky. Basically, in order to get ensembling + multiple trajectories using the weak form, you would need to duplicate the temporal_grid several times and make this the new, longer temporal_grid. This doesn't work because the grid is required to be strictly ascending for the interpolation (also it makes the full spatiotemporal grid much larger). Instead, I settled on sub-sampling each of the trajectories on the normal temporal_grid, and currently each of the trajectories gets sub-sampled in the same way. In other words, if time indices [1, 10, 3] are chosen, trajectories 1, 2, 3, ... all only use these three indices and are stitched together.
It seems like there is a better way to do the num_trajectories thing in the PDE libraries, but I'm not sure what it is.

znicolaou commented 2 years ago

Nice--thanks for looking so quickly! I agree that we may need to test a bit to discover other potential issues, and there are probably cleaner solutions for both PDELibrary and WeakPDELibrary.

In the PDELibrary case, part of the challenge is that it's possible to deduce num_trajectories*num_time = n_sample // np.product(self.grid_dims), but not num_trajectories and num_time independently as far as I can tell. I'll think about whether there's another option too.

I'll keep testing on this issue as I use the package, so we may as well keep it open for now and post updates as we see them.

akaptano commented 2 years ago

Well it compiled fine on my computer but this is a bizarre (and new?) error while running the linting tests online, although looks like it has to do with some git backend. I'll take a look at this in a bit

akaptano commented 2 years ago

apparently today is the day where GitHub finally updates its security protocols https://github.blog/2021-09-01-improving-git-protocol-security-github/#when-are-these-changes-effective. I'll see if I can fix this for the repo.

znicolaou commented 2 years ago

Ah, confusing. We'll figure it out...

akaptano commented 2 years ago

Okay fixed that and think I fixed this bug so closing this for now. Feel free to open again when we invariably find some more edge cases. Hopefully this is good enough for your coarse grained models for now!

dynamicslab / pysindy

[BUG] Ensembling multiple trajectories with PDELibrary and WeakPDELibrary #148