Closed znicolaou closed 2 years ago
Dang, I thought I had covered this. Could you post an example that produces an error? Thanks for looking into this!
No problem! I was playing with some experimental data, but I'll put together a simpler example that we can work off of and post it here in a bit. I think it won't take much to get the ensembling working--there just needs to be some additional things done in the case of multiple_trajectories in there somewhere.
Here's a short example for testing.
integrator_keywords = {}
integrator_keywords['rtol'] = 1e-12
integrator_keywords['method'] = 'LSODA'
integrator_keywords['atol'] = 1e-12
def diffuse (t, u, dx, nx):
u = np.reshape(u, nx)
du = ps.differentiation.SpectralDerivative(d=2, axis=0)._differentiate(u, dx)
return np.reshape(u * du, nx)
N = 200
h0 = 1.0
L = 5
T = 1
t = np.linspace(0, T, N)
x = np.arange(0, N) * L / N
ep = 0.5 * h0
y0 = np.reshape(h0 + ep * np.cos(4 * np.pi / L * x)
+ ep * np.cos(2 * np.pi / L * x), N)
y1 = np.reshape(h0 + ep * np.cos(4 * np.pi / L * x)
- ep * np.cos(2 * np.pi / L * x), N)
dx = x[1] - x[0]
sol1 = solve_ivp(diffuse, (t[0], t[-1]),
y0=y0, t_eval=t, args=(dx, N),
**integrator_keywords)
sol2 = solve_ivp(diffuse, (t[0], t[-1]),
y0=y1, t_eval=t, args=(dx, N),
**integrator_keywords)
u = [np.reshape(sol1.y, (N, N, 1)),np.reshape(sol2.y, (N, N, 1))]
library_functions = [lambda x: x]
library_function_names = [lambda x: x]
pde_lib = ps.PDELibrary(
library_functions=library_functions,
function_names=library_function_names,
derivative_order=2,
spatial_grid=x,
is_uniform=True,
)
X, T = np.meshgrid(x, t, indexing='ij')
XT = np.transpose([X, T], [1, 2, 0])
weak_lib = ps.WeakPDELibrary(
library_functions=library_functions,
function_names=library_function_names,
derivative_order=2,
spatiotemporal_grid=XT,
K=100,
is_uniform=False,
num_pts_per_domain=30,
)
np.random.seed(100)
#works fine
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=pde_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=False)
model.print()
#works fine
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=weak_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=False)
model.print()
#Runs poorly
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=pde_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=True, n_subset=len(t))
model.print()
#ValueError: cannot reshape array of size 80000 into shape (200,200,1)
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=weak_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=True, n_subset=len(t))
model.print()
#AttributeError: 'list' object has no attribute 'shape'
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=pde_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=True)
model.print()
#AttributeError: 'list' object has no attribute 'shape'
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=weak_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=True)
model.print()
The fit works for multiple_trajectories=True, ensemble=False
but not for multiple_trajectories=True, ensemble=True
. I think the fix should not be too complicated--around line 440 of pysindy.py, there should be some if multiple_trajectories
statements that loop over trajectories and do the ensembling on each trajectory.
Cool, I'm happy to take a look at this tomorrow!
Did you already make changes to the code? Even the first model throws an error for me
Oh yeah, sorry that was totally unclear. I did push a couple changes to master yesterday to implement the multiple_trajectories=True, ensemble=False
cases. The old code throws the errors because it previously didn't distinguish the spatiotemporal data for multiple trajectories and ran into reshaping errors when trying to take derivatives/integrals. So I added variables self.num_trajectories
to the WeakPDELibrary and PDELibrary classes to store that information when pysindy.fit is invoked, and then looped over each trajectory in the transform functions.
Got this working... I'll push the changes in a moment but my guess is there will be some weird edge cases left to find at some point. Added your example script to the tests in test_pysindy.py.
Some notes on the changes:
Nice--thanks for looking so quickly! I agree that we may need to test a bit to discover other potential issues, and there are probably cleaner solutions for both PDELibrary and WeakPDELibrary.
In the PDELibrary case, part of the challenge is that it's possible to deduce num_trajectories*num_time = n_sample // np.product(self.grid_dims)
, but not num_trajectories
and num_time
independently as far as I can tell. I'll think about whether there's another option too.
I'll keep testing on this issue as I use the package, so we may as well keep it open for now and post updates as we see them.
Well it compiled fine on my computer but this is a bizarre (and new?) error while running the linting tests online, although looks like it has to do with some git backend. I'll take a look at this in a bit
apparently today is the day where GitHub finally updates its security protocols https://github.blog/2021-09-01-improving-git-protocol-security-github/#when-are-these-changes-effective. I'll see if I can fix this for the repo.
Ah, confusing. We'll figure it out...
Okay fixed that and think I fixed this bug so closing this for now. Feel free to open again when we invariably find some more edge cases. Hopefully this is good enough for your coarse grained models for now!
The PDELibrary and WeakPDELibrary libraries need to reshape the data appropriately to take derivatives and integrals along specific axes when calculating the library matrix. When using multiple_trajectories=True in the SINDy.fit function, we need to inform the library that multiple trajectories are present so the reshaping can be done correctly. I implemented a fix by adding a self.num_trajectories variable to the libraries, which is set in the SINDy.fit function when multiple_trajectories=True. This works when ensemble=False in the SINDy.fit function, but I have not yet implemented the ensembling case. The subset selection should probably be done independently on each trajectory in the multiple_trajectories=True case.