dynamicslab / pysindy

A package for the sparse identification of nonlinear dynamical systems from data
https://pysindy.readthedocs.io/en/latest/
Other
1.42k stars 310 forks source link

WeakPDELibrary when using sparse data loops infinitely #327

Closed MA-Kochen closed 1 year ago

MA-Kochen commented 1 year ago

I'm running pysindy in the context of biological networks. Typical experimental data is noisy and sparse. Because of the noise I am using the weak formulation to infer the ODEs. However, the sparsity of the (simulated) data has presented an issue. When the number of data points falls below 12 the ps.WeakPDELibrary method hangs.

I'm using the Lorenz example as a template. The following doesn't terminate or throw an error, it appears to just run indefinitely.

dt = 1 t_train = np.arange(0, 10, dt)

library_functions = [lambda x: x, lambda x: x x, lambda x, y: x y] library_function_names = [lambda x: x, lambda x: x + x, lambda x, y: x + y]

ode_lib = ps.WeakPDELibrary( library_functions=library_functions, function_names=library_function_names, spatiotemporal_grid=t_train, is_uniform=True, K=100, )

znicolaou commented 1 year ago

Hi @mialko,

The WeakPDELibrary initializes by randomly selecting subdomains of size H_xt from the spatiotemporal domain, iterating until K distinct subdomains are found. I suspect the issue is that there are not K=100 unique subdomains (with the default H_xt) that the library can use to define the weak features when there are less than 12 data points. We'll think about whether we should include a maximum number of iterations and issue an error during initialization if it requires too many iterations so that the library doesn't just hang, but for your current problem, you need to either include more data points or include less subdomains.

znicolaou commented 1 year ago

Actually, I take back part of that. The library does not check if the domains are unique currently, but it does require that each selected domain have more than 2 grid points, which is required to be able to calculate the weak features. If there are too few data points, the randomly place domain centers x_0 and intervals [x_0-H_xt/2, x_0+H_xt/2] will not contain more than two data points. You could probably resolve this by increasing H_xt above its default value of one twentieth of the domain size H_xt=L_xt/20, i.e. something like

ode_lib = ps.WeakPDELibrary(
library_functions=library_functions,
function_names=library_function_names,
spatiotemporal_grid=t_train,
is_uniform=True,
K=100,
H_xt=[3]
)

in your case. But note that you can only really generate K=10 or so distinct subdomains, so using K=100 will produce a lot of repeated features. I don't really expect a lot of success with noisy very sparse data, but good luck!

akaptano commented 1 year ago

Just adding my two cents that it is probably worth adding an error message in the code for folks who try to do this, because the weak form is rather mysterious

MA-Kochen commented 1 year ago

Got it, thanks.