This is a testbed for system identification and forecasting of dynamical systems using the Hankel Alternative View of Koopman (HAVOK) algorithm and Sparse Identification of Nonlinear Dynamics (SINDy). This code is based on the work by Brunton & Kutz (2022) and Yang et al. (2022).
The current ensemble methods readily diverge (blow up) when simulated for more than about 1000 time steps as errors accumulate, especially for chaotic systems like the Lorenz system. While good prediction performance is not expected for long-term forecasts, we would like the model to behave "reasonably" and stay within the expected margins rather than blow up. This would at least allow for ensemble forecasts with more reasonable future error bounds (see Orrell et al., 2001). Alternatively, we could employ data assimilation techniques or Mean Stochastic Models (MSMs) to bring the model back into reasonable ranges (see Vlachas et al., 2018); this would be more consistent with the current multi-step prediction framework. The most elegant solution would be to incorporate an energy-preserving condition in the loss function or the architecture of the regressor [e.g., Baddoo et al., 2023].
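As a stopgap, a simple safeguard can keep closed-loop predictions within the range seen in training. The sketch below illustrates this idea as a crude post-hoc rescaling of the predicted state; it is not the energy-preserving loss/architecture of Baddoo et al. (2023). The names `V_train` (training states, one row per time step) and `stepModel` (a one-step predictor) are hypothetical placeholders.

```matlab
% Minimal sketch of a "stay within expected margins" safeguard during
% closed-loop (multi-step) prediction. V_train and stepModel are
% hypothetical names, not part of this repository.
maxEnergy = max(sum(V_train.^2, 2));      % largest squared norm seen in training

v      = V_train(end, :);                 % start from the last training state
nSteps = 5000;
V_pred = zeros(nSteps, numel(v));
for k = 1:nSteps
    v = stepModel(v);                     % one closed-loop prediction step
    e = sum(v.^2);
    if e > maxEnergy                      % trajectory is about to blow up:
        v = v * sqrt(maxEnergy / e);      % rescale back into the training range
    end
    V_pred(k, :) = v;
end
```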
Increasing the number of predictors (D = 1) does not improve the performance:
The way the data is interpolated strongly affects the performance of the ML method.
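For illustration, the sketch below resamples a raw series onto a uniform grid with two different interpolation schemes before building the Hankel matrix. The names `t_raw`, `x_raw`, and `dt` are hypothetical (the original, possibly non-uniform time stamps, the measurements, and the target step).

```matlab
% Sketch: resample the raw series onto a uniform grid before delay embedding.
% t_raw, x_raw, dt are hypothetical names, not part of this repository.
dt        = 0.01;
t_uniform = (t_raw(1):dt:t_raw(end)).';
x_linear  = interp1(t_raw, x_raw, t_uniform, 'linear');   % piecewise linear
x_spline  = interp1(t_raw, x_raw, t_uniform, 'spline');   % cubic spline
% The interpolation method (and dt) changes the delay embedding and therefore
% the identified HAVOK/SINDy model, so it is worth comparing both.
```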
By adjusting the parameters of the forcing model we can obtain better results; however, the model still diverges (or lags) quickly (see the fitting sketch after the parameter list below):
[MinLeafSize=27, maxNumSplits=1e5, NumTrees=20]
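These option names match MATLAB's tree-ensemble regressors, so one way to apply the settings above is sketched below, assuming the forcing model is a bagged regression-tree ensemble. The predictor matrix `X` and forcing target `vr` are hypothetical names; the exact regressor used in this repository may be set up differently.

```matlab
% Sketch: bagged regression trees for the forcing model with the settings
% listed above. X and vr are hypothetical names, not part of this repository.
tree = templateTree('MinLeafSize', 27, 'MaxNumSplits', 1e5);
mdl  = fitrensemble(X, vr, ...
                    'Method', 'Bag', ...
                    'NumLearningCycles', 20, ...   % NumTrees = 20
                    'Learners', tree);
vr_hat = predict(mdl, X);                          % in-sample sanity check
```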