SugiharaLab / rEDM

Applications of Empirical Dynamic Modeling from Time Series

Questions about 'Nonlinearity' and the significance of ρ under the largest L. #63

Closed fiftybillion5050 closed 7 months ago

fiftybillion5050 commented 8 months ago

I read the rEDM documentation and have three questions.

First, if the time series has only about 20 points, can I use CCM to test for causality?

Second, I understand that we should identify 'nonlinearity' before using CCM, and that the parameter θ is used to check for it. If forecast skill does not improve for θ > 0, meaning the best forecast skill occurs at θ = 0, can I still use CCM to test for causality? If not, what can I do? Should I just give up on the analysis? I'm not sure my understanding is right.

Third, to test convergence I use the method from Chang et al., 2020, "Long-term warming destabilizes aquatic ecosystems through weakening biodiversity-mediated causal networks": 'Convergence requires that both Kendall’s τ test and Fisher’s Δρ Z test are significant.'

My question is: after testing convergence, should I also test significance using the twin-surrogate method or some other method? I notice (source: the rEDM documentation) that some studies create many 'surrogate' time series for X and cross map each surrogate using Y. When ρ at the largest L for the original data is greater than the 95th percentile of ρ for the surrogate data, the cross map prediction skill is considered statistically significant.

SoftwareLiteracy commented 8 months ago

Here are my thoughts, others will have more insight based on their experience:

1) if the length of the time series is about 20, can I use CCM to test the causality?

The presumption one makes is that the data sufficiently represent the underlying dynamics. If the dynamics are captured in 20 points, then hypothetically, yes. However, CCM is predicated on increasingly improved representations of the dynamics in the embedding library as the size of the library increases. With 20 points and an embedding dimension of even 2 (how complex can the dynamics be if 20 points adequately capture them?), the convergence test is likely to be ill-posed.
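To put rough numbers on this, here is a minimal sketch (plain Python, not rEDM code) that counts how many time-delay vectors a 20-point series yields and how many library sizes that leaves for a ρ(L) curve. The function name `embedding_vector_count` is hypothetical, and the lower bound of E + 1 on library size is an assumption based on simplex projection using E + 1 nearest neighbors.

```python
def embedding_vector_count(n_points, E, tau=1):
    """Number of time-delay vectors obtainable from a series of n_points.

    Each vector needs (E - 1) * tau earlier observations, so that many
    points at the start of the series cannot form a complete vector.
    """
    return max(n_points - (E - 1) * tau, 0)


n_points = 20  # series length from the question
for E in (1, 2, 3, 4):
    n_vectors = embedding_vector_count(n_points, E)
    # Assumption: the smallest usable library holds E + 1 vectors,
    # the number of neighbors used by simplex projection.
    min_lib = E + 1
    n_lib_sizes = max(n_vectors - min_lib + 1, 0)
    print(f"E={E}: {n_vectors} vectors, "
          f"library sizes {min_lib}..{n_vectors} ({n_lib_sizes} values)")
```

Even in the best case the ρ(L) curve is built from fewer than 20 library sizes, each estimated from a handful of predictions, which is why the convergence test becomes fragile.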

2) Identify 'Nonlinearity' before we use CCM...

In my mind, this is not a requirement, but information to help the analyst understand the data and pose the right questions. If the data are linear, then autoregressive models might be reasonable. Broadly, this is an interesting question: should EDM be applied to linear systems? I'm not aware that this has been addressed, as the focus of EDM is on nonlinear (state-dependent) dynamics. Perhaps someone with deeper experience can clarify this?
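For context on why θ = 0 is read as "linear": in the S-map formulation, library points are weighted by their distance from the prediction point, w_i = exp(-θ d_i / d̄), where d̄ is the mean distance. The sketch below is plain Python rather than rEDM code; `smap_weights` is a hypothetical helper and the distances are made-up illustrative values.

```python
import math


def smap_weights(distances, theta):
    """S-map style weights w_i = exp(-theta * d_i / mean(d)).

    Assumes at least one nonzero distance so the mean is positive.
    """
    d_bar = sum(distances) / len(distances)
    return [math.exp(-theta * d / d_bar) for d in distances]


# Made-up distances from one prediction point to four library points.
distances = [0.1, 0.5, 1.0, 2.0]
for theta in (0.0, 1.0, 4.0):
    weights = smap_weights(distances, theta)
    print(theta, [round(w, 3) for w in weights])
```

At θ = 0 every library point gets weight 1, so the local regression collapses to a single global linear (autoregressive-like) fit. Larger θ concentrates the weight on nearby states, which is what makes the model state-dependent.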

3) Convergence requires that both Kendall’s τ test and Fisher’s Δρ Z test are significant (Chang 2020). After testing the convergence, should I test the significance by using the twin-surrogate method or the other method?

The method proposed by Chang is, in my understanding, a reasonable heuristic to assess whether the ρ(L) curve has "converged". The test seems to require a "significant" increasing trend in ρ(L). As these are statistical (distribution) tests, one can imagine cases of convergence that are not well represented by such a test. For example, I wonder how this test fares if ρ(L) is of the form 1-exp(-aL) with early saturation, or if the changes in ρ(L) are relatively flat. Further, any such metrics will depend heavily on the range and values of L. I'm not aware of a general recipe. I use the method outlined in the rEDM vignette as applied to seasonality:
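As a rough illustration of the two statistics, here is a sketch in plain Python. It is not Chang et al.'s code: it assumes Kendall's τ is computed between L and ρ(L), and that the Δρ test is the standard Fisher z comparison of ρ at the largest and smallest L, with `n_max` and `n_min` taken as the number of predictions behind each ρ (hypothetical values here). The ρ(L) curve is synthetic, of the form 1-exp(-aL).

```python
import math


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[j] - x[i]) * (y[j] - y[i])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def fisher_delta_rho_z(rho_max_L, rho_min_L, n_max, n_min):
    """Z statistic for rho(L_max) - rho(L_min) using Fisher's z-transform."""
    se = math.sqrt(1.0 / (n_max - 3) + 1.0 / (n_min - 3))
    return (math.atanh(rho_max_L) - math.atanh(rho_min_L)) / se


def upper_tail_p(z):
    """One-sided p-value of z under a standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Synthetic saturating curve, not real data.
a = 0.1
lib_sizes = list(range(5, 55, 5))
rho = [1.0 - math.exp(-a * L) for L in lib_sizes]

tau = kendall_tau(lib_sizes, rho)
z = fisher_delta_rho_z(rho[-1], rho[0], n_max=50, n_min=50)
print("Kendall tau:", tau)
print("Fisher z:", z, "one-sided p:", upper_tail_p(z))
```

Because Kendall's τ is rank-based, a noiseless monotone curve gives τ = 1 no matter how early it saturates. With real, noisy ρ(L) values the flat part of the curve contributes essentially random pair orderings, so the outcome depends on how many library sizes fall on the plateau, which is the dependence on the range of L mentioned above.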

The idea here is to generate surrogate time series with the same level of shared seasonality. Cross mapping between the real time series and these surrogates thus generates a null distribution for 𝜌, against which the actual cross map 𝜌 value can be compared.
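Once the surrogate cross map values are in hand, the comparison itself is simple. The sketch below is plain Python, not the vignette's code: `surrogate_test` is a hypothetical helper, the nearest-rank percentile and the +1 corrected empirical p-value are my assumptions about reasonable conventions, and the null values are random placeholders standing in for the ρ values one would obtain by cross mapping against each surrogate.

```python
import math
import random


def surrogate_test(rho_obs, rho_null, percentile=95):
    """Compare an observed rho with a surrogate null distribution.

    Returns the null percentile (nearest-rank definition) and a
    one-sided empirical p-value with the usual +1 correction.
    """
    null_sorted = sorted(rho_null)
    n = len(null_sorted)
    k = max(math.ceil(percentile * n / 100) - 1, 0)
    threshold = null_sorted[k]
    n_ge = sum(1 for r in rho_null if r >= rho_obs)
    p_value = (n_ge + 1) / (n + 1)
    return threshold, p_value


# Placeholder null distribution: random numbers, NOT real cross map results.
rng = random.Random(0)
rho_null = [rng.uniform(-0.2, 0.6) for _ in range(1000)]
rho_obs = 0.7  # placeholder for rho at the largest L on the real data

threshold, p_value = surrogate_test(rho_obs, rho_null)
print("95th percentile of null:", threshold)
print("significant at 5%:", rho_obs > threshold, "p =", p_value)
```

The choice of surrogate generator is the part that matters: it should preserve whatever shared structure (here, seasonality) could produce cross map skill without a causal link.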