CSchoel / nolds

Nonlinear measures for dynamical systems (based on one-dimensional time series)
MIT License
261 stars 57 forks source link

Check if DFA needs to be fixed #17

Closed CSchoel closed 3 months ago

CSchoel commented 4 years ago

Following the discussion in NeuroKit2 (https://github.com/neuropsychology/NeuroKit/issues/206) it seems that there are multiple ways to implement DFA. => Check that we really use the correct implementation according to Peng 1995 and update the reference list accordingly.

LRydin commented 3 years ago

Hey @CSchoel I am also working on NeuroKit2 issue #206 and as far as I can judge from your code there shouldn't be any issues. I have compared it to my MFDFA and algorithm-wise, I can't detect any hiccups.

I will use you code and mine to serve as tests for NeuroKit2 fractal_dfa. I hope this also helps you out. I'll report on the tests later.

CSchoel commented 3 years ago

Hey, thank you very much for looking into this! :smile: And also thanks for the hint at Lévy motion in your NeuroKit2 comment. I am always happy to have data sets/processes with prescribed theoretical results against which I can test the algorithms in nolds.

LRydin commented 3 years ago

:) Cheers! Happy to help, I'll keep you in the loop with the work with NeuroKit2. I'll add some tests based on your nolds and my MFDFA, to be sure things we have to validating codes for NK2.

CSchoel commented 10 months ago

I finally came back to this: Retracing my steps from https://github.com/neuropsychology/NeuroKit/issues/206, there is indeed a change required:

We need to apply the square root only after we summed all of the individual squared differences and took the mean. This is consistent with the PhysioNet implementation and the (now discontinued) R-package fractal.

This does not change the unit tests for fractal brownian motion, random noise, and random walks. But it does change the output for the Lorenz system quite a bit. This makes me a bit suspicious about the reference paper for that experiment. I'll try to verify this by using the PhysioNet implementation on the Lorenz data to see if the output is closer to 1 (with the fix) or 1.5 (without the fix, prescribed by González-Salas et al. (2016)).

CSchoel commented 10 months ago

Other To-dos:

CSchoel commented 10 months ago

I looked for another paper reporting the Hurst parameter obtained by DFA from the Lorenz system and found Wallot et al. (2023). The authors actually report values that are closer to 1 than to 1.5.

I trust them more than González-Salas et al. (2016) because Wallot et al. link to the original paper and describe the algorithm very detailed and with the same square-root-last approach as in the fix I now want to apply to nolds. I think it is quite likely that González-Salas et al. just had the same error in their implementation of DFA that I had in nolds.

CSchoel commented 10 months ago

I compared the PhysioNet implementation against nolds. Fortunately, the PhysioNet binary outputs all data points on the log-log-plot, which allows a very fine-grained comparison of the two implementations. After correcting for the fact that nolds uses the natural logarithm and PhysioNet uses base 10, the plots are identical:

physionet_vs_nolds

Looking at the code, I also found this comment:

dfa: Detrended Fluctuation Analysis (translated from C-K Peng's Fortran code) Copyright (C) 2001-2005 Joe Mietus, C-K Peng, and George B. Moody

I think a 100% agreement with a ported version of C-K Peng's original Fortran implementation is as good as it gets for validating that the implementation in nolds is correct.

I think I want to add this experiment as a regression test.

CSchoel commented 9 months ago

I added the regression test.

Open TODOs:

LRydin commented 9 months ago

Hey everyone,

I think there might be some material you can reuse for your tests in the documentation of MFDFA. There you can find some simple code to generate a Lévy motion (see here):

# Imports
# from MFDFA import MFDFA   #not needed, just copied from my docs.
from scipy.stats import levy_stable

# Generate 100000 points
alpha = 1.5
X = levy_stable.rvs(alpha=alpha, beta=0, size=10000)

Since these are purely Markovian processes, you should get a Hurst coefficient of H=0.5 in your DFA implementation :).

Hope it helps!

CSchoel commented 9 months ago

Thanks a lot for the hint. Much appreciated. :heart:

CSchoel commented 9 months ago

Update for myself: Unfortunately, I noticed that the error I had in my formula was due to the same error existing in Hardstone et al. (2012), which I based my entire explanation of DFA in the docstring on. This means I might have to re-write the whole docstring as the explanation involving standard deviations of individual detrended windows doesn't make any sense anymore. :see_no_evil:

I didn't find any other good explanation, so I might have to resort to explaining the how rather than the why. :slightly_frowning_face:

CSchoel commented 8 months ago

Another update for myself: After a bit of searching I, found Kantelhardt et al. (2001) which seems to include a decent explanation for at least some parts of the algorithm. It is mostly consistent with the original definition by Peng et al. (1995), so I think I can use that and maybe combine it with the parts of Hardstone et al. (2012) that don't involve the aforementioned error. Let's hope I can get a somewhat understandable introduction to DFA out of this.

CSchoel commented 4 months ago

I revised the explanation. It needs another proofread, and I still need to find a good motivation why we take the integral / random walk as the first step.

CSchoel commented 3 months ago

Explanation is finished, and Lévy motion is implemented. We're done here! :tada: