embodied-computation-group / systole

Systole: A python package for cardiac signal synchrony and analysis
https://embodied-computation-group.github.io/systole/#
GNU General Public License v3.0
80 stars 31 forks source link

Recurrence matrix: faster implementation #33

Closed DominiqueMakowski closed 2 years ago

DominiqueMakowski commented 2 years ago

Hey @LegrandNico hope you're doing great,

just saw that you were in the process of implementing recurrence matrix stuff for HRV, and thought you might be interested in this (much) faster implementation below:

https://github.com/neuropsychology/NeuroKit/blob/dev/neurokit2/complexity/complexity_recurrence.py

The neurokit one is tested against PyRQA and gives the same results, which are different from your current implementation. In order for it to match it, the np.flip() must be removed, and the subsamples must be cut short by one (I am not 100% certain that this is incorrect though, I would tend to trust pyRQA but 🤷).

Anyway, feel free to steal that code :)

Here are some benchmarks (not taking into account numba gains):

import neurokit2 as nk
import numpy as np

def rc_systole(rr: np.ndarray, m: int = 10, tau: int = 1):
    r = np.sqrt(m) * np.std(rr)  # Threshold for the Euclidean distance
    lag = (m - 1) * tau  # Lag
    j = rr.shape[0] - lag  # Dimension of the recurrence matrix

    # Initialize a (j-l) by (j-l) matrix and fill with zeros
    rc = np.zeros((j, j))

    # Iterate over the lower triangle only
    for i in range(j):
        u_i = rr[i : i + lag : tau]  # First sub-sample of RR intervals
        for ii in range(i + 1):
            u_ii = rr[ii : ii + lag : tau]  # Second sub-sample of RR intervals

            # Compare the Euclidean distance to threshold
            if np.sqrt(np.sum(np.square(u_i - u_ii))) <= r:
                rc[i, ii] = 1

    rc = rc + rc.T - np.diag(np.diag(rc))  # Make the matrix symmetric

    return rc

signal = nk.signal_simulate(duration=5, sampling_rate=500, frequency=[5, 6], noise=0.5)

# Systole (pure python)
%timeit rc_systole(signal)
> 14.3 s ± 1.57 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

# NeuroKit
%timeit nk.complexity_recurrence(signal, dimension=10, tolerance = np.sqrt(10) * np.std(signal))
> 81.4 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 10 loops each

# PyRQA (but also incldues the computation of the indices so not really the best comparison)
%timeit nk.complexity_rqa(signal, delay=1, dimension=10, tolerance=np.sqrt(10) * np.std(signal))
> 181 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Compare outputs (NeuroKit vs. PyRQA)
rc2, _ = nk.complexity_recurrence(signal, dimension=10, tolerance = np.sqrt(10) * np.std(signal))
_, rc3 = nk.complexity_rqa(signal, delay=1, dimension=10, tolerance=np.sqrt(10) * np.std(signal))

np.all(rc3['Recurrence_Matrix'] == rc2)
> True

looking forward to chatting with you sometime ☺️

LegrandNico commented 2 years ago

Hi @DominiqueMakowski

wow, you were fast, I didn't know that the dev branch was under regular scrutiny, I should be careful pushing more polished code :sweat_smile:.

Yes, this implementation is definitely helping, and I think also more valid. I had the feeling that some clever vectorization could help with these things, but as I just started looking at more advanced nonlinear technics this week I was just wrapping my head around the math. The iterative approach will definitely be penalized if running in pure Python, the disadvantage was quite mitigated with Numba otherwise, even if still present. I have then implemented this approach that is taking advantage of the Scipy implementation of Euclidean distance.

I still cannot reproduce precisely all the metrics reported by Kubios (recurrence, determinism...), I think that might be related to the way I extract the diagonals. Do you have something already working in NeuroKit?

DominiqueMakowski commented 2 years ago

I didn't know that the dev branch was under regular scrutiny

big brother is watching you 👁️ ^^

Do you have something already working in NeuroKit

Until now we were relying on pyRQA to get them, but it's a bit of a hassle to get it working, so we'd like to have an in-house implementation too

Regarding recurrence rate, we compute it under the hood in the context of the optimization of r:

https://github.com/neuropsychology/NeuroKit/blob/33a05175f42ad8beecf804ec58352560b11f88af/neurokit2/complexity/optim_complexity_tolerance.py#L122-L127

(with the difference that here it computes the RR for a vector of r-values), but if I'm not mistaken this looked a bit different from your current RR last time I checked, so we might want to double check that.

For all the remaining indices, we can also take a look at pyRQA to see how they get these variables

LegrandNico commented 2 years ago

I am closing here as this implementation will be part of the next release but I will come on NeuroKit someday so we can start a systematic comparison between the indices extracted here and other software like Kubios.