DynamicsAndNeuralSystems / pyspi

Comparative analysis of pairwise interactions in multivariate time series.
https://time-series-features.gitbook.io/pyspi/
GNU General Public License v3.0

Record compute time for each SPI #7

Open olivercliff opened 2 years ago

olivercliff commented 2 years ago

This will be useful for knowing which methods are fast or slow to compute, and will allow users to select faster options.

This might be finicky since many of the methods inherit preprocessed information from other methods (e.g., all spectral methods inherit spectral decompositions).
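For illustration, here's a minimal sketch of that shared-preprocessing pattern (all names here are made up for the example, not pyspi internals): whichever SPI runs first pays for the cached parent computation, and the rest of the group gets it almost for free, so isolated timings can be misleading.

import time
from functools import lru_cache

import numpy as np

class SpectralGroup:
    """Hypothetical stand-in for a family of SPIs sharing one parent computation."""

    def __init__(self, dataset):
        self.dataset = dataset

    @lru_cache(maxsize=None)
    def _shared_decomposition(self):
        # Expensive shared step, computed once and reused by every SPI in the group.
        time.sleep(1.0)  # placeholder for real work, e.g. a spectral decomposition
        return np.fft.rfft(self.dataset)

    def spi_a(self):
        return np.abs(self._shared_decomposition()).mean()

    def spi_b(self):
        return np.angle(self._shared_decomposition()).std()

group = SpectralGroup(np.random.randn(2, 300))
for name, spi in [("spi_a", group.spi_a), ("spi_b", group.spi_b)]:
    t0 = time.perf_counter()
    spi()
    print(f"{name}: {time.perf_counter() - t0:.2f} s")  # only the first call pays for the shared step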

anniegbryant commented 1 year ago

Just wanted to echo this -- I presented pyspi at CNS2022 and received questions about approximately how long each SPI takes, so that users can estimate the time requirements for a job.

benfulcher commented 1 year ago

Could check whether the preprocessed information is ~fast and thus could be neglected for an initial estimate. If so, this could be straightforward to benchmark on a range of simple VAR processes (varying the number of processes and the number of time points), as sketched below.
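A rough sketch of such a benchmark (the VAR(1) simulator and the grid of sizes are illustrative choices; subset='fast' is the documented option for restricting the run to the faster SPIs, and can be dropped to time everything):

import time

import numpy as np
from pyspi.calculator import Calculator

def simulate_var1(m, t, coupling=0.2, seed=0):
    """Simulate a stable VAR(1) process with m variables and t time points."""
    rng = np.random.default_rng(seed)
    A = coupling * rng.standard_normal((m, m))
    A /= max(1.0, np.max(np.abs(np.linalg.eigvals(A))) / 0.9)  # enforce stability
    X = np.zeros((m, t))
    for i in range(1, t):
        X[:, i] = A @ X[:, i - 1] + rng.standard_normal(m)
    return X

for m in (2, 5):
    for t in (100, 250, 500):
        dataset = simulate_var1(m, t)
        t0 = time.perf_counter()
        Calculator(dataset=dataset, subset='fast').compute()
        print(f"M={m}, T={t}: {time.perf_counter() - t0:.1f} s")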

mesner commented 5 months ago

For posterity, as I'm sure no one else cares after 18 months: here's a code snippet I used for the same question. Yes, it's a hack, and the total time is about 2x what it takes to calculate them all at once (IIRC). But it's something.
Note that some SPIs fail in my GPU-less Linux environment.

import time

import numpy as np
import pandas as pd
from pyspi.calculator import Calculator

np.random.seed(42)  # the data are generated with numpy, so seed numpy's RNG

M = 2    # number of processes
T = 300  # number of observations per process

dataset = np.random.randn(M, T)
calc = Calculator(dataset=dataset)

# Time each SPI in isolation by clearing the SPI table and
# reinstating one entry at a time before calling compute().
spi_items = calc.spis.copy()
df_rows = []
for k, v in spi_items.items():
    calc.spis.clear()
    calc.spis[k] = v
    beg_time = time.perf_counter()
    calc.compute()
    calc_time = time.perf_counter() - beg_time
    df_rows.append(dict(spi=k, time=calc_time))

pd.DataFrame(df_rows).to_csv("calc_spi_times.csv", index=False)

Attachment: calc_spi_times.csv

anniegbryant commented 5 months ago

Thank you very much for adding this, @mesner! Very helpful, indeed :)

You bring up an interesting point about computation taking ~2x as long when each SPI is run piecemeal versus all at once, which was also my experience when I tried a similar analysis. I believe @olivercliff designed pyspi with a sort of hierarchical computation scheme, wherein some parent computations are performed once for a given SPI group (e.g., transfer entropy, precision matrices) and then propagate to the individual SPIs therein to save time/computation. So it's a bit tricky to derive the time each individual SPI takes in practice, but I think this is a great approximation for users interested in the relative computation time of each SPI. For example, it makes sense that the convergent cross-mapping (ccm_) SPIs take orders of magnitude longer than most of the others.
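If it's useful, here's a rough sketch that quantifies that shared-computation overhead using only the Calculator API from the snippet above (the exact ratio will depend on the machine and the SPI set):

import time

import numpy as np
from pyspi.calculator import Calculator

np.random.seed(42)
dataset = np.random.randn(2, 300)

# Time everything in one pass, so shared parent computations run once.
calc = Calculator(dataset=dataset)
t0 = time.perf_counter()
calc.compute()
all_at_once = time.perf_counter() - t0

# Time each SPI in isolation, so shared computations are repeated per group.
calc = Calculator(dataset=dataset)
spi_items = calc.spis.copy()
piecemeal = 0.0
for k, v in spi_items.items():
    calc.spis.clear()
    calc.spis[k] = v
    t0 = time.perf_counter()
    calc.compute()
    piecemeal += time.perf_counter() - t0

print(f"all at once: {all_at_once:.1f} s, piecemeal: {piecemeal:.1f} s "
      f"(ratio {piecemeal / all_at_once:.1f}x)")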

For what it's worth, we explored this question using different SPI subset configurations and multivariate time series (MTS) data sizes, if you're interested: https://pyspi-toolkit.readthedocs.io/en/latest/faq.html#how-long-does-pyspi-take-to-run