olivercliff opened this issue 2 years ago
Just wanted to echo this -- I presented pyspi at CNS2022 and received questions about approximately how long each SPI takes, so that users can estimate the time requirements for a job.
We could check whether computing the preprocessed information is fast enough to be neglected for an initial estimate. If so, this should be straightforward to benchmark on a range of simple VAR processes (varying the number of processes and the number of time points).
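A minimal sketch of the kind of benchmark grid suggested above, assuming a simple VAR(1) generator (the function name, coupling strength, and grid values here are illustrative, not part of pyspi):

```python
import numpy as np

def simulate_var1(n_procs, n_obs, coupling=0.1, seed=0):
    """Simulate a stationary VAR(1) process: x_t = A @ x_{t-1} + noise."""
    rng = np.random.default_rng(seed)
    # Random coupling matrix, rescaled if needed so the spectral radius stays below 1
    A = coupling * rng.standard_normal((n_procs, n_procs))
    A /= max(1.0, 1.1 * np.max(np.abs(np.linalg.eigvals(A))))
    X = np.zeros((n_procs, n_obs))
    for t in range(1, n_obs):
        X[:, t] = A @ X[:, t - 1] + rng.standard_normal(n_procs)
    return X

# Benchmark grid: vary the number of processes and time points
for M in (2, 5):
    for T in (100, 500):
        data = simulate_var1(M, T)
        # data could then be passed to Calculator(dataset=data) and timed
```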
For posterity, as I'm sure no one else cares after 18 months.
Here's a code snippet I used for the same question.
Yes, it's a hack, and the total time is about 2x what it takes to calculate them all at once (IIRC).
But, it's something.
Note that some SPIs fail in my GPU-less Linux environment.
import numpy as np
import pandas as pd
import time
from pyspi.calculator import Calculator

np.random.seed(42)  # seed numpy (not the stdlib random module), since np.random.randn is used below

M = 2    # number of processes
T = 300  # number of observations per process
dataset = np.random.randn(M, T)

calc = Calculator(dataset=dataset)
spi_items = calc.spis.copy()

df_rows = []
for (k, v) in spi_items.items():
    # Compute one SPI at a time so each gets its own wall-clock timing
    calc.spis.clear()
    calc.spis[k] = v
    begTime = time.perf_counter()
    calc.compute()
    calcTime = time.perf_counter() - begTime
    df_rows.append(dict(spi=k, time=calcTime))

pd.DataFrame(df_rows).to_csv("calc_spi_times.csv", index=False)
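Once the CSV exists, ranking SPIs by cost is a one-liner with pandas. Sketched here with made-up SPI names and timings standing in for `pd.read_csv("calc_spi_times.csv")`:

```python
import pandas as pd

# Stand-in for pd.read_csv("calc_spi_times.csv"); names and values are illustrative
times = pd.DataFrame({
    "spi": ["cov_EmpiricalCovariance", "te_kraskov", "ccm_E-None"],
    "time": [0.02, 1.7, 45.3],
})

# Rank SPIs from slowest to fastest
slowest = times.sort_values("time", ascending=False)
print(slowest.head(10).to_string(index=False))
```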
Thank you very much for adding this, @mesner! Very helpful, indeed :)
You bring up an interesting point about computation time taking ~2x as long doing each SPI piecemeal versus all at once, which was also my experience when I tried a similar analysis. I believe @olivercliff designed pyspi with a sort of hierarchical computation scheme, wherein some parent computations are performed once for a given SPI group (e.g., transfer entropy, precision matrices) and then propagate to the individual SPIs therein to save time/computation. So it's a bit tricky to derive the amount of time each individual SPI takes in practice, but I think this is a great approximation for users interested in the relative computation time of each SPI. For example, it makes sense that the convergent cross-mapping (ccm_) SPIs take orders of magnitude longer than most of the other SPIs.
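A toy illustration of why piecemeal timing roughly double-counts shared work: if several SPIs reuse one expensive parent computation, timing each in isolation charges every SPI with the full parent cost. This uses `functools.lru_cache` as a stand-in for pyspi's internal caching, which is not its actual mechanism:

```python
import numpy as np
from functools import lru_cache

rng = np.random.default_rng(0)
data = rng.standard_normal((5, 500))

@lru_cache(maxsize=None)
def spectral_decomposition(key):
    # Expensive "parent" step, computed once and cached for all child SPIs
    return np.fft.rfft(data, axis=1)

def spi_total_power(key="dataset"):
    # Child SPI #1: reuses the cached decomposition
    spec = spectral_decomposition(key)
    return float(np.abs(spec).sum())

def spi_peak_frequency(key="dataset"):
    # Child SPI #2: reuses the same cached decomposition
    spec = spectral_decomposition(key)
    return int(np.argmax(np.abs(spec).sum(axis=0)))

# Timed together, the decomposition cost is paid once; timed in isolation
# (clearing the cache between runs), each SPI pays it in full.
```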
For what it's worth, we played around with this question using different SPI subset configurations and multivariate time series (MTS) data sizes if you're interested: https://pyspi-toolkit.readthedocs.io/en/latest/faq.html#how-long-does-pyspi-take-to-run
This will be useful for knowing which methods are fast/slow to compute, and will allow users to select faster options.
This might be finicky since many of the methods inherit preprocessed information from other methods (e.g., all spectral methods inherit spectral decompositions).