victoris93 opened 1 week ago
Thanks for the suggestion! We had some internal discussions and never really reached a conclusion on how different runs/sessions should be handled. A subject-level summary sounds very reasonable. This is indeed a common use case, and I have done something similar myself. In terms of implementation, I would probably calculate the connectivity metrics for each scan and then calculate the average. This is more memory efficient and in principle equivalent. I am happy to show it with a minimal example if you wish!
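For reference, a minimal sketch of that per-scan-then-average approach in plain NumPy (assuming each run is a `(timepoints, parcels)` array; `average_connectome` is a hypothetical helper for illustration, not an existing function in the package):

```python
import numpy as np

def average_connectome(runs):
    """Hypothetical helper: correlate each run, then average the matrices.

    runs: list of (n_timepoints, n_parcels) arrays for one subject.
    Returns an (n_parcels, n_parcels) averaged correlation matrix.
    """
    return np.mean([np.corrcoef(run.T) for run in runs], axis=0)

# toy data: 3 runs, 50 timepoints, 10 parcels
rng = np.random.default_rng(0)
runs = [rng.standard_normal((50, 10)) for _ in range(3)]
conn = average_connectome(runs)
print(conn.shape)  # (10, 10)
```

Only one run's time series ever needs to be in memory at a time, which is what makes this cheaper than concatenation.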
I've thought about taking the average of run-specific matrices myself too, but I wasn't sure whether, from a methods standpoint, it's equivalent to computing a single matrix from all runs concatenated. But if at some point you decide to implement it, it would be interesting to be able to:
If it's not at the top of the agenda, I understand. Just thought I'd leave it here for future reference.
The original reasons that I decided not to implement a group connectome are:
If anything along these lines is implemented, I would prefer to have one thing that's useful for more researchers, rather than all possible combinations.
Also, for averaging vs. concatenating, IMHO this is a necessary compromise for efficient computing. The results are not going to be numerically identical, but the similarity is extremely high (see code below). Concatenating all the time series would lead to inefficient RAM usage: you would have to request enough memory for the full concatenated data, yet for the bulk of the computing time the actual usage would be low, and on a computing cluster with a scheduler, this is the kind of behaviour that gets a user's priority penalized. Along the same vein, concatenation will not scale for longer scans.
The following code shows that the averaging approach and the concatenating approach produce very similar results:
```python
import numpy as np

# 5 runs, each 100 timepoints (typical for short scans) x 444 parcels
time_series = [np.random.rand(100, 444) for _ in range(5)]

# concatenate runs along time, then correlate parcels
connectome_concat = np.corrcoef(np.concatenate(time_series).T)

# correlate each run separately, then average the matrices
connectome_average = np.mean([np.corrcoef(ts.T) for ts in time_series], axis=0)

# correlation between the two flattened connectomes
similarity = np.corrcoef(connectome_concat.flatten(), connectome_average.flatten())[0, 1]
assert similarity > 0.99
```
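To make the RAM argument concrete, here is a rough back-of-the-envelope sketch (the run count and array sizes are illustrative assumptions, not measurements from the package):

```python
# illustrative sizes: 10 runs of 1000 timepoints x 444 parcels, float64 (8 bytes)
n_runs, n_timepoints, n_parcels = 10, 1000, 444

# concatenating holds every run's time series in memory at once
concat_bytes = n_runs * n_timepoints * n_parcels * 8

# averaging only ever holds one run plus a running-sum connectome
average_bytes = n_timepoints * n_parcels * 8 + n_parcels * n_parcels * 8

print(f"concat: {concat_bytes / 1e6:.0f} MB, average: {average_bytes / 1e6:.0f} MB")
```

The concatenation footprint grows linearly with the number of runs (and with scan length), while the averaging footprint stays flat.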
Ok I see, this sounds totally reasonable! Thanks.
Your idea
For many connectivity analyses it is advised to maximize time series length, so one would expect to be able to use all BOLD data to compute a connectivity matrix (e.g., the classic case of a single matrix per subject). Any chance you would consider an option to concatenate all runs or sessions before computing a matrix?