bids-apps / giga_connectome

generate connectome from fMRIPrep outputs
https://giga-connectome.readthedocs.io/en/stable/
MIT License

Feature request: compute a connectivity matrix from all BOLD runs/sessions #176

Open victoris93 opened 1 week ago

victoris93 commented 1 week ago

Your idea

For many connectivity analyses it is advisable to maximize time series length, so one would expect to be able to use all of a subject's BOLD data to compute a connectivity matrix (e.g., the classic case of a single matrix per subject). Any chance you guys would consider an option to concatenate all runs or sessions before computing the matrix?

htwangtw commented 1 week ago

Thanks for the suggestion! We had some internal discussions but never really reached a conclusion on how different runs/sessions should be handled. A subject-level summary sounds very reasonable. This is indeed a common use case, and I have done something similar myself. In terms of implementation, I would probably calculate the connectivity metrics for each scan and then calculate the average. This is more memory efficient and in principle gives the same result. I am happy to show it with a minimal example if you wish!

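For reference, here is a minimal sketch of that run-then-average approach. It is a hypothetical illustration built on nilearn's ConnectivityMeasure rather than giga_connectome's actual internals, and the run count, scan length, and parcel number are made up:

import numpy as np
from nilearn.connectome import ConnectivityMeasure

rng = np.random.default_rng(0)
# Stand-in data: three runs of 100 time points over 444 parcels
# (run count, scan length, and parcel number are hypothetical)
runs = [rng.standard_normal((100, 444)) for _ in range(3)]

# One correlation matrix per run...
connectivity = ConnectivityMeasure(kind="correlation")
per_run = connectivity.fit_transform(runs)  # shape: (3, 444, 444)

# ...then average into a single subject-level connectome
subject_connectome = per_run.mean(axis=0)  # shape: (444, 444)
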
victoris93 commented 1 week ago

I've thought about taking the average of run-specific matrices myself, too, but I wasn't sure whether, from a methods standpoint, it is equivalent to computing a single matrix from all runs concatenated. But I think if at some point you decided to implement it, it would be interesting to be able to:

If it's not at the top of the agenda, I understand. Just thought I'd leave it here for future reference.

htwangtw commented 6 days ago

The original reasons I decided not to implement a group connectome are:

  1. it is relatively simple for researchers to compute on their own;
  2. it adds too many small customisations and deviates from the aim of having a simple tool; the CLI would become bulky if they were all exposed as options to users;
  3. the outputs would be bulky if every possible way of combining the data were created.

If anything along these lines is implemented, I would prefer to have one thing that's useful for more researchers, rather than all possible combinations.

Also, regarding averaging vs. concatenating: IMHO averaging is a necessary compromise for efficient computing. The results are not going to be numerically identical, but the similarity is extremely high (see the code below). Concatenating all the time series would lead to inefficient RAM usage: for the bulk of the computing time the RAM usage would be low, and on a computing cluster with a scheduler, holding a large, mostly idle memory allocation is the kind of behaviour that gets a user's priority penalized. In the same vein, concatenation will not scale to longer scans.

The following code shows that the averaging approach and the concatenating approach produce very similar results:

import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the check is reproducible
# Five runs with a time series length of 100 (typical for short scans) and 444 parcels
time_series = [rng.random((100, 444)) for _ in range(5)]

# Approach 1: concatenate the runs along the time axis, then correlate
connectome_concat = np.corrcoef(np.concatenate(time_series).T)

# Approach 2: correlate each run separately, then average the matrices
connectome_average = np.mean([np.corrcoef(ts.T) for ts in time_series], axis=0)

# Pearson similarity between the two flattened connectomes
similarity = np.corrcoef(connectome_concat.flatten(), connectome_average.flatten())[0, 1]
assert similarity > 0.99
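The two estimates are close but not identical because np.corrcoef demeans and rescales the concatenated series as a whole, so between-run differences in mean and variance leak into the pooled correlations, whereas the per-run approach normalizes each run separately before averaging. With reasonably clean data those between-run differences are small, which is why the similarity stays above 0.99.
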
victoris93 commented 6 days ago

OK, I see. This sounds totally reasonable, thanks!