ThomasYeoLab / CBIG

MIT License
586 stars 384 forks source link

Memory requirements for Yeo 2011 stability analysis? #19

Closed utooley closed 3 years ago

utooley commented 3 years ago

I am attempting to replicate the stability analysis with my own input data, but was not able to get it to finish running--wondering if it's a memory issue, or if I can do something to parallelize so that the analysis will finish? Or whether it's user error of some sort?

I ran CBIG_runresamplingK_single.m and am running CBIG_determineK_single.m (I gathered these are the scripts to replicate that analysis, please correct me if I'm wrong). It seems to hang most severely at the two lines running the Hungarian algorithm below, and I don't get past the first value of k, since I think the second Hungarian algorithm with random clusters never finishes. I have 37k vertices per hemisphere, so somewhat larger data than it was originally run on, but tried running with really large amounts of memory and wasn't successful--wondered how I could more effectively investigate the issue, or if you have any suggestions?

% Do the matching 

        [matching cost1] = Hungarian(-assign1*assign2');

        % Creating random cluster assignments

        r1r = rand(size(r1));
        rr1r=rand(size(rr1));

        maxr1r = max(r1r,[],2);
        arrign1_random = ~(r1r-maxr1r*ones(1,size(r1,2)));

        maxrr1r = max(rr1r,[],2);
        arrign2_random = ~(rr1r-maxrr1r*ones(1,size(r1,2)));

        [matching cost2] = Hungarian(-assign1_random*assign2_random');

Expected situation

Algorithm finishes with stability values.

Actual situation

Algorithm hangs at 'Creating random cluster assignments'.

Steps to reproduce the issue (optional)

rubykong commented 3 years ago

@utooley I am quoting the messages from Thomas' response to another user:

In general, I would recommend running more initializations if you want to go for more clusters. In Figure 7 of this paper (http://people.csail.mit.edu/ythomas/publications/2014SegregationOverlapNetworks-NeuroImage.pdf), we sorted the 1000 random initializations based on their likelihood values and compute their similarity with the best solution (highest likelihood). You can see that with 7 networks, among 40% of the random initializations converge to a solution that is basically the same as the best solution. If I recall correctly, with 17 networks, about 10% of the random solutions converge to a solution that is basically the same as the best solution.

The code for achieving above stability analysis can be found here: https://github.com/ThomasYeoLab/CBIG/blob/master/utilities/matlab/DSP/CBIG_VonmisesSeriesConsistencySurf.m

utooley commented 3 years ago

I see--I am doubtful that this is an issue with the number of initializations, though I could be wrong, because the hanging on Hungarian occurs at the first k, of only 2 clusters. I was able to successfully run the clustering algorithm itself, with a high number of initializations, with k=7 (and other ks), I was just having issues with the stability analysis code iterating over k=2-22. It's possible that I need to up the number of initializations in the resampling, though, I will try that.

I am attempting to replicate Fig 6 from Yeo et al., 2011 (J of Neurophysiology). Would you recommend using the CBIG_VonmisesSeriesConsistencySurf.m code you linked rather than the scripts I thought were used, CBIG_runresamplingK_single.m and CBIG_determineK_single.m? And iterating CBIG_VonmisesSeriesConsistencySurf.m over num_clusters? I think I could also just modify the determineK code to use the functions in ConsistencySurf.m, since I've already run the clustering at each k and saved it.

rubykong commented 3 years ago

@utooley Yes. To replicate figure 6 you can check CBIG_VonmisesSeriesConsistencySurf.m (I am not sure if you can use this code directly, this is the code Thomas wrote for his paper and we didn't modify too much to make it generalizable). The Hungarian matching should not use a lot of memory. If you do 17 networks, then the cost matrix is only a 17x17 matrix, which is super tiny, and supposed to be finished in seconds. You can test it by:

[debug_match, debug_cost] = Hungarian(rand(17,17));

If you check the CBIG_VonmisesSeriesConsistencySurf.m, it will also call a Hungarian matching algorithm: CBIG_HungarianClusterMatch, which actually uses a more efficient Hungarian matching code written by others called munkres.

utooley commented 3 years ago

Thank you for your rapid responses and for the help, I really appreciate it! I will give this a whirl! I have used the munkres Hungarian algorithm code before, and it was indeed really odd to me that the code was hanging at that line, since you confirmed my suspicions that it shouldn't take long at all.

rubykong commented 3 years ago

Nice! Let me know if it still doesn't work. : )