tsalo opened this issue 5 years ago
I also looked at correlations between the beta maps, either as a substitute for or in conjunction with the correlations between the time series, but that doesn't do anything to reduce duplicates in the test runs I'm using.
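For concreteness, here's a minimal sketch of computing both the temporal and spatial cross-run correlation matrices. The input names (`mixing_run1`, `betas_run1`, etc.) are hypothetical placeholders for whatever arrays the workflow produces, not actual output names.

```python
import numpy as np

def cross_run_correlations(a, b):
    """Correlate each column of `a` with each column of `b` (Pearson).

    Both inputs are (n_observations, n_comps) arrays; the result is an
    (n_comps, n_comps) matrix of correlations between components.
    """
    a_z = (a - a.mean(axis=0)) / a.std(axis=0)
    b_z = (b - b.mean(axis=0)) / b.std(axis=0)
    return (a_z.T @ b_z) / a.shape[0]

# Hypothetical inputs from two runs with different seeds:
# mixing matrices are (n_timepoints, n_comps), beta maps are (n_voxels, n_comps).
# temporal_corr = cross_run_correlations(mixing_run1, mixing_run2)
# spatial_corr = cross_run_correlations(betas_run1, betas_run2)
```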
One thing to note as I think about this: if a component correlates highly with several other components, those components are probably not actually independent, so in a sense we are failing to create truly independent components. When this occurs, it should be regarded as undesirable ICA behavior (I'm reluctant to call it an outright failure of the ICA). However, choosing the threshold at which something counts as too highly correlated is tricky without looking at the data itself. I think we will have to take a data set and inspect it manually to see whether there are scenarios where components are actually independent but still highly correlated. What data set is the above example from?
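As a rough way to operationalize "correlates highly with several other components", one could count, per component, how many counterparts exceed a correlation threshold. This is only a sketch; the 0.5 default is an arbitrary assumption that would need to be checked against real data.

```python
import numpy as np

def count_high_correlations(corr_matrix, threshold=0.5):
    """Count, for each row component, how many column components it
    correlates with above the (arbitrary) threshold. Counts > 1 flag
    components that may not be cleanly separable or truly independent."""
    return (np.abs(corr_matrix) > threshold).sum(axis=1)
```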
One of our methods of evaluating reliability will be to compare ICA components across random seeds. From this we can look at the impact of convergence on the results and consistency of classification for equivalent components. I'm trying to figure out how we should do this.
Here are some proposed steps with potential pros/cons:
- Correlate the component time series between each pair of runs, producing an `n_comps` X `n_comps` correlation matrix (see the matching sketch after the figures below).
- Each pair of runs then yields an `n_comps`-length array of correlation coefficients, so to compare across all runs we'll need to use the full distributions.

[Figure: Example correlation matrix from real data]

[Figure: Example confusion matrix]
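Here is a rough sketch of the matching and classification-comparison steps, not the actual implementation: the one-to-one matching uses the Hungarian algorithm via `scipy.optimize.linear_sum_assignment`, which is just one possible choice, and the classification labels are hypothetical stand-ins.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_components(corr_matrix):
    """One-to-one matching of components across two runs that maximizes
    the total absolute correlation of the matched pairs."""
    row_ind, col_ind = linear_sum_assignment(-np.abs(corr_matrix))
    return row_ind, col_ind

def classification_confusion(labels_run1, labels_run2, row_ind, col_ind,
                             classes=("accepted", "rejected", "ignored")):
    """Confusion matrix of classifications for the matched component pairs."""
    conf = np.zeros((len(classes), len(classes)), dtype=int)
    for i, j in zip(row_ind, col_ind):
        conf[classes.index(labels_run1[i]), classes.index(labels_run2[j])] += 1
    return conf
```

The Hungarian matching enforces one match per component, which differs from simple argmax matching and so sidesteps the duplicates issue noted next.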
Note that I'm ignoring the duplicates issue described above. That means that 8 components in run2 are reflected 2-3 times in the confusion matrix, and 10 components are not reflected at all.
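To make that concrete, here is a small sketch (assuming a hypothetical run1-by-run2 `corr_matrix`) of argmax-based matching, which is what produces the duplicated and unmatched components:

```python
import numpy as np

def argmax_match_stats(corr_matrix):
    """Match each run-1 component to its most-correlated run-2 component,
    then report which run-2 components are matched more than once and
    which are never matched at all."""
    best = np.abs(corr_matrix).argmax(axis=1)
    counts = np.bincount(best, minlength=corr_matrix.shape[1])
    duplicated = np.where(counts > 1)[0]
    unmatched = np.where(counts == 0)[0]
    return duplicated, unmatched
```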