ME-ICA / tedana-reliability-analysis

An analysis of the reliability of the tedana denoising pipeline on an example dataset
GNU General Public License v3.0

Method for identifying comparable components across runs #3

tsalo opened this issue 5 years ago

tsalo commented 5 years ago

One of our methods of evaluating reliability will be to compare ICA components across random seeds. From this we can look at the impact of convergence on the results and consistency of classification for equivalent components. I'm trying to figure out how we should do this.

Here are some proposed steps with potential pros/cons:

  1. (Prerequisite) Run tedana twice, with different random seeds.
  2. Load the ICA mixing matrix and ICA component table from each run. These will have the components sorted in the same order (descending Kappa, I believe).
  3. Correlate the mixing matrices across the two runs, resulting in an n_comps x n_comps correlation matrix.
  4. For each row in the correlation matrix, identify the index of the maximum correlation coefficient (see the sketch after this list).
    • Under optimal circumstances, each column would be represented exactly once, with no duplicates. In reality, that does not seem to happen (see the correlation matrix I've added below); the extremely high correlations (yellow squares) sort of disappear further down.
    • How do we resolve duplicates, where more than one component from one run has its highest correlation with the same component from the other run?
  5. To compare convergence and non-convergence, compare the distributions of these maximum correlation coefficients from converged/converged run pairs to those from converged/didn't-converge pairs.
    • We'll get an n_comps array of correlation coefficients from each pair, so to compare across all runs we'll need to use the full distributions.
    • As with all comparisons of convergence, a problem we'll have to deal with is that convergence failure doesn't happen randomly. Some subjects fail a lot of the time, while others never fail.
  6. To evaluate consistency of classification, we'll need some metric summarizing cross-run comparability of components. Then we can build a contingency table for each pair of runs (see the example below, and the sketch after it), and look at the average of that across all runs, I think.
    • We still have the duplicates issue here.
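
A rough sketch of steps 2-4, using numpy/pandas/scipy. The file names here are placeholders (not necessarily the actual tedana output names), and the use of linear_sum_assignment to force a one-to-one matching is just one possible way to avoid the duplicates problem, not something tedana provides:

```python
import numpy as np
import pandas as pd
from scipy.optimize import linear_sum_assignment

# Placeholder file names for the two runs' mixing matrices (time x components).
mmix1 = pd.read_table("run1/ica_mixing.tsv").values
mmix2 = pd.read_table("run2/ica_mixing.tsv").values

# Step 3: cross-run correlation matrix (n_comps1 x n_comps2).
# np.corrcoef treats rows as variables, so transpose and slice out the
# cross-run block. Use absolute values because IC signs are arbitrary.
n1, n2 = mmix1.shape[1], mmix2.shape[1]
cross_corr = np.abs(np.corrcoef(mmix1.T, mmix2.T)[:n1, n1:])

# Step 4, naive version: row-wise argmax, which can produce duplicates.
naive_matches = cross_corr.argmax(axis=1)

# One possible fix for duplicates: a one-to-one (Hungarian) assignment that
# maximizes the total matched correlation instead of each row's own maximum.
row_idx, col_idx = linear_sum_assignment(-cross_corr)
matched_corrs = cross_corr[row_idx, col_idx]
```

With a one-to-one assignment, matched_corrs from each run pair would give the per-pair distribution of matched correlations for the converged/converged vs. converged/didn't-converge comparison in step 5.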

Example correlation matrix from real data

[image: example_correlation_matrix]

Example confusion matrix

Note that I'm ignoring the duplicates issue described above. That means that 8 components in run2 are reflected 2-3 times below, and 10 components are not reflected at all.

| run1 \ run2 | accepted | ignored | rejected |
|-------------|----------|---------|----------|
| accepted    | 40       | 10      | 8        |
| ignored     | 0        | 0       | 1        |
| rejected    | 4        | 1       | 8        |
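
For step 6, once components are matched across runs (however the duplicates end up being handled), a table like the one above can be built from the two component tables. A minimal sketch, assuming the component tables have a "classification" column and reusing the row_idx/col_idx matching from the earlier sketch; file names are again placeholders:

```python
import pandas as pd

# Placeholder file names for the two runs' component tables.
ctab1 = pd.read_table("run1/ica_metrics.tsv")
ctab2 = pd.read_table("run2/ica_metrics.tsv")

# Classifications of the matched components (row_idx/col_idx from the
# matching sketch above).
labels1 = ctab1["classification"].values[row_idx]
labels2 = ctab2["classification"].values[col_idx]

# Cross-run contingency table of classifications.
contingency = pd.crosstab(
    pd.Series(labels1, name="run1"),
    pd.Series(labels2, name="run2"),
)
print(contingency)
```
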
tsalo commented 5 years ago

I also looked at correlations between the beta maps, either as a substitute for or in conjunction with the correlations between the time series, but that doesn't do anything to reduce duplicates in the test runs I'm using.
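
For reference, a minimal sketch of the spatial (beta-map) correlations, assuming each run's component beta maps are available as a 4D image with one volume per component; file and mask names are placeholders:

```python
import numpy as np
import nibabel as nib

# Placeholder file names: 4D beta-map images (one volume per component)
# and the brain mask used for the analysis.
betas1 = nib.load("run1/betas.nii.gz").get_fdata()
betas2 = nib.load("run2/betas.nii.gz").get_fdata()
mask = nib.load("mask.nii.gz").get_fdata().astype(bool)

# Flatten to (n_comps, n_voxels_in_mask) and correlate across runs,
# again using absolute values because component signs are arbitrary.
b1 = betas1[mask].T
b2 = betas2[mask].T
n1 = b1.shape[0]
spatial_corr = np.abs(np.corrcoef(b1, b2)[:n1, n1:])
```

This spatial correlation matrix could be averaged with (or otherwise combined with) the time-series correlation matrix before matching.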

jbteves commented 5 years ago

One thing to note as I think about this: if a component correlates highly with several other components, it seems likely that those components are not actually independent, so when this happens we are in a sense failing to create truly independent components. When this occurs, it should be regarded as undesirable ICA behavior (I'm reluctant to call it an outright failure of the ICA). However, the threshold at which we decide that something is too highly correlated is a little bit tricky to choose in the absence of the data itself. I think we will have to take a data set and inspect it manually to see whether there are scenarios where components might actually be independent but still have high correlation. What data set is the above example?