ME-ICA / tedana-comparison

Comparison of implementations of multi-echo fMRI denoising pipelines across datasets.
GNU General Public License v2.0

Metrics of denoising performance #9

Open tsalo opened 5 years ago

tsalo commented 5 years ago

The goal of this analysis is to determine which settings best denoise multi-echo data. We'll need good metrics of this performance, which we can probably take from other papers. Broadly, each of these metrics can be calculated for each functional run using the single-echo (~30 ms), combined, and denoised data, and the distributions can be compared across strategies.

We can break down our metrics into two groups: removal of noise and preservation of signal.

Removal of noise metrics:

  1. Distance-dependent motion-related artifacts
    • I have a repository where I've attempted to implement Power's analyses. The results aren't always as clear as I'd like, but I think the code works correctly. Here is the link.
    • Interpretation: Failure to eliminate distance-dependent motion-related artifacts indicates poor removal of motion-related noise.
  2. Component classifications for components highly correlated with the following:
    • Motion parameters (should be rejected)
    • CompCor regressors (should be rejected)
    • Interpretation: Ability to detect artifactual or task-related components in the absence of external information would indicate good performance of the component classifier.
  3. Component classifications for components automatically flagged as noise using AROMA.
    • I'm thinking about the edge mask-related metric specifically.
    • The frequency-based metric is less relevant, in my opinion, since one of the benefits to multi-echo is supposed to be that we don't need band-pass filtering.
  4. Component classifications for components visually identified as artifacts.
    • Interpretation: Ability to detect artifactual or task-related components in the absence of manual intervention would indicate good performance of the component classifier.
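The regressor-correlation checks in items 2 and 3 could be sketched roughly like this. This is a minimal illustration, not tedana's API: `flag_components_by_regressors` and the 0.5 threshold are hypothetical, and it assumes the ICA mixing matrix and nuisance regressors are already loaded as NumPy arrays.

```python
import numpy as np

def flag_components_by_regressors(mixing, regressors, threshold=0.5):
    """Flag components whose time series correlate strongly with nuisance
    regressors (e.g., motion parameters or CompCor components).

    mixing : (n_timepoints, n_components) array of component time series
    regressors : (n_timepoints, n_regressors) array of nuisance regressors
    threshold : hypothetical cutoff on the absolute Pearson correlation

    Returns a boolean array, True where max |r| exceeds the threshold;
    flagged components would be expected to carry a "rejected" label.
    """
    # z-score columns so the scaled dot product gives Pearson correlations
    mz = (mixing - mixing.mean(axis=0)) / mixing.std(axis=0)
    rz = (regressors - regressors.mean(axis=0)) / regressors.std(axis=0)
    corr = mz.T @ rz / mixing.shape[0]  # (n_components, n_regressors)
    return np.abs(corr).max(axis=1) > threshold
```

The flags could then be compared against tedana's accept/reject classifications to score the classifier.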

Preservation of signal metrics:

  1. Power analysis of task data
    • For well-characterized tasks, we can define a priori regions of interest and run power analyses on the model results with fMRIPower.
    • Interpretation: Improved power to detect well-known effects for denoised data compared to combined or single-echo data would indicate improved denoising without removing signal.
  2. TSNR
    • Temporal signal-to-noise ratio values can be compared on a voxel-wise basis.
    • Caveat: TSNR will increase as degrees of freedom decrease, so denoised data will necessarily have higher TSNR even if denoising is bad.
  3. Contrast-to-noise ratio maps
  4. Activation count maps
    • I just really love these maps
    • Do we care about alignment with underlying anatomy (like fMRIPrep) or total voxel count?
  5. Parameter estimates
    • Variability across subjects?
    • Value height?
    • Test statistic height?
  6. Component classifications for components highly correlated with the following:
    • Convolved task regressors (should be accepted)
    • Interpretation: Ability to detect artifactual or task-related components in the absence of external information would indicate good performance of the component classifier.
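For the TSNR metric, a minimal voxel-wise sketch might look like the following (assuming the data are already loaded as a 4D NumPy array; the degrees-of-freedom caveat above still applies to any comparison of the resulting maps):

```python
import numpy as np

def tsnr(data):
    """Voxel-wise temporal SNR: mean over time divided by std over time.

    data : 4D array (x, y, z, time)
    Returns a 3D TSNR map. Voxels with zero temporal variance get
    TSNR = 0 rather than inf, so background voxels stay finite.
    """
    mean = data.mean(axis=-1)
    std = data.std(axis=-1)
    return np.divide(mean, std, out=np.zeros_like(mean), where=std > 0)
```

The maps for single-echo, combined, and denoised data could then be compared voxel-wise or summarized within masks (whole brain, white matter, ventricles, ROIs).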
dowdlelt commented 5 years ago

Would this all be relative to just the optimally combined data, and perhaps the ~30ms (at 3T) echo? I like all of these. Could also think about the seed connectivity maps, or ICC values.
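A seed-based connectivity map along those lines could be as simple as the sketch below. This is hypothetical (`seed_connectivity` is not an existing function), and in practice one would run it on masked, preprocessed data rather than raw arrays.

```python
import numpy as np

def seed_connectivity(data, seed_mask):
    """Correlate the mean time series within a seed mask with every voxel.

    data : 4D array (x, y, z, time)
    seed_mask : 3D boolean array selecting the seed voxels
    Returns a 3D map of Pearson r values; zero-variance voxels get r = 0.
    """
    seed_ts = data[seed_mask].mean(axis=0)                # (time,)
    sz = (seed_ts - seed_ts.mean()) / seed_ts.std()       # z-scored seed
    dm = data - data.mean(axis=-1, keepdims=True)
    ds = data.std(axis=-1)
    dz = np.divide(dm, ds[..., None], out=np.zeros_like(dm),
                   where=ds[..., None] > 0)               # z-scored voxels
    return (dz * sz).mean(axis=-1)                        # Pearson r map
```

Comparing these maps (or their test-retest reliability, e.g., ICC) across denoising strategies would complement the task-based metrics.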

The task analyses are a critical component, because we have to be sure that tedana isn't removing BOLD-like signals, which it has done in the 'deep' past.

tsalo commented 5 years ago

I think comparing to both OC and ~30ms is a great idea.

When we analyze a dataset with a relatively large number of echoes (i.e., five, realistically), we could also run the analyses with various numbers of echoes included to predict how number of echoes impacts power and other metrics of interest. That would be a lot of work, but it might be worth it.

tsalo commented 5 years ago

I just want to link to this comment in ME-ICA/tedana#153. The work done by @cjl2007 to improve his own component selection could be used here to evaluate tedana's performance. I believe that the evaluation of component classifications better fits with this analysis than the reliability analysis.

handwerkerd commented 5 years ago

Additional metrics that were discussed at OHBM 2019: contrast-to-noise ratio for runs with task data. Should it be computed in regions of interest pre-specified for each dataset? We should also give more thought to what we mean by TSNR. Mean/standard deviation in the whole brain? White matter? Ventricles? ROIs?

tsalo commented 4 years ago

Given our renewed interest in getting a paper out, I'd like to revisit this issue. I tried to summarize the metrics a bit more. There's probably a lot of overlap between some (e.g., power analysis and parameter estimates) and some are probably not useful (e.g., TSNR). Plus I don't think the list is comprehensive.