Reduce intermediate outputs

emdupre commented 6 years ago

From @emdupre on November 15, 2017 16:23

The niwrite function is called throughout tedana to output many intermediate files; however, these are poorly documented and of unclear value to the user. It would make more sense to only have these intermediate files output if the user provides a --verbose flag.
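As a rough illustration of the proposal (a minimal sketch, not tedana's actual `niwrite` code), intermediate files could be skipped unless a verbose option is set. The `write_outputs` helper, the `is_intermediate` flag, and the byte payloads below are hypothetical stand-ins for real image data:

```python
import os
import tempfile


def write_outputs(outputs, out_dir, verbose=False):
    """Write required outputs always, and intermediate ones only if verbose.

    `outputs` maps a filename to a (data_bytes, is_intermediate) pair.
    Returns the sorted list of filenames that were written.
    """
    written = []
    for fname, (data, is_intermediate) in outputs.items():
        if is_intermediate and not verbose:
            continue  # skip intermediate files by default
        with open(os.path.join(out_dir, fname), "wb") as fo:
            fo.write(data)
        written.append(fname)
    return sorted(written)


# Hypothetical payloads; real outputs would be NIfTI images.
outputs = {
    "dn_ts_OC.nii": (b"denoised", False),
    "t2ss.nii": (b"t2s-by-echo", True),
}
with tempfile.TemporaryDirectory() as tmp:
    quiet = write_outputs(outputs, tmp)
with tempfile.TemporaryDirectory() as tmp:
    loud = write_outputs(outputs, tmp, verbose=True)
```

Here `quiet` contains only the required file, while `loud` also includes the intermediate one.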

Copied from original issue: emdupre/tedana#2

emdupre commented 6 years ago

I don't like having an invalid label! I just learned that was an option by accidentally applying it here 😆 This is still a very valid concern!

tsalo commented 6 years ago

I'd like to continue the discussion started in #133 regarding intermediate outputs and the possible addition of a new argument to control their generation here.

Here is a hopefully comprehensive list of files that are generated. I've taken the liberty of bolding files that I think should be optional.

Filename	Content
t2sv.nii	Limited estimated T2* 3D map.
s0v.nii	Limited S0 3D map.
ts_OC.nii	Optimally combined time series.
dn_ts_OC.nii	Denoised optimally combined time series.
lowk_ts_OC.nii	Combined time series from rejected components.
midk_ts_OC.nii	Combined time series from "mid-kappa" rejected components.
hik_ts_OC.nii	High-kappa time series: combined time series from high-kappa accepted components.
mepca_mix.1D	Mixing matrix (component time series) from PCA decomposition.
mepca_OC_components.nii	PCA component weight maps.
comp_table_pca.txt	TEDPCA component table.
meica_mix.1D	Mixing matrix (component time series) from ICA decomposition. The only differences between this mixing matrix and the one above are that components may be sorted differently and signs of time series may be flipped.
comp_table_ica.txt	TEDICA component table.
betas_OC.nii	Full ICA coefficient feature set. Not normalized.
betas_hik_OC.nii	High-kappa ICA coefficient feature set
feats_OC2.nii	Z-normalized spatial component maps. Could be renamed to `meica_OC_components.nii`.

If `verbose` is set to True:

Filename	Content
t2ss.nii	Voxel-wise T2* estimates using ascending numbers of echoes, starting with 2.
s0vs.nii	Voxel-wise S0 estimates using ascending numbers of echoes, starting with 2.
t2svG.nii	Full T2* map/time series.
s0vG.nii	Full S0 map/time series.
__meica_mix.1D	Mixing matrix (component time series) from ICA decomposition.
hik_ts_e[echo].nii	High-Kappa time series for echo number `echo`
midk_ts_e[echo].nii	Mid-Kappa time series for echo number `echo`
lowk_ts_e[echo].nii	Low-Kappa time series for echo number `echo`
dn_ts_e[echo].nii	Denoised time series for echo number `echo`

If global signal correction is employed:

Filename	Content
T1gs.nii	Spatial global signal
glsig.1D	Time series of global signal from optimally combined data.
tsoc_orig.nii	Optimally combined time series with global signal retained.
tsoc_nogs.nii	Optimally combined time series with global signal removed. Same as ts_OC.nii when GSR is used.

If T1-GS correction is employed:

Filename	Content
sphis_hik.nii	T1-like effect
hik_ts_OC_T1c.nii	T1 corrected high-kappa time series by regression
dn_ts_OC_T1c.nii	T1 corrected denoised time series
betas_hik_OC_T1c.nii	T1-GS corrected high-kappa components
meica_mix_T1c.1D	T1-GS corrected mixing matrix

In order to binderize the walkthrough notebooks, we'll also need the following. All of these should be optional.

Filename	Content	Reason
adaptive_mask.nii	Adaptive mask. Each voxel has value corresponding to number of echoes with good signal.	Needed to show adaptive mask.
mask.nii	Binary mask of voxels with good data.	Applied to all other images needed for walkthrough.
tsoc_whitened.nii	Optimally combined data after dimensionality reduction with PCA.	Needed to show time series plot of whitened vs. original OC data (i.e., to show impact of TEDPCA).
meica_betas_catd.nii	Echo-specific weight maps for each of the ICA components.	Needed to show how the component weights align with predicted weights from S0 and R2 models in line plots.
meica_metric_weights.nii	Weight maps used to average metrics (R2 F, S0 F, predicted R2 model values, and predicted S0 model values) in the same manner as `fitmodels_direct`.	Needed to show how the component weights align with predicted weights from S0 and R2 models in line plots.
meica_R2_pred.nii	Echo-specific maps of predicted values for R2 model for each component.	Needed to show how the component weights align with predicted weights from R2 models in line plots.
meica_S0_pred.nii	Echo-specific maps of predicted values for S0 model for each component.	Needed to show how the component weights align with predicted weights from S0 models in line plots.
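To make the adaptive mask description concrete, here is a minimal sketch of counting, for each voxel, how many echoes have "good" signal. The fixed `threshold` and the toy values are assumptions for illustration only; the table does not say how "good signal" is actually determined:

```python
def adaptive_mask(echo_means, threshold):
    """Count, per voxel, how many echoes have mean signal above `threshold`.

    echo_means: one row per voxel, each row holding one mean signal per echo.
    threshold: hypothetical cutoff for "good signal" (an assumption here).
    """
    return [sum(1 for signal in voxel if signal > threshold) for voxel in echo_means]


# Toy data: 3 voxels, 3 echoes (values are made up).
echo_means = [
    [900.0, 600.0, 350.0],  # good signal in all echoes
    [800.0, 200.0, 50.0],   # signal drops out by the third echo
    [10.0, 5.0, 2.0],       # no usable signal
]
counts = adaptive_mask(echo_means, threshold=100.0)
```

With these toy values, `counts` holds one integer per voxel, ranging from 0 to the number of echoes.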

handwerkerd commented 6 years ago

I don't have time to comment on each file, but I wanted to point out some key things. meica_mix.1D & betas_OC.nii are critical to save because those are what you need to examine whether the ICA components were selected appropriately and to selectively keep or remove components in different ways.

As of now, I think comp_table_ica.txt is where the provenance and selection metrics for each ICA component are stored. If this ends up being stored elsewhere, that's fine, but, until then, this is vital to save.

I have never used lowk_ts_OC.nii, midk_ts_OC.nii, or hik_ts_OC.nii for anything meaningful, and those are very easy to regenerate if you have meica_mix.1D, betas_OC.nii, & comp_table_ica.txt.
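A rough sketch of that regeneration, assuming each combined time series is just the component weights multiplied by the matching columns of the mixing matrix. The `combine_components` helper, the toy values, and the `classification` dict (standing in for comp_table_ica.txt) are all hypothetical, and pure-Python lists are used only to keep the example self-contained:

```python
def combine_components(betas, mmix, keep):
    """Rebuild a voxels-by-time series from a subset of components.

    betas: one row per voxel, each row holding one weight per component.
    mmix: one row per time point, each row holding one value per component.
    keep: indices of the components to include.
    """
    return [
        [sum(voxel[c] * timepoint[c] for c in keep) for timepoint in mmix]
        for voxel in betas
    ]


# Toy data: 2 voxels, 3 time points, 3 components (all values made up).
betas = [[1.0, 2.0, 0.5], [0.0, 1.0, 2.0]]
mmix = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0]]
# Hypothetical classifications, standing in for comp_table_ica.txt.
classification = {0: "accepted", 1: "rejected", 2: "accepted"}

accepted = [c for c, label in classification.items() if label == "accepted"]
rejected = [c for c, label in classification.items() if label == "rejected"]

hik_ts = combine_components(betas, mmix, accepted)
lowk_ts = combine_components(betas, mmix, rejected)
full_ts = combine_components(betas, mmix, list(classification))
```

Under that assumption, the accepted and rejected series sum to the series built from all components, which is why only the mixing matrix, betas, and component table need to be kept.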

tsalo commented 6 years ago

@handwerkerd Thanks for the feedback.

I suppose this is a general question for everyone, but how much do we expect regular tedana users to examine and/or manually perform the component selection? To be honest, I assumed that would be something only power users would do, and those users would have verbose set to True anyway. I assumed that regular users would probably just make a general pass/fail determination based on the output and maybe the visual report.

We also want to generate visual reports, which should, at minimum, include the following: component time series, component maps, and component statistics (Kappa, Rho, and variance explained). By default, should the report be generated in addition to the related files (comp_table_pca.txt, comp_table_ica.txt, mepca_mix.1D, meica_mix.1D, betas_OC.nii, and feats_OC2.nii), or in lieu of those files?

I think it's worth it to keep the high-Kappa time series, but dropping low-Kappa and mid-Kappa makes sense. I don't want regular users to have to regenerate files, but if no one ever uses those files then there's no reason to keep them.

handwerkerd commented 6 years ago

Skimming the components & where they are classified is highly recommended for all users since odd things do happen. One of the top requests for help I get is from end users who see a component that is clearly misclassified & they want to know how to either add it back in or remove it. Also, the visual report is based on the component maps, so, if you're making a report, you're saving the maps.

I've slacked on setting up a mockup of the report, but I think it's good to have this information in formats that are easily accessed by programs. The viewing-friendly report will either copy some of that information into another format or access the files where that information resides.

There's really no end use application for the high kappa time series. They're sometimes useful to figure out what's going on in a weird dataset, but they shouldn't be used in analyses so I don't think there's a need to save them by default.

tsalo commented 6 years ago

High kappa can be optional too, I guess.

I've updated the tables so that the component tables, mixing matrices, and betas file are all required. Should we use betas_OC.nii or feats_OC2.nii? I honestly don't know the difference between the two.

jbteves commented 5 years ago

@tsalo @emdupre do you believe that some of the above discussion should be updated in light of @dowdlelt's --png option? Especially since @handwerkerd's comment notes that things shouldn't be optional since you want to examine your components using them?

tsalo commented 5 years ago

I think that our current set of optional (via verbose) and required outputs is good, although this issue may not accurately reflect them at this point. I think we can probably close this issue, to be honest. We've moved a lot of intermediate outputs over to verbose, which should address the original request.

emdupre commented 5 years ago

I'm ok to close this and create a new issue with a more specific request!
