CoBrALab / RABIES

fMRI preprocessing pipeline and analysis tools adapted for rodent images. Visit the full documentation at https://rabies.readthedocs.io/en/stable/

fastcommonspace should be default + add -subject and -session filter flags like in fmriprep #283

Closed: grandjeanlab closed this issue 1 year ago

grandjeanlab commented 1 year ago

The default (as of 0.4.7) is to use twolevel_ants_dbm to create a study template. As far as I can tell, the study template is created by registering every anatomical scan to each other, linearly and non-linearly (twice). I think the current default workflow breaks RABIES.

1) twolevel_ants_dbm scales non-linearly and leads to impossibly long computing times.

Assuming each registration within twolevel_ants_dbm takes 2h to run per scan pair, that it takes an additional 4h to run the remaining RABIES steps per scan, and (for simplicity) that compute load is divided linearly by the number of cores, it only takes ~30 scans to exceed the 24h maximum wall time of Niagara's 80-core nodes. It would take half a year for a 10-core computer to run a 150-scan dataset. In comparison, fastcommonspace runs 30 scans in 4h on Niagara and 12h on a 10-core equivalent. It would take ~600 scans to exceed Niagara's wall time with fastcommonspace.
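For concreteness, here is a minimal sketch of the arithmetic behind these estimates, assuming an all-pairs registration model for the default workflow and the 2h/4h figures stated above; the per-scan cost used for the linear comparison is purely illustrative.

```python
# Back-of-envelope model for the wall-time estimates above.
# Assumptions (taken from this comment, not measured): 2 h per non-linear
# registration, 4 h of remaining RABIES steps per scan, and compute load
# divided perfectly linearly across cores.

REG_H = 2.0     # hours per registration
OTHER_H = 4.0   # hours of remaining RABIES steps per scan

def all_pairs_wall_time_h(n_scans, n_cores):
    """Wall time (h) if every scan is registered to every other scan (~N^2)."""
    registrations = n_scans * (n_scans - 1)               # ordered scan pairs
    total_h = registrations * REG_H + n_scans * OTHER_H
    return total_h / n_cores

def linear_wall_time_h(n_scans, n_cores, per_scan_h=REG_H + OTHER_H):
    """Wall time (h) if each scan only needs one registration to a fixed template."""
    return n_scans * per_scan_h / n_cores

print(all_pairs_wall_time_h(30, 80))        # ~23 h: ~30 scans saturate a 24 h, 80-core node
print(all_pairs_wall_time_h(150, 10) / 24)  # ~189 days, i.e. roughly half a year on 10 cores
print(linear_wall_time_h(150, 10) / 24)     # ~4 days under linear scaling, for contrast
```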

[figure: predicted compute time as a function of scan count, default template construction vs. fastcommonspace]

2) fastcommonspace just works. In the >800 rat scans I processed, no scan failed the anatomical-to-template registration using fastcommonspace. This of course requires relatively good masks/inhomogeneity correction, which can be achieved if we have parameter flexibility.

3) twolevel_ants_dbm makes RABIES inflexible. twolevel_ants_dbm requires processing the dataset as a whole. Assuming that some scans will only pass preprocessing with a given RABIES parameter set (e.g. N4 inhomogeneity correction) while other scans will fail with those same parameters, and that our goal is to minimize scan exclusion (because it reduces sample size, etc.), it becomes impossible to process the dataset as a whole. I strongly encourage a flexible workflow with -subject, -session, and -run filter flags à la fmriprep, to allow running RABIES on a subset of scans with adaptive parameters. Moreover, this would allow running RABIES in batches as scans are added to a dataset, thus rendering long compute times moot.
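As a minimal sketch of what such filter flags could look like, here is a hypothetical argument parser and BIDS filename filter; the flag names and selection logic are illustrative only, not the actual RABIES or fmriprep interface.

```python
import argparse
from pathlib import Path

# Hypothetical sketch of fmriprep-style filter flags; the real CLIs may differ.
parser = argparse.ArgumentParser(description="Select a subset of a BIDS dataset")
parser.add_argument("bids_dir", type=Path)
parser.add_argument("--subject", nargs="+", default=None, help="only these sub-* IDs")
parser.add_argument("--session", nargs="+", default=None, help="only these ses-* IDs")
parser.add_argument("--run", nargs="+", default=None, help="only these run-* IDs")
args = parser.parse_args()

def keep(path: Path) -> bool:
    """Return True if the scan filename matches all requested filters."""
    checks = [("sub-", args.subject), ("ses-", args.session), ("run-", args.run)]
    for prefix, wanted in checks:
        if wanted and not any(f"{prefix}{w}" in path.name for w in wanted):
            return False
    return True

scans = [p for p in args.bids_dir.rglob("*_bold.nii.gz") if keep(p)]
print(f"{len(scans)} scans selected for this batch")
```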

gdevenyi commented 1 year ago

twolevel_ants_dbm is not used.

> the study template is created by registering every anatomical scan to each other, linearly and non-linearly (twice)

Incorrect. The registration is between each subject and the evolving template average.

The tool used is https://github.com/CoBrALab/optimized_antsMultivariateTemplateConstruction

This is a fundamental issue with using singularity, where proper use of parallel computing clusters isn't possible.

Are you using an appropriate resolution for the reference template? Non-linear registration processing scales with resolution.
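As a rough illustration of that scaling (a sketch under the assumption that registration cost grows roughly with voxel count; the field of view below is an arbitrary example, not a recommended value):

```python
# Rough illustration: non-linear registration cost grows roughly with the
# number of voxels, so halving the voxel size inflates the work by ~2^3 = 8x.

def voxels(fov_mm=(20.0, 30.0, 15.0), res_mm=0.1):
    """Number of voxels for a given field of view and isotropic resolution."""
    nx, ny, nz = (dim / res_mm for dim in fov_mm)
    return nx * ny * nz

for res in (0.3, 0.2, 0.1, 0.05):
    print(f"{res} mm isotropic -> {voxels(res_mm=res):,.0f} voxels "
          f"({voxels(res_mm=res) / voxels(res_mm=0.2):.1f}x the 0.2 mm cost)")
```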

There are likely more sensible defaults for template construction, as the current defaults mostly reproduce the original ANTs defaults, which are not necessarily needed for this.

> twolevel_ants_dbm makes RABIES inflexible. twolevel_ants_dbm requires processing the dataset as a whole.

Direct registration of subject scans to a common space is a major source of bias in studies. I have a draft paper proving it.

grandjeanlab commented 1 year ago

> This is a fundamental issue with using singularity, where proper use of parallel computing clusters isn't possible.

So that is the reason why I get impossibly long processing times. Regardless, that is an issue for super-large datasets and for people using singularity (which is more common on HPC than docker). I fall under that category, and I suspect many others will. This is something we need to consider when making sensible recommendations in the protocol paper.

Still, it doesn't address the issue of inflexibility. If we accept that, within the same dataset, some scans only work with some parameters while others do not, we create a situation where the researcher needs to pick one workflow or the other, at the expense of discarding a subset of scans.

Gab-D-G commented 1 year ago

Hi,

First, I would expand on two technical points:

  1. I would like to expand on the computational load of commonspace registration. I believe it should scale linearly, not non-linearly (see the sketch after this list). How were the curves you are showing determined? In my experience, the commonspace registration steps never took more than a few hours on Niagara. For instance, in a previous log involving commonspace registration with ~60 scans, at a decent resolution of 0.15mm isotropic, the entire preprocessing took 4h to run on Niagara with a single computing node. In another instance, also processed at 0.15mm isotropic resolution, a template generation step was run across ~200 EPI scans with --bold_robust_inho_cor (that option also uses optimized_antsMultivariateTemplateConstruction), and the complete preprocessing (including the normal commonspace registration with structural scans) took 22h with a single Niagara computing node. This is very far off from your predicted curve.

  2. A key question that needs to be addressed is how well fastcommonspace works compared to the unbiased template method. @gdevenyi developed the template construction to mitigate registration biases that inflate inter-scan variability. We'd need to further investigate the impact of using fastcommonspace versus the robust registration framework before considering making it the default. I would like more details on the tradeoffs of manually tweaking the registration of single scans, and on what it means that fastcommonspace works across all scans, because if a working registration means replacing the non-linear registration with a rigid one, we can expect important misalignments to be introduced across scans compared to a fully functional framework.
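To illustrate the expected scaling (a rough sketch; the iteration count is an assumed illustrative value, not the actual optimized_antsMultivariateTemplateConstruction configuration): at each template-building iteration every scan is registered once to the current average, so the number of registrations grows linearly with the number of scans rather than quadratically.

```python
# Rough scaling sketch: evolving-template construction vs. an all-pairs model.
# The iteration count is an illustrative assumption.

N_ITERATIONS = 6  # assumed number of template-building iterations

for n_scans in (30, 60, 200):
    evolving_template = N_ITERATIONS * n_scans   # one registration per scan per iteration
    all_pairs = n_scans * (n_scans - 1)          # pessimistic every-scan-to-every-scan model
    print(f"{n_scans:>3} scans: {evolving_template:>5} registrations (linear) "
          f"vs {all_pairs:>6} (quadratic)")
```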

With that said, the current framework definitely makes RABIES inflexible, and makes it challenging to iteratively adjust registration parameters to improve performance. In my experience, this is not a major issue when dealing with small to moderate sample sizes (15-50 EPI scans), but it can become more challenging with larger datasets or with ongoing data acquisition, which requires processing new scans at different experimental stages.

Perhaps the ideal tradeoff could be achieved if an unbiased template is generated from a representative subsample of the dataset (which could likely be achieved with 20-30 scans, and would not require 100% registration success), and the entire dataset is then registered onto the unbiased template in a fastcommonspace fashion; a sketch of this two-stage workflow follows after the list below. This may be sufficient, since in my experience the quality of the template changes little after a sufficient number of scans contribute to the average. A software update which could achieve this purpose (already suggested in https://github.com/CoBrALab/RABIES/issues/222) would be to allow the use of an unbiased template generated from a previous RABIES run. This would make it possible to generate a decent template from a set of scans in a first RABIES run, and then preprocess additional scans separately. We could achieve the following:

  1. minimize the computational load of the template generation if it is an issue (although this should only be the case for very large datasets)
  2. process newly acquired scans separately without re-running the entire dataset
  3. separately re-process scans which failed registration
  4. circumvent the parallelization issue from singularity by executing a set of container calls, each handling a subset of the dataset. However, if this partial template construction were used in a study, it would need to be properly documented and reported, as it is not necessarily equivalent to generating a fully unbiased template from the entire dataset.
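A minimal sketch of the two-stage workflow described above, with hypothetical helper functions (these are placeholders, not the RABIES API):

```python
# Sketch of the proposed workflow: build the unbiased template from a
# representative subsample, then register every scan to that frozen template.
import random

def build_unbiased_template(scans):            # hypothetical: iterative template construction
    ...

def register_to_template(scan, template):      # hypothetical: fast, one-shot registration
    ...

all_scans = [f"sub-{i:03d}_T2w.nii.gz" for i in range(1, 151)]

# Stage 1: unbiased template from ~25 representative scans (full dataset not required).
subsample = random.sample(all_scans, 25)
template = build_unbiased_template(subsample)

# Stage 2: register the full dataset (including newly acquired or previously
# failed scans) to the fixed template, in batches if needed.
for scan in all_scans:
    register_to_template(scan, template)
```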

Finally, regarding the best recommendations we should put forward for the community, and what should constitute the default for RABIES: as it stands, I believe the robust registration framework with the unbiased template should be the baseline (at least with a template derived from a subset of the dataset). Unless demonstrated otherwise, fastcommonspace or the tweaking of registration parameters on a per-scan level may introduce biases affecting downstream analysis. That doesn't mean these approaches should not be possible (hence the options are available in RABIES), but the user should be aware of the potential consequences of going down that road.

grandjeanlab commented 1 year ago

The problem is not whether the RABIES default workflow breaks with an increasing number of subjects, but when. I'm going for >100-scan datasets. Others eventually will too. Moreover, not everyone will have access to 80-core nodes, and not every user can be granted access to Niagara (AFAIK).

I would advocate for a commonspace template generation step detached from preprocessing. This would allow preprocessing, confound correction, and analysis to be run with -subject, -session, and -run filter flags, pretty much as fmriprep does now.

As for commonspace vs fastcommonspace: as I said, I consistently had pretty good registrations with fastcommonspace across >800 scans. The commonspace step is a legacy of the structural MRI procedure for volume analysis, but it is not necessary in my opinion.

Gab-D-G commented 1 year ago

With recent updates, there are now options for selecting a specific set of scans for each pipeline stage, thus allowing a subset of a dataset to be processed if needed, and there is now the possibility of inheriting a previously generated unbiased template. It is thus possible to create an unbiased template from a manageable sample size (e.g. ~50 scans) sufficient to obtain a robust dataset-specific template, and then run fast_commonspace-style registration on the rest of the data.