Open tsalo opened 1 week ago
I would say you can expect the robust number of components to stay consistent across an arbitrarily large number of runs. I don't know how long it takes to run tedana with robustica, but 5-10 runs should be enough in my opinion.
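One way to operationalize "the number of components stays consistent across reruns" is to compare the component counts from a handful of reruns with different seeds. This is a hypothetical helper, not part of tedana; the counts below are invented for illustration:

```python
from collections import Counter

def components_consistent(counts, tolerance=0):
    """Return True if component counts from repeated robustICA runs
    all fall within `tolerance` of the most common count."""
    most_common, _ = Counter(counts).most_common(1)[0]
    return all(abs(c - most_common) <= tolerance for c in counts)

# Invented counts from, e.g., 6 reruns with different seeds.
print(components_consistent([42, 42, 41, 42, 42, 43], tolerance=1))  # True
```

If this returns False for a dataset, that would be a hint that `--n-robust-runs` (or the data quality) deserves a closer look.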
@tsalo and I just discussed this at the end of our developers' call. It would be useful to run a bunch of options on a few datasets to test and show that our defaults are reasonable. This would be better to do after we add the quality index for each component and the t-SNE quality results to our outputs, so that we have data to show using tools that others can also run themselves.
As for `--n-robust-runs`, my understanding is that @BahmanTahayori tried various options and found that 30 gave stable results: the final components were very similar when robustICA was rerun with a different seed, and more iterations didn't noticeably improve the quality of the results.
I ran robustICA on 20 subjects from our dataset with `--n_robust_runs` set to [10, 20, 30, ..., 100]. For most subjects, the detected active volume plateaued at 20 runs, though a few required more (30). Thus, I set the default number of runs to 30. However, depending on the quality of the dataset, additional runs may be required. I will work on adding t-SNE plots to the code, which should help users make more informed decisions about the number of runs.
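The "plateaued at 20 runs" observation can be made explicit with a small helper that finds the first setting after which the chosen stability metric stops changing appreciably. This is a sketch, not tedana code, and the metric values below are invented:

```python
def plateau_point(ns, metrics, rel_tol=0.02):
    """Return the smallest n after which the metric changes by less
    than rel_tol (relative) between consecutive settings; if it never
    plateaus, return the largest n tried."""
    for i in range(len(ns) - 1):
        if abs(metrics[i + 1] - metrics[i]) <= rel_tol * abs(metrics[i]):
            return ns[i]
    return ns[-1]

ns = [10, 20, 30, 40]           # --n-robust-runs values tried
metric = [120.0, 150.0, 151.0, 151.5]  # invented stability metric per setting
print(plateau_point(ns, metric))  # 20
```

With per-subject metrics, running this across subjects would show whether the plateau is at 20 for most and higher for a few, as described above.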
Summary
We have some info on robustica in the FAQ, but it's not clear what folks should use for `--n-robust-runs` (and `--maxit`, I suppose). My big question is whether selecting the number of robust runs is just a matter of resources and time (i.e., should you use the highest number you're willing to wait for?) or whether there's a sweet spot (i.e., increasing the number of runs results in fewer consistent components, meaning that more is not necessarily better). If we don't actually know the answer to this, I could try running tedana on a single run with different `--n-robust-runs` values to see how the parameter impacts the number of detected components.
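Such a sweep could be scripted by building one tedana invocation per `--n-robust-runs` value. This is only a sketch: the input path, echo times, and `--maxit` value are placeholders, and only flags mentioned in this thread plus tedana's standard `-d`/`-e`/`--out-dir` options are assumed:

```python
def build_sweep_commands(n_values, data="sub-01_echo-*.nii.gz",
                         echo_times=(14.5, 38.5, 62.5)):
    """Build one tedana command line per --n-robust-runs setting.
    Data path and echo times are placeholders, not real values."""
    commands = []
    for n in n_values:
        cmd = ["tedana", "-d", data,
               "-e", *map(str, echo_times),
               "--n-robust-runs", str(n),
               "--maxit", "500",  # placeholder; tune alongside n
               "--out-dir", f"sweep/n-{n}"]
        commands.append(cmd)
    return commands

# Print the commands rather than running them, so the sweep can be
# inspected or submitted to a job scheduler.
for cmd in build_sweep_commands([10, 20, 30]):
    print(" ".join(cmd))
```

Comparing the number of accepted components across the resulting output directories would answer whether the parameter has a sweet spot or just a cost/benefit trade-off.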