ME-ICA / tedana

TE-dependent analysis of multi-echo fMRI
https://tedana.readthedocs.io
GNU Lesser General Public License v2.1

Slow down development and revisit package structure #131

Closed rmarkello closed 5 years ago

rmarkello commented 6 years ago

Hi all! Sorry I've been a bit radio silent, but I have been watching the development of tedana over the past few months and it's been great! There's a whole bunch of new features and stuff that look really cool and I'm excited to try them out. I'm wondering, though, if we might slow down new development a bit and focus some on refactoring the package?

That is, coming back to tedana after some time away, it took me a not-insignificant amount of time to remember what functions were where and how they all interconnected. It's still sprawling, and I think that, given the trajectory of the package (i.e., it's not going to be anywhere near the size of, e.g., nilearn or nistats), it might be good to consolidate things a bit.

To that end, would something like the following be more parsimonious / understandable to new contributors, without sacrificing modularity?

tedana/
├── __init__.py
├── combine.py                  (formerly model/combine.py)
├── decay.py                    (formerly model/monoexponential.py)
├── decompositions/
│   ├── __init__.py
│   ├── utils.py
│   └── eigendecomp.py
├── fit.py                      (formerly model/fit.py)
├── io.py                       (formerly utils/io.py)
├── selcomps.py                 (formerly selection/*)
├── utils.py                    (formerly utils/utils.py)
├── cli/                        (formerly workflows/)
│   ├── __init__.py
│   ├── t2smap.py
│   └── tedana.py
└── tests/

In my mind, these modules would contain the following:

combine.py

Functions for creating the optimal combination time series.

Ideally the functions here would also accept file inputs, in addition to concatenated data inputs. Pre-generation of a T2* map (via decay.py functionality) would be optional, so that users interested in the optimal combination could call this function providing only their echo files and echo times and get back an in-memory niimg-like object of the optimal combination time series.
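For concreteness, here's a minimal sketch of what that API could look like, assuming the standard T2*-weighted combination; `make_optcom`, its signature, and the fallback T2* handling are all illustrative rather than settled:

```python
import nibabel as nib
import numpy as np

def make_optcom(echo_files, echo_times, t2s_img=None):
    """Optimally combine multi-echo data, returning an in-memory niimg."""
    imgs = [nib.load(f) for f in echo_files]
    data = np.stack([img.get_fdata() for img in imgs], axis=-1)  # (x, y, z, t, e)
    tes = np.asarray(echo_times, dtype=float)  # same units as T2* (e.g., ms)

    if t2s_img is None:
        # Stand-in only: a real implementation would call into decay.py
        # here to fit a voxel-wise monoexponential T2* map.
        t2s = np.full(data.shape[:3], tes.mean())
    else:
        t2s = t2s_img.get_fdata()

    # T2*-weighted combination: w_e = TE_e * exp(-TE_e / T2*), normalized
    # across echoes so the weights sum to 1 in each voxel.
    weights = tes * np.exp(-tes / t2s[..., None])          # (x, y, z, e)
    weights /= weights.sum(axis=-1, keepdims=True)
    optcom = (data * weights[..., None, :]).sum(axis=-1)   # (x, y, z, t)
    return nib.Nifti1Image(optcom, imgs[0].affine, imgs[0].header)
```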

decay.py

Functions for modeling signal decay and generating T2* maps.

This is more or less a carbon copy of the existing model/monoexponential.py, just renamed. New functions for modeling signal decay (i.e., voxel-wise fits) could be added here.
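As a toy example of the kind of voxel-wise fit that could live here, the usual log-linear approach to S(TE) = S0 * exp(-TE / T2*) fits in a few lines; the function name and array shapes below are assumptions, not the existing code:

```python
import numpy as np

def fit_monoexponential(data, tes):
    """Fit S0 and T2* per voxel by log-linearizing S(TE) = S0 * exp(-TE / T2*).

    data : (n_voxels, n_echoes) array of echo-wise signal intensities
    tes  : (n_echoes,) echo times
    """
    tes = np.asarray(tes, dtype=float)
    log_s = np.log(np.maximum(data, np.finfo(float).tiny))
    # log(S) = log(S0) - TE * R2*, an ordinary least-squares problem.
    design = np.column_stack([np.ones_like(tes), -tes])
    betas, *_ = np.linalg.lstsq(design, log_s.T, rcond=None)
    s0 = np.exp(betas[0])
    t2s = 1.0 / np.maximum(betas[1], np.finfo(float).tiny)  # R2* -> T2*
    return s0, t2s
```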

decompositions/

Functions for decomposing time series.

Given that this is the primary module that might be expanded to accommodate other decomposition techniques, I would be comfortable keeping this as-is.

fit.py

Functions for fitting R2* and S0 models.

Copy of model/fit.py, but in the general package namespace.

io.py

Functions for handling reading and writing data, including consolidating components into relevant time series.

This is currently a bit of a mess. The I/O code for tedana is scattered throughout the package, so it would be good to consolidate and clean it up.

selcomps.py

The almost 900-line function we all know and love (🙄), plus corresponding utilities.

I know there's a significant amount of work going into making this more modular and broken up, so I think that will help a lot with its interpretability. I'm not sure I see the utility, though, in keeping it within its own module (i.e., tedana.selection, as it is now).

utils.py

Miscellaneous utility functions supporting the general package (e.g., new_nii_like(), filewrite(), load_data()).

cli/

Command line interfaces for tedana.

It might be good to pare these down a bit so that they're really only handling the CLI logic (e.g., argument parsing) rather than the entire workflow, as they do currently. The workflows could potentially go in a separate file (workflows.py), but I think that's a point for further discussion.
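As a rough illustration of that split, a CLI module could reduce to parsing plus a hand-off; the import path, argument names, and `main` signature below are assumptions, not settled API:

```python
import argparse

from tedana.workflows import tedana_workflow  # hypothetical import path

def _get_parser():
    parser = argparse.ArgumentParser(description='Run the tedana workflow.')
    parser.add_argument('-d', '--data', nargs='+', required=True,
                        help='One 4D NIfTI file per echo.')
    parser.add_argument('-e', '--echo-times', dest='tes', nargs='+',
                        type=float, required=True, help='Echo times in ms.')
    return parser

def main(argv=None):
    # CLI logic only: parse arguments, then hand off to the workflow,
    # which lives in its own module.
    opts = _get_parser().parse_args(argv)
    tedana_workflow(data=opts.data, tes=opts.tes)
```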


Sorry for the incredibly long issue, and for just jumping in with pretty major suggestions. Let me know your thoughts.

tsalo commented 6 years ago

I think that a lot of the proposed changes move tedana away from being able to incorporate new methods. For example, having selection as a module makes it easier to incorporate different approaches. At minimum, we want to support both v2.5 and v3.2 of the Kundu approach, but I think that, if we want to support other kinds of decompositions, it only makes sense that there would also be other ways of calculating component metrics and selecting components.

Some other random thoughts:

tedana/
├── decay.py
├── combine.py
├── decompose/
│   ├── pca.py
│   │   └── run_pca()
│   └── ica.py
│       └── run_ica()
├── fit.py      # calculation of betas from fitmodels_direct
├── metrics.py  # the metric computation from fitmodels_direct
├── select/
│   ├── tedpca.py
│   │   ├── decision_tree()  # but with a better name
│   │   └── varthresh()      # but with a better name
│   └── tedica.py
│       ├── kundu_v2()
│       └── kundu_v3()
├── postproc/  # but with a better name
│   ├── godec.py
│   └── t1gsr.py
├── workflows/
│   ├── tedana.py
│   └── t2smap.py
├── io.py
└── utils.py

rmarkello commented 6 years ago

I'm not sure how many selection methods we should incorporate at this point, given that almost all of them except for kundu_v2() are basically untested...

Ignoring that for the moment, I'm not sure I agree with separating e.g., decompose/pca from select/tedpca. The former is essentially 5-10 lines of code, and the latter is wholly dependent on it. Moreover, I can't think of an instance where I'd run, say, decompose/pca and then feed it into anything but select/tedpca, so I'm not sure the modularity there is necessary. Could the select module be subsumed by decompose, so that there's less distance between functions that are dependent on one another? decompose would then be the one module that's designed to "grow" as new techniques are added.

I agree that editing workflows and parsers in the same file is convenient, so I'm alright with that. One thing I do think it might be nice to work out is how to make the workflows a bit more modular—as it stands, if I do want to substitute out a new decomposition technique there's no easy way to do that without copying the workflow code and changing that single line. It's obviously not nearly as monolithic as it used to be, but it's still quite a bit. I don't have an easy solution for this, but I think it's something to think about as we consider these changes.
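One possible shape for that modularity, sketched as a self-contained toy (every name here is hypothetical, not tedana's API), is to make the decomposition step an injectable callable:

```python
import numpy as np

def run_pca(data, n_components=5):
    # Toy decomposition: top principal components via SVD.
    u, s, _ = np.linalg.svd(data - data.mean(axis=0), full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def mini_workflow(data, decompose=run_pca):
    """Workflow skeleton where the decomposition step is swappable."""
    components = decompose(data)
    # ...component metrics and selection would follow here...
    return components

# Swapping techniques is then a one-argument change, with no need to
# copy and edit the workflow file:
#   mini_workflow(data, decompose=my_dict_learning)
```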

tedana/
├── combine.py
├── decay.py
├── decompose/
│   ├── pca.py
│   │   ├── run_pca()
│   │   ├── var_thresh()
│   │   └── decision_tree()
│   └── ica.py
│       ├── run_ica()
│       ├── kundu_v2()
│       └── kundu_v3()
├── fit.py
├── metrics.py
├── postproc/
│   ├── godec.py
│   └── t1gsr.py
├── workflows/
│   ├── tedana.py
│   └── t2smap.py
├── io.py
└── utils.py

KirstieJane commented 6 years ago

Heya - just a little note from me to say that I think the modularity is a really key feature for tedana :tada:.

If I can help by drawing out some additional documentation -- for example, diagrams showing how someone could slot various parts of tedana into their analysis pipelines -- I'm very happy to do what I can (I'd need a buddy, but I can at least provide an outsider's view on the descriptions).

I also think those diagrams could show people how they can insert additional selection methods. The core developers don't have to write all the modules - you can welcome external collaborators to extend the package once the documentation and layout are set up to make it really easy to extend.

In general I really like the idea of consolidating the updates to the package and getting them super solid. I think this is one of those "feels like slowing down but is actually faster in the long run" development moments 😸

rmarkello commented 6 years ago

I definitely agree that the modularity of tedana is key! I think it's important to strike a balance between modularity and interpretability, though, and in that respect tedana is a tough nut. Many of the functions in the package depend heavily on one another, so splitting them up too much means having to jump from file to file when looking for the next step in the workflow. On the other hand, if they're all put together, they become uninterpretable.

I absolutely think that the end goal should be to allow extension and growth, so it's important to find the right balance between building a solid, understandable core and ensuring extensibility. @KirstieJane, I think having some sort of (pictorial) documentation about how the project is laid out, rather than hoping that it speaks for itself, would be really beneficial! I'm not totally sure what it would look like but would be happy to discuss it further.

@tsalo, do you have thoughts on the most recent proposal? Are you pretty set on retaining separate modules for decompose and select? Let us know!

I'm also going to explicitly ping @emdupre and @handwerkerd because I think it'd be good to have as many people chime in on this as possible!

tsalo commented 6 years ago

I think merging select into decompose makes a fair amount of sense given how small the PCA/ICA functions are, but it means having the workflow go decay --> combine --> decompose.pca.run_pca --> fit --> metrics --> decompose.pca.var_thresh --> decompose.ica.run_ica --> fit --> metrics --> decompose.ica.kundu_v2 --> postproc.t1gsr. The thing that's a little weird in that workflow is going into decompose, then jumping out to fit and metrics, and then going back into decompose, for both PCA and ICA.

Maybe it would make more sense to move fitmodels_direct into decompose and to keep select separate? Then, fitmodels_direct could be a private function that tedpca and tedica call, instead of calling it in both the workflow (for ICA) and tedpca (for PCA). The difficulty is in properly separating the general model from what could be method-specific metrics. It's hard to know what other metrics could be derived from the data, since that's not something I know much about.
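Schematically, that refactor might look like the following, with hypothetical names and the actual metric computations elided:

```python
import numpy as np

def _fitmodels_direct(data, mixing):
    """Shared, private fit of component betas (metric details elided).

    data : (n_timepoints, n_voxels); mixing : (n_timepoints, n_components)
    """
    betas, *_ = np.linalg.lstsq(mixing, data, rcond=None)
    return betas  # (n_components, n_voxels)

def tedpca(data, mixing):
    betas = _fitmodels_direct(data, mixing)
    # ...PCA-specific metrics and variance-threshold selection...
    return betas

def tedica(data, mixing):
    betas = _fitmodels_direct(data, mixing)
    # ...ICA-specific metrics and the Kundu decision tree...
    return betas
```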

emdupre commented 6 years ago

Hi everyone,

Sorry for my own radio silence here ! Thanks so much for bringing up this point @rmarkello and for kickstarting what is an admittedly overdue discussion. I have a few small points I'd like to quickly make, but I think the much bigger point here is that we need to re-focus our roadmap.

Re @KirstieJane's suggestion of a diagram: I think this would be phenomenally useful, and something @tsalo actually started on -- you can see his draft here. I know that he was looking to add this into our RTD site in #133, but it'd be great to think about how we can distribute this in a way where it's as easy as possible to edit and extend !

Re organizing of modules: it sounds like there's at least consensus on utils.py, io.py and decay.py. Those should be easy enough to implement, and I'll open an issue specifically for re-organizing those modules so that we can continue the broader discussion without forgetting to make those changes ! If that's not your understanding @everyone, please let me know :smile:

Ok, on to the broader point.

Re-focusing our roadmap

In laying out a roadmap for tedana, I'd like to structure the discussion around two key areas where I think we need to reach some consensus; namely, our project vision and our metrics of success. That is, "what is the problem that we believe tedana is positioned to uniquely solve?" and "how will we know that we have succeeded in solving that problem?". I'm including my own understanding of these here, and I'd love to hear your feedback.

Project Vision

ME-EPI is not well integrated into major preprocessing packages, yielding duplicated and unmaintained code. tedana will serve as a central repository for standard ME-EPI denoising as well as a testing ground for novel ME-EPI denoising methods. This will jointly reduce the external burden on pipeline maintainers, facilitate increased ME-EPI adoption, and enable future development in ME-EPI denoising.

Metric of Success

We will know that we have been successful in creating tedana when we have succeeded in providing the following concrete deliverables:

  1. Currently, afni_proc.py distributes an older version of tedana, around which they have built a helper script, tedana_wrapper.py, to ensure compatibility. Users of afni_proc.py at this point are therefore (by default) not accessing the latest version of tedana. One metric of success, therefore, will be if we can demonstrate sufficient stability and support for tedana such that the afni_proc.py maintainers are willing to switch to pip install tedana as the recommended method of accessing ME-EPI denoising in AFNI.

  2. Currently, users of fMRIPrep for ME-EPI pre-processing are encouraged to first run fMRIPrep and then tedana. This has proven frustrating for users interested in only obtaining the optimal combination -- a relatively common use case. Perhaps even more problematically, ME-EPI denoising has proven to fail for a subset of subjects, with only one or two components returned as accepted in some cases. Another metric of success, then, will be fully integrating tedana into fMRIPrep via the merge of this pull request.

  3. One long-standing concern with ME-EPI denoising has been the availability of documentation for the method outside of published scientific papers. To address this, we have created a ReadTheDocs site; however, there are still several sections either explicitly marked as "# TODO" or otherwise missing crucial information. A metric of success for tedana, then, should be a completed RTD site that provides the minimum information necessary to orient a complete newcomer to ME-EPI on both the theoretical basis of the method and the practical steps used in denoising.

  4. There is also, of course, the need to improve on the denoising method itself. We have discussed this in issues including #97 and #101, and I think this is where most of the concern around the module organization comes from. In my mind, a metric of success here would be EITHER integrating a new decomposition method beyond ICA, OR validating the new selection criteria.

Overall, then, I see that the majority of work on the project should go towards making it stable and well-documented, with enough modularity to integrate improvements (plus one improvement, as a proof of concept). My dream would be that we would have a fifth metric of success:

  5. An outside contributor integrates an improvement to multi-echo denoising.

In thinking through these ideas, I've gone ahead in our 0.1 release project and re-organized the issues to roughly match the suggestions listed above, but I'm very anxious to hear what you all think. I'm also happy to schedule a call where we can discuss this, but I thought it'd be nice to start with my own thoughts, here.

Thanks so much to everyone for all of your time and energy in thinking through this -- I'm really excited about our next steps.

tsalo commented 6 years ago

I love this road map. It makes total sense to me. Are you planning to raise an issue specifically for it or add it to the RTD site? I think it could be helpful to have in the contributing guidelines.

Regarding the Project Vision, it seems like we need to better balance the need for a validated workflow to be called within other packages and the desire for tedana to function as a testing ground for additional/new methods. In order to make that happen, I think we need to:

  1. limit the options available in tedana_workflow
    • This is sort of what @emdupre, @dowdlelt, and @handwerkerd have been discussing in #101, although it can take the form of defaults (which means keeping a large number of arguments in the CLI and workflow) or hardcoding our recommendations and removing the arguments (which reduces the flexibility of the pipeline, but also "canonizes" the standard pipeline for the majority of users).
    • I know the latter option might sound like a bad idea (and maybe it is), but in terms of fMRIPrep and afni_proc integration, I don't think it makes sense for these other pipelines to include a large number of new arguments just for a relatively small subset of use-cases. Alternatively, we could probably treat validated pipelines (i.e., collections of argument values) as the units with which to call tedana from these other packages, rather than trying to set individual arguments to tedana_workflow (there's a sketch of this idea after the list).
  2. move most of the content of tedana_workflow into functions in other modules
    • This way, devs can construct and run their own workflows without having to copy and edit the full tedana_workflow file.
    • I think this is something we've brought up, but I don't know if any recent work has been moving toward it.
  3. make sure that tedana's structure can handle expansion
    • We've been discussing the structure here, but having a better idea of what other kinds of decomposition and/or selection are possible would be really helpful for figuring out how to make tedana both flexible and interpretable.
    • I think that finding that balance is one reason to (at least for now) include what could be suboptimal methods (e.g., including both ICA component selection methods kundu_v2_5 and kundu_v3_2 or PCA component selection methods mlepca/varex and decision_tree).
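
To make the "validated pipelines as units" idea from point 1 concrete, here's one possible sketch; the preset contents, import path, and function names are all assumptions:

```python
from tedana.workflows import tedana_workflow  # hypothetical import path

# Hypothetical presets bundling recommended argument values.
VALIDATED_PIPELINES = {
    'kundu_v2_5': {'tedpca': 'mlepca', 'tedica': 'kundu_v2_5'},
    'kundu_v3_2': {'tedpca': 'decision_tree', 'tedica': 'kundu_v3_2'},
}

def run_pipeline(name, data, tes, **overrides):
    """Call tedana_workflow through a named, validated preset."""
    options = {**VALIDATED_PIPELINES[name], **overrides}
    return tedana_workflow(data=data, tes=tes, **options)
```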

Regarding validation of the methods in the testing ground, that is something we are/will be working on in tedana-comparison. I don't really know if it's feasible to include validation of the methods in tedana proper, given how much work validation will entail (e.g., running the new method on several public datasets, generating many detailed reports, and critically evaluating all of it against existing methods). Not to mention having to look into how a new method will interact with other components of the overall pipeline.

emdupre commented 6 years ago

Thanks so much for your feedback, @tsalo ! I think once we're ok with the Project vision and Metrics of success, I'll add them to our 0.1 release project. We'll need to think about a timeline, next ! To address each of your points in turn:

  1. Your question about whether or not we should be removing arguments for tedana_workflow is a good one. It's also a point where I'd love to hear @handwerkerd's thoughts, since I think I'm a little biased these days towards easy integration with the larger pipelines, but Dan would be able to provide a really good perspective on exactly how much plug-and-play flexibility tedana methods developers need (as a core developer !).

  2. I also remember that moving most of the content of tedana_workflow into functions in other modules was a relatively popular idea -- does anyone have recollections otherwise ? We could open another issue for the how part of that discussion. Edit: @rmarkello started this discussion in #50, though we were focused on slightly different issues at the time and didn't address this larger context. I'm not sure if we want to revive that discussion or start a new one, but at least wanted to make the link explicit !

  3. Regarding the possibility of expansion, I think it's hard to future-proof, here. Since we know that decomposition and selection are likely to change in the near future, I'm happy to focus the discussion on those. Hopefully the lessons we learn will be applicable in the future as other needs arise !

  4. I do agree that it's probably not feasible to do extensive testing in tedana itself. I think one question I've been wrestling with is "what is the minimum viable amount of testing we need?" I think we agreed to both a three-echo and a five-echo dataset, but if we're retaining several "sub-optimal" options, should we still be testing those as well ? If we're not testing them, how can a future contributor come along and make improvements to them, without having a baseline ? I really don't know the right answer here, but I'd love feedback.

Overall, then, it seems like you're ok with the suggested Metrics of Success, but we should continue to discuss point 1 (hopefully hearing a few other perspectives !) before moving forward with the Project Vision. Does that match your understanding ? Thanks again for sharing your thoughts :sparkles:

KirstieJane commented 6 years ago

Thank you everyone for a really great call on Monday 29 Oct! @emdupre is going to summarise some of our points and add a roadmap file to the Read the Docs site as a release candidate for community discussion. Stay tuned, but we're very close to closing this issue and moving forwards ✨ 🙀 🙌 👾