I think that a lot of the proposed changes would move `tedana` away from being able to incorporate new methods. For example, keeping selection as its own module makes it easier to incorporate different approaches. At minimum, we want to support both v2.5 and v3.2 of the Kundu approach, but if we want to support other kinds of decompositions, it only makes sense that there would also be other ways of calculating component metrics and selecting components.
Some other random thoughts:
- `decay` seems like a much better name than `monoexponential`. Very much for this.
- `io` and `utils` really do need to be cleaned up, and I guess there's no real reason to keep them both in `utils/`.
- `decomposition`: I've actually been thinking that we should reorganize this module. `tedpca` includes the PCA decomposition, metric calculation (`fitmodels_direct`), and the component selection, while `tedica` just includes the ICA decomposition. The other steps for MEICA (metric calculation and component selection) are called in the workflow and in `selcomps`. It might be good to make `decomposition` just the PCA and ICA decompositions, while the metric calculation and component selection portions for PCA could be treated the same way they are for ICA. Something like this, maybe:

```
tedana/
├── decay.py
├── combine.py
├── decompose/
│   ├── pca.py
│   │   └── run_pca()
│   └── ica.py
│       └── run_ica()
├── fit.py      # calculation of betas from fitmodels_direct
├── metrics.py  # the metric computation from fitmodels_direct
├── select/
│   ├── tedpca.py
│   │   ├── decision_tree()  # but with a better name
│   │   └── varthresh()      # but with a better name
│   └── tedica.py
│       ├── kundu_v2()
│       └── kundu_v3()
├── postproc/   # but with a better name
│   ├── godec.py
│   └── t1gsr.py
├── workflows/
│   ├── tedana.py
│   └── t2smap.py
├── io.py
└── utils.py
```
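To make the extensibility point concrete, here is a rough sketch of how the selection functions in `select/` could share one signature, so that new methods slot in next to the Kundu trees. The function names, arguments, and decision rules here are hypothetical stand-ins, not actual tedana code:

```python
import numpy as np


def kundu_v2(metrics, mixing):
    """Toy stand-in for the Kundu v2.5 tree: accept components with kappa > rho."""
    return np.where(metrics["kappa"] > metrics["rho"])[0]


def varthresh(metrics, mixing, thresh=0.95):
    """Toy stand-in for variance-threshold PCA selection."""
    varex = np.sort(metrics["varex"])[::-1]
    cum_var = np.cumsum(varex)
    n_keep = np.searchsorted(cum_var, thresh * cum_var[-1]) + 1
    return np.arange(n_keep)


# Because every selection function takes (metrics, mixing) and returns the
# indices of accepted components, a workflow can swap one for another, and a
# contributor can add kundu_v3() or something new without touching anything else.
metrics = {"kappa": np.random.rand(10), "rho": np.random.rand(10),
           "varex": np.random.rand(10)}
mixing = np.random.rand(200, 10)
accepted = kundu_v2(metrics, mixing)
```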
I'm not sure how many selection methods we should incorporate at this point, given that almost all of them except for `kundu_v2()` are basically untested...
Ignoring that for the moment, I'm not sure I agree with separating, e.g., `decompose/pca` from `select/tedpca`. The former is essentially 5-10 lines of code, and the latter is wholly dependent on it. Moreover, I can't think of an instance where I'd run, say, `decompose/pca` and then feed it into anything but `select/tedpca`, so I'm not sure the modularity there is necessary. Could the `select` module be subsumed by `decompose`, so that there's less distance between functions that depend on one another? `decompose` would then be the one module that's designed to "grow" as new techniques are added.
I agree that editing workflows and parsers in the same file is convenient, so I'm alright with that. One thing I do think it would be nice to work out is how to make the workflows a bit more modular: as it stands, if I want to substitute in a new decomposition technique, there's no easy way to do that without copying the workflow code and changing that single line. It's obviously not nearly as monolithic as it used to be, but it's still quite a lot of code. I don't have an easy solution for this, but I think it's something to think about as we consider these changes.
```
tedana/
├── combine.py
├── decay.py
├── decompose/
│   ├── pca.py
│   │   ├── run_pca()
│   │   ├── var_thresh()
│   │   └── decision_tree()
│   └── ica.py
│       ├── run_ica()
│       ├── kundu_v2()
│       └── kundu_v3()
├── fit.py
├── metrics.py
├── postproc/
│   ├── godec.py
│   └── t1gsr.py
├── workflows/
│   ├── tedana.py
│   └── t2smap.py
├── io.py
└── utils.py
```
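One way to address the workflow-modularity concern above would be to let the decomposition step be passed into the workflow as a callable, so that swapping techniques does not require copying the workflow. This is only a minimal sketch under that assumption; the function names and signatures are hypothetical, not current tedana API:

```python
import numpy as np


def run_pca(data, n_components=None):
    """Toy PCA stand-in returning (mixing matrix, component maps)."""
    data = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    n = n_components or data.shape[1]
    return u[:, :n] * s[:n], vt[:n]


def tedana_workflow(data, decompose=run_pca, **decompose_kwargs):
    """Heavily abridged workflow with an injectable decomposition step."""
    mixing, maps = decompose(data, **decompose_kwargs)
    # ... metric calculation, component selection, and denoising would follow ...
    return mixing, maps


# Swapping in a different decomposition is then a one-argument change:
data = np.random.rand(200, 50)
mixing, maps = tedana_workflow(data, decompose=run_pca, n_components=10)
```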
Heya - just a little note from me to say that I think the modularity is a really key feature for `tedana` :tada:.
If I can help by drawing out some additional documentation, such as diagrams showing how someone could slot various parts of tedana into their analysis pipelines, I'm very happy to do what I can (I'd need a buddy, but I can at least provide an outsider's view on the descriptions).
I also think those diagrams could show people how they can insert additional selection methods. The core developers don't have to write all the modules - you can welcome external collaborators to extend the package once the documentation and layout are set up to make extension really easy.
In general I really like the idea of consolidating the updates to the packages and getting them super solid. I think this is one of those "feels like slowing down but is actually faster in the long run" development moments 😸
I definitely agree that the modularity of `tedana` is key! I think it's important to strike a balance between modularity and interpretability, though, and in that regard `tedana` is a tough nut. Many of the functions in the package are highly dependent on one another, so splitting them up too much means having to jump from file to file when looking for the next step in the workflow. On the other hand, if they're all put together they become uninterpretable.
I absolutely think that the end goal should be to allow extension and growth, so it's important to find the right balance between building a solid, understandable core and ensuring extensibility. @KirstieJane, I think having some sort of (pictorial) documentation about how the project is laid out, rather than hoping that it speaks for itself, would be really beneficial! I'm not totally sure what it would look like but would be happy to discuss it further.
@tsalo, do you have thoughts on the most recent proposal? Are you pretty set on retaining separate modules for `decompose` and `select`? Let us know!
I'm also going to explicitly ping @emdupre and @handwerkerd because I think it'd be good to have as many people chime in on this as possible!
I think merging `select` into `decompose` makes a fair amount of sense given how small the PCA/ICA functions are, but it means having the workflow go `decay` --> `combine` --> `decompose.pca.run_pca` --> `fit` --> `metrics` --> `decompose.pca.var_thresh` --> `decompose.ica.run_ica` --> `fit` --> `metrics` --> `decompose.ica.kundu_v2` --> `postproc.t1gsr`. The thing that's a little weird in that workflow is going into `decompose`, then jumping out to `fit` and `metrics`, and then going back into `decompose`, for both PCA and ICA.
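For reference, that call order might look roughly like the sketch below. The module paths and function names are hypothetical placeholders meant only to visualize the flow (and the back-and-forth between `decompose`, `fit`, and `metrics`), so everything is left as comments rather than working code:

```python
# from tedana import decay, combine, fit, metrics, postproc
# from tedana.decompose import pca, ica
#
# t2s, s0 = decay.fit_decay(catd, tes, mask)             # decay
# optcom = combine.make_optcom(catd, tes, t2s, mask)     # combine
# pca_mix = pca.run_pca(optcom)                          # decompose.pca.run_pca
# pca_betas = fit.get_coeffs(optcom, pca_mix)            # fit
# pca_table = metrics.compute_metrics(pca_betas)         # metrics
# kept = pca.var_thresh(pca_table)                       # decompose.pca.var_thresh
# ica_mix = ica.run_ica(optcom, n_components=len(kept))  # decompose.ica.run_ica
# ica_betas = fit.get_coeffs(optcom, ica_mix)            # fit
# ica_table = metrics.compute_metrics(ica_betas)         # metrics
# accepted = ica.kundu_v2(ica_table, ica_mix)            # decompose.ica.kundu_v2
# denoised = postproc.t1gsr(optcom, ica_mix, accepted)   # postproc.t1gsr
```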
Maybe it would make more sense to move `fitmodels_direct` into `decompose` and to keep `select` separate? Then, `fitmodels_direct` could be a private function that `tedpca` and `tedica` call, instead of calling it in both the workflow (for ICA) and `tedpca` (for PCA). The difficulty is in properly separating the general model from what could be method-specific metrics. It's hard to know what other metrics could be derived from the data, since that's not something I know much about.
Hi everyone,
Sorry for my own radio silence here ! Thanks so much for bringing up this point, @rmarkello, and for kickstarting what is an admittedly overdue discussion. I have a few small points I'd like to quickly make, but I think the much bigger point here is that we need to re-focus our roadmap.
Re @KirstieJane's suggestion of a diagram: I think this would be phenomenally useful, and something @tsalo actually started on -- you can see his draft here. I know that he was looking to add this into our RTD site in #133, but it'd be great to think about how we can distribute this in a way where it's as easy as possible to edit and extend !
Re organizing the modules: it sounds like there's at least consensus on `utils.py`, `io.py`, and `decay.py`. Those should be easy enough to implement, and I'll open an issue specifically for re-organizing those modules so that we can continue the broader discussion without forgetting to make those changes ! If that's not your understanding @everyone, please let me know :smile:
Ok, on to the broader point.
In laying out a roadmap for `tedana`, I'd like to structure the discussion around two key areas where I think we need to reach some consensus: namely, our project vision and our metrics of success. That is, "what is the problem that we believe `tedana` is uniquely positioned to solve?" and "how will we know that we have succeeded in solving that problem?" I'm including my own understanding of these here, and I'd love to hear your feedback.
ME-EPI is not well integrated into major preprocessing packages, yielding duplicated and unmaintained code. `tedana` will serve as a central repository for standard ME-EPI denoising as well as a testing ground for novel ME-EPI denoising methods. This will jointly reduce the external burden on pipeline maintainers, facilitate increased ME-EPI adoption, and enable future development in ME-EPI denoising.
We will know that we have been successful in creating `tedana` when we have provided the following concrete deliverables:
Currently, `afni_proc.py` distributes an older version of `tedana`, around which they have built a helper script, `tedana_wrapper.py`, to ensure compatibility. Users of `afni_proc.py` are therefore (by default) not accessing the latest version of `tedana`. One metric of success, then, will be demonstrating sufficient stability and support for `tedana` that the `afni_proc.py` maintainers are willing to switch to `pip install tedana` as the recommended method of accessing ME-EPI denoising in AFNI.
Currently, users of `fMRIPrep` for ME-EPI pre-processing are encouraged to first run `fMRIPrep` and then `tedana`. This has proven frustrating for users interested only in obtaining the optimal combination, a relatively common use case. Perhaps more problematically, ME-EPI denoising has proven to fail for a subset of subjects, with only one or two components returned as accepted in some cases. Another metric of success, then, will be fully integrating `tedana` into `fMRIPrep` via the merge of this pull request.
One long-standing concern with ME-EPI denoising has been the availability of documentation for the method outside of published scientific papers. To address this, we have created a ReadTheDocs site; however, there are still several sections either explicitly marked as "# TODO" or otherwise missing crucial information. A metric of success for `tedana`, then, should be a completed RTD site that provides the minimum necessary information to orient a complete newcomer to ME-EPI on both the theoretical basis of the method and the practical steps used in denoising.
There is also, of course, the need to improve on the denoising method itself. We have discussed this in issues including #97 and #101, and I think this is where most of the concern around the module organization comes from. In my mind, a metric of success here would be EITHER integrating a new decomposition method beyond ICA, OR validating the new selection criteria.
Overall, then, I see that the majority of work on the project should go towards making it stable and well-documented, with enough modularity to integrate improvements (plus one improvement, as a proof of concept). My dream would be that we would have a fifth metric of success:
In thinking through these ideas, I've gone ahead in our 0.1 release project and re-organized the issues to roughly match the suggestions listed above, but I'm very anxious to hear what you all think. I'm also happy to schedule a call where we can discuss this, but I thought it'd be nice to start with my own thoughts, here.
Thanks so much to everyone for all of your time and energy in thinking through this -- I'm really excited about our next steps.
I love this roadmap. It makes total sense to me. Are you planning to raise an issue specifically for it or add it to the RTD site? I think it could be helpful to have in the contributing guidelines.
Regarding the Project Vision, it seems like we need to better balance the need for a validated workflow that can be called within other packages against the desire for `tedana` to function as a testing ground for additional/new methods. In order to make that happen, I think we need to:

1. Decide whether to remove some arguments from `tedana_workflow`, since I believe the plan is to call `tedana` from these other packages, rather than trying to set individual arguments to `tedana_workflow`.
2. Move as much of `tedana_workflow` as possible into functions in other modules, to slim down the `tedana_workflow` file.
3. Make sure that `tedana`'s structure can handle expansion.
4. Decide how to keep `tedana` both flexible and interpretable (e.g., whether to retain multiple ICA component selection methods such as `kundu_v2_5` and `kundu_v3_2`, or PCA component selection methods such as `mlepca`/`varex` and `decision_tree`).

Regarding validation of the methods in the testing ground, that is something we are/will be working on in tedana-comparison. I don't really know if it's feasible to include validation of the methods in `tedana` proper, given how much work validation will take (e.g., running the new method on several public datasets, generating many detailed reports, and critically evaluating all of it against existing methods). Not to mention having to look into how a new method will interact with other components of the overall pipeline.
Thanks so much for your feedback, @tsalo ! I think once we're ok with the Project vision and Metrics of success, I'll add them to our 0.1 release project. We'll need to think about a timeline, next ! To address each of your points in turn:
Your question about whether or not we should be removing arguments for `tedana_workflow` is a good one. It's also a point where I'd love to hear @handwerkerd's thoughts, since I think I'm a little biased these days towards easy integration with the larger pipelines, but Dan would be able to provide a really good perspective on exactly how much plug-and-play flexibility tedana methods developers need (as a core developer !).
I also remember that moving most of the content of `tedana_workflow` into functions in other modules was a relatively popular idea -- does anyone have recollections otherwise ? We could open another issue for the how part of that discussion.
Edit: @rmarkello started this discussion in #50, though we were focused on slightly different issues at the time and didn't address this larger context. I'm not sure if we want to revive that discussion or start a new one, but at least wanted to make the link explicit !
Regarding the possibility of expansion, I think it's hard to future-proof, here. Since we know that decomposition and selection are likely to change in the near future, I'm happy to focus the discussion on those. Hopefully the lessons we learn will be applicable in the future as other needs arise !
I do agree that it's probably not feasible to do extensive testing in tedana itself. I think one question I've been wrestling with is "what is the minimum viable amount of testing we need?" I think we agreed to both a three and five echo dataset, but if we're retaining several "sub-optimal" options, should we still be testing those as well ? If we're not testing them, how can a future contributor come along and make improvements to them, without having a baseline ? I really don't know the right answer here, but I'd love feedback.
Overall, then, it seems like you're ok with the suggested Metrics of Success, but we should continue to discuss point 1 (hopefully hearing a few other perspectives !) before moving forward with the Project Vision. Does that match your understanding ? Thanks again for sharing your thoughts :sparkles:
Thank you everyone for a really great call on Monday 29 Oct! @emdupre is going to summarise some of our points and add a roadmap file to the read the docs file as a release candidate for community discussion. Stay tuned, but we're very close to closing this issue and moving forwards ✨ 🙀 🙌 👾
Hi all! Sorry I've been a bit radio silent, but I have been watching the development of `tedana` over the past few months and it's been great! There's a whole bunch of new features and stuff that look really cool, and I'm excited to try them out. I'm wondering, though, if we might slow down new development a bit and focus some effort on refactoring the package?
That is, coming back to `tedana` after some time away, it took me a not-insignificant amount of time to remember what functions were where and how they all interconnected. It's still sprawling, and I think that, given the trajectory of the package (i.e., it's not going to be anything near the size of, e.g., `nilearn` or `nistats`), it might be good to consolidate things a bit.
To that end, would something like the following be more parsimonious / understandable to new contributors, without sacrificing modularity?
In my mind, these would have the following:
`combine.py`: Functions for creating the optimal combination time series. Ideally the functions here would also accept file inputs, in addition to concatenated data inputs. Pre-generation of a T2* map (via `decay.py` functionality) would be optional, so that users interested in the optimal combination could call this function providing only their echo files and echo times and get back an in-memory niimg-like object of the optimal combination time series.
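As a usage sketch of that interface: the function name, signature, and the naive TE weighting below are hypothetical placeholders rather than the actual tedana implementation, but the shape of the call is what the paragraph above describes:

```python
import nibabel as nib
import numpy as np


def make_optcom_from_files(echo_files, echo_times):
    """Load per-echo 4D images and return a combined image as a niimg-like object."""
    imgs = [nib.load(f) for f in echo_files]
    data = np.stack([img.get_fdata() for img in imgs], axis=-1)  # (x, y, z, t, echo)
    # Stand-in weighting only: real optimal combination would weight each echo
    # by its TE and a voxelwise T2* estimate (computed internally if no map is given).
    weights = np.asarray(echo_times, dtype=float)
    weights /= weights.sum()
    optcom = (data * weights).sum(axis=-1)
    return nib.Nifti1Image(optcom, imgs[0].affine, imgs[0].header)


# optcom_img = make_optcom_from_files(
#     ["echo-1_bold.nii.gz", "echo-2_bold.nii.gz", "echo-3_bold.nii.gz"],
#     echo_times=[14.5, 38.5, 62.5],
# )
```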
`decay.py`: Functions for modeling signal decay and generating T2* maps. This is more-or-less a carbon copy of the existing `model/monoexponential.py` right now, but renamed. New functions for modeling signal decay (i.e., voxel-wise fits) could be added here.

`decompositions/`: Functions for decomposing time series. Given that this is the primary module that might be expanded to accommodate other decomposition techniques, I would be comfortable keeping this as-is.

`fit.py`: Functions for fitting R2* and S0 models. A copy of `model/fit.py`, but in the general package namespace.

`io.py`: Functions for handling reading and writing data, including consolidating components into relevant time series. This is a bit of a mess right now: the I/O code for `tedana` is scattered throughout the package, so it would be good to consolidate and clean it up.

`selcomps.py`: The almost-900-line function we all know and love (🙄), plus corresponding utilities. I know there's a significant amount of work going into making this more modular and broken up, so I think that will help a lot with its interpretation. I'm not sure I see the utility, though, in it living within a module (i.e., `tedana.selection`, as it is now).

`utils.py`: Miscellaneous utility functions supporting the general package (e.g., `new_nii_like()`, `filewrite()`, `load_data()`).

`cli/`: Command line interfaces for `tedana`. It might be good to pare these down a bit so that they're really only handling the CLI logic (e.g., parsing), and not the entire workflow as they currently are. The workflows could potentially go in a separate file (`workflows.py`), but I think that's a point for further discussion.

Sorry for the incredibly long issue, and for just jumping in with pretty major suggestions. Let me know your thoughts.