MRtrix3 / mrtrix3

MRtrix3 provides a set of tools to perform various advanced diffusion MRI analyses, including constrained spherical deconvolution (CSD), probabilistic tractography, track-density imaging, and apparent fibre density.
http://www.mrtrix.org
Mozilla Public License 2.0

(ms)dwi2response & sh2response: Resolving naming before release #422

Closed Lestropie closed 8 years ago

Lestropie commented 8 years ago

We need to decide what to do about these before we release. Supersedes #105.

Here's my current thinking, feel free to modify or suggest complete alternatives:

Hopefully that makes it clear that there are in fact two steps: single-fibre voxel selection, and response function estimation based on those voxels. Just using the (optional) response function output from dwi2sf is a shortcut that will probably satisfy most users, but hopefully the fact that the SF voxel mask is the default output will make people question that and look into alternative ways to get the SF mask rather than just reporting that they can't get a response function.

Hopefully it also separates msdwi2response from what is currently dwi2response, as their operations are actually quite different currently.

dchristiaens commented 8 years ago

I agree with your argument for dwi2sf (I might call it dwi2sfmask, to avoid confusion with a single-fibre kernel) as a more appropriate name for implementing Tax et al. (2014). However, I don't like the sf2response rename. That command will always require both a mask and the input data, and it just makes more sense to me to name it after the type of input data, rather than the type of mask. Besides, sh2response is used for fitting GM and CSF response functions just as well.

As to whether it should receive DWI input rather than SH, I would still prefer SH. As long as dwi2fod and msdwi2fod live happily next to each other, we need to distinguish between the single-shell and multi-shell response functions at some point. Otherwise, how do you handle the b=0 in good old CSD? In my opinion, sh2response has the benefit of being 100% clear on being single-shell only.

Nevertheless, I fully agree that dwi2response and msdwi2response are currently very different operations, and that this may cause some confusion. What I would propose we do instead is to introduce dwi2response as a script, identical to msdwi2response but operating on the shell of highest b-value only. That way, we would effectively have the two-step process you root for: single-fibre masking and response function estimation.

thijsdhollander commented 8 years ago

Not going too much into naming details --I'm happy with whatever names you guys come up with on that front-- I am a big fan of having most "...2response" kind of functionality as scripts. It's no secret that we have a lot of happy competitors in that subdomain now, which also shows that not a lot of people will agree any time soon on what is "the best" solution. Having such potentially contested things as scripts makes them very open, and appears more inviting to people to dig in themselves and make the most appropriate choices for their studies. Just as a random example: I've seen people do Tax's approach, but initialising the algorithm with other things (than a low-frequency "fat" response).

I'm in general also a big fan of keeping logical steps and implementations of different sources (and their outputs) separate; not only does it give people more insight into how things work (and inherently, potentially more clues as to what happened when things go wrong; much like @Lestropie's example above), it also allows us to properly credit the legitimate sources/references of the implemented methods on offer.

jdtournier commented 8 years ago

OK, there was already a lot of discussion around this in #189. Basically, I'd prefer to have the whole process scripted for very much the same reasons that @thijsdhollander pointed out: low barrier to entry, making it easy for users to debug when things go wrong, fix up the process if their data turns out to need special handling, and generally try different things out easily. Also, I don't see that there'd be much impact on performance (if at all), since MRtrix is optimised for fast access to data (memory-mapping). Where things might go slow is when strides or datatypes don't match, forcing some commands to preload (I've noticed that on msdwi2response, but it's easily fixed with a quick mrconvert to avoid the multiple preloads later).

On the exact topic under discussion here: I personally don't see a conflict - dwi2response and sh2response are clearly expecting different inputs. And I don't see a reason to change this, certainly not because of naming issues. However, from my point of view there's actually at least 4 distinct steps in the response function estimation problem, each of which may be done in different ways with different options:

The point here is that each of these steps is sufficiently complex in its own right to warrant a dedicated app. Besides, the apps to do each of these steps either exist already, or would be useful anyway. We also want to have the freedom to adjust any one of these steps if they happen not to work all that well on a given data set.

As I mentioned in #189, the approach I used in this paper for (standard) response function estimation differs a bit from the Tax et al. approach, starting from a sharp response for the initial CSD, which gives a much more stable, smoother FOD (since deconvolving a sharp response attenuates high frequencies in the FOD, due to the inversion). Also, I'd still opt for using the top 300 voxels (or whatever number is deemed acceptable), since that would allow the process to adjust to non-standard data (e.g. ex-vivo, muscle, etc). I'll reiterate the process here (I'd already posted it in #189):

# start with sharpest possible response function (i.e. SH decomposition of a flat disc).
$ echo "1,-1,1,-1,1" > init_response.txt

# initial estimate of FOD, within brain mask:
$ dwi2fod dwi.mif -mask mask.mif init_response.txt fod.mif

# extract 2 largest peaks into a 4D image split over volumes:
$ sh2peaks fod.mif -mask mask.mif -num 2 peak-[].mif

# compute ratio of peak squared amplitudes (admittedly not pretty, but clear enough):
# (note this computes the ratio peak1/(peak2+1) to prevent divide-by-zero if peak2 is zero)
$ mrcalc peak-0.mif 2 -pow peak-1.mif 2 -pow peak-2.mif 2 -pow -add -add peak-3.mif 2 -pow peak-4.mif 2 -pow peak-5.mif 2 -pow -add -add 1 -add -div peak_ratio.mif

# get mask of 300 voxels with largest peak1/peak2 ratio:
$ mrthreshold peak_ratio.mif -top 300 single-fibre.mif

# update estimate of response function within that mask, using FOD primary peak orientation:
$ amp2sh dwi.mif - | sh2response - single-fibre.mif peak-[0:2].mif response.txt
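As an aside on the `1,-1,1,-1,1` initialisation above: up to per-order normalisation, the zonal SH coefficients of a delta ring at theta = 90 degrees (the "flat disc") are proportional to the Legendre values P_l(0), whose signs alternate over even l, which is where that sign pattern comes from. A quick stdlib Python check (illustrative only, not part of the pipeline):

```python
from math import comb

# P_l(0) for even l, via the closed form (-1)^(l/2) * C(l, l/2) / 4^(l/2).
# Up to per-order normalisation, these are the zonal SH coefficients of a
# delta ring at theta = 90 degrees, i.e. the "sharpest possible response".
def legendre_at_zero(l):
    k = l // 2
    return (-1) ** k * comb(2 * k, k) / 4 ** k

values = [legendre_at_zero(l) for l in (0, 2, 4, 6, 8)]
signs = [1 if v > 0 else -1 for v in values]
print(signs)  # [1, -1, 1, -1, 1], matching the init_response.txt pattern
```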

When I used this, a single iteration was enough for convergence. I just tried it on other data; it needed a few more iterations, but basically 2 iterations was enough to get within 3 sig. figs:

[image: response convergence]

In terms of the response itself, starting with the initial sharpest possible one on the left, followed by the first 3 iterations:

[image: responses_iter]

The main bottleneck was actually the sh2peaks call. Even then, each iteration through this takes <20s on my system, so the total runtime for two iterations using this approach would be under 40s... In practice I'd probably speed this up using another mrthreshold call to refine the mask to something guaranteed to include all likely candidates, but much smaller than the full mask. For example, using mrthreshold peak_ratio.mif -top 10000 mask_refined.mif to restrict further processing to the top 10000 voxels of the previous step reduces the runtime of the dwi2fod and sh2peaks calls to sub-second levels, so the whole thing takes under 25s on my system.
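The single-fibre selection arithmetic implemented by the mrcalc/mrthreshold pair above can be sketched in plain Python (all voxel values here are made up, and N = 3 stands in for the top-300 selection):

```python
import random

random.seed(0)
# Hypothetical per-voxel (largest peak, second peak) amplitude vectors.
voxels = [([random.uniform(-1.0, 1.0) for _ in range(3)],
           [random.uniform(-0.5, 0.5) for _ in range(3)]) for _ in range(10)]

def peak_ratio(p1, p2):
    """Ratio of squared peak amplitudes; the +1 guards against divide-by-zero
    when the second peak is (near-)zero, as in the mrcalc call above."""
    sq = lambda v: sum(c * c for c in v)
    return sq(p1) / (sq(p2) + 1.0)

ratios = [peak_ratio(p1, p2) for p1, p2 in voxels]
# Equivalent of `mrthreshold -top N`: indices of the N largest ratios.
top = sorted(range(len(ratios)), key=ratios.__getitem__, reverse=True)[:3]
```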

I'll try to put all this into a slightly tidier script, see what you guys make of it - might very well be that it turns out to be unstable on different datasets - who knows...

jdtournier commented 8 years ago

OK, committed to estimate_response - see what you think...

dchristiaens commented 8 years ago

@jdtournier Your script works fine for me. Interesting approach too, and I notice that the resulting kernel is very sharp. You make a good point in distinguishing 4 steps too, although when you start iterating between them like Tax's method, things tend to get a bit confounded.

Now that we have established that we're all big fans of scripts, how do we move forward in "resolving naming before release"? Donald, do you suggest we separate those 4 steps into separate calls, and can you come up with sensible names for them? Or are you fine with having (multiple) scripts integrating (part of) these steps in a pipeline? Finally, which of these are critical to hit release?

The one thing that bothers me most in the current setup, which @Lestropie was also addressing, is the different methodology in dwi2response and msdwi2response. Hence my suggestion of providing a dwi2response script, identical to its multi-shell counterpart but adapted for single-shell (see above). These could even be integrated into one script, as long as we have some way of specifying how to deal with the b=0. Of course, that leaves the question of what the current dwi2response command should be renamed to.

thijsdhollander commented 8 years ago

Can't test it here, but it sounds pretty solid/robust overall to me! And given that it's indeed a nice and open script, anything from numbers (e.g., that 300, etc...) to complete steps can be easily adjusted. :+1:

I actually became quite alarmed recently when I started noticing that the current dwi2response, when given a full brain mask as initialisation (via the -mask option of that command), quite often converged to a wrong optimum. This only came to my attention when consciously requesting the single fibre mask output: the typical regions I'd expect (e.g., middle of the corpus callosum, etc...) were actually excluded from the final mask, while several typical crossing regions and grey/white matter partial-volumed regions made up the final output. An overly fat initial response can, so it seems, in a magical/diabolical turn of events actually set off the optimisation process in a wrong direction. :scream: The issue wasn't there when initialising with a more white-matter'ish mask (e.g., resulting from a simple FA threshold or similar; basically ensuring the initial response is not that fat). But this experience goes to show the dangers of an almost "black box" command: it lures users with a name (dwi2response) that promises convenience (you only need to provide the DWI data, plus a (brain) mask that dwi2mask can simply generate), at the expense of critical thinking or checking the final single fibre region. I'm not sure what level of insight we can expect from an average user, but I had the feeling such an average user would probably have overlooked this kind of outcome to begin with; and even if not, would not have known that providing anything other than a brain mask to the -mask option was an (artificial, one could argue) solution in this case.

Lestropie commented 8 years ago

Regarding sh2response / sf2response: One method I'm hoping to attempt in the near future for response function estimation (given a single-fibre mask & directions) will use the raw DWIs. That would therefore not be able to live within sh2response. Whereas, if sh2response were modified to sf2response, with the only modification being that the amp->SH transform occurs as part of the binary (potentially with the inclusion of fancy options such as noise map input / outlier rejection) rather than by using amp2sh explicitly beforehand, that would not only simplify the multi-shell script (as the sf2response command could output one or more b-values per call), but would allow future developments in response function estimation to reside within it. So that option to me is still more attractive overall. Or you could just rename it to sf2response but have it expect either DWI or SH as input depending on what estimation algorithm you're using, kind of like tckgen.

Besides, sh2response is used for fitting GM and CSF response functions just as well.

*stumbles*

What about mask2response?

Just as a random example: I've seen people do Tax's approach, but initialising the algorithm with other stuff (than a low frequency "fat" response"). ... I actually became quite alarmed recently when I started noticing that the current dwi2response, when given a full brain mask as initialisation (to the -mask option of that command), quite often converged to a wrong optimum.

One thing I do differently to Chantal in this respect is that I derive an initial RF by setting the l=0 and l=2 terms based on the mean and standard deviation of the data, rather than magic numbers. I think that's why it can fail if you don't provide an initial brain mask. If that binary were to stay, I'd make that mask a compulsory input.

Also since it's kind of on the topic: I've found with HCP data that initially masking with FA>0.5 then feeding to dwi2response works well.

starting from a sharp response for the initial CSD, which gives a much more stable, smoother FOD (since deconvolving a sharp response attenuates high frequencies in the FOD, due to the inversion)

We've had this discussion before... Check your data :-P

Also, I'd still opt for using the top 300 voxels (or whatever number is deemed acceptable), since that would allow the process to adjust to non-standard data (e.g. ex-vivo, muscle, etc).

I've had data where using 300 voxels has been very much not appropriate, which is why I've always tried for a more data-driven approach; but I'm pretty sure I've lost that argument already.

The main bottleneck was actually the sh2peaks call.

If scripting, I'd go fod2fixel | fixel2voxel dec_unit | mrconvert -coord 3 0:2 to get your XYZ directions; it's definitely faster than sh2peaks.

OK, committed to estimate_response - see what you think...

Given dwi2response was sold as a black-box solution, and users then kept reporting datasets where it didn't work, I don't think this should be pushed as a replacement with testing only on stock-standard b=3000 data. If it stumbles just as often on weird data as the current solution, most users will still just complain about it, regardless of whether it's a script rather than a binary.

Also: nice to see my script interface being used... :-(


OK, so my final summarized suggestion:

jdtournier commented 8 years ago

Glad there's some form of consensus emerging! Quick comments on specific points:

Replacing amp2sh | sh2response with a direct sf2response

Fine by me! The whole point is to allow these kinds of modifications to be made trivially, and each stage to be tested and made available independently. I was also thinking that the amp2sh call might be unstable on ill-conditioned data, and that we could do a lot better by doing the DWI->response fit in one go. So absolutely, if you want to have a go, I'd certainly be happy to have these calls replaced with something better. Or to have whatever replaces them also allow operation on the SH coefficients directly.

I'd also be happy to replace the sh2peaks call with something fixel-based (as you suggest in #426) if we find that it consistently performs better (and faster).

Renaming sh2response to mask2response

I don't like the name much (for the reasons @dchristiaens already mentioned above), but if it really gives you the irrits... I'm not sure why you'd need to change it though, it's not conflicting with any other command you might be planning to replace it with, is it...? Besides, it really has to be SH, even for lmax=0... Otherwise the scaling will be off. But as you suggest, that wouldn't matter much if this was replaced with a sf2response command that allowed operation on either raw DWI or SH coefficients. Or maybe responsegen...?
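On the "scaling will be off" point: with an orthonormal SH basis, even a purely isotropic signal is not numerically equal to its l=0 coefficient. Y_00 = 1/sqrt(4*pi), so a constant amplitude A corresponds to c_00 = A*sqrt(4*pi), roughly 3.545*A. A minimal check (illustrative only):

```python
from math import sqrt, pi

# Orthonormal SH: Y_00 = 1 / sqrt(4*pi). A perfectly isotropic signal of
# amplitude A therefore has its single (lmax=0) SH coefficient equal to
# A * sqrt(4*pi), not A itself -- the scaling that would be "off" if raw
# amplitudes were passed where SH coefficients are expected.
A = 2.5
c00 = A * sqrt(4.0 * pi)          # ~3.545 * A
Y00 = 1.0 / sqrt(4.0 * pi)
reconstructed = c00 * Y00          # recovers A, up to float rounding
print(round(reconstructed, 12))
```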

Starting with a sharp response

Check your data.

Just did. You're right, it's not all that clean with a sharp response if you go for the full lmax=8 fit - see images below. Much better if restricted to lmax=4 (next image). Nonetheless, using a fat lmax=2 response seems a very bad idea to me (last image) - it can't actually detect crossing fibres at all (you'd need at least lmax=4 for that)...

[images: screenshot0000, screenshot0002, screenshot0001]

Supplying another black box

I thought the whole discussion around scripting was to make sure this wasn't a black box?!? I admit I might have given the wrong impression by creating a pull request for it - I have no intention for this to be merged at this stage, it's far too rough around the edges. I just thought I'd push to a separate branch so people can test it on their own data before we take this further. Of course we'll need to test and modify it extensively to make it clean and provide access to all the options, etc. before including it in the main release...

And as I've mentioned before, I've tested it on other data sets too, where things are much more challenging (neonatal b=500, 47 DW directions, very little angular contrast). But I'd like people to try it on ex vivo or muscle data too, which would be equally challenging. There's a few issues there, I'm refining the approach, and I'll commit an update on the estimate_response branch in a bit.

Also, on the data I was initially using for testing (the old b=3000, 150 DW directions BRI data set), dwi2response failed after a 10 minute runtime with the 'no voxels left in mask' error. I think there's a good argument to be made for ensuring that whatever strategy we do use works out of the box on any dataset acquired with the settings we ourselves recommend... So if we're going to put disclaimers and stringent tests on any alternative approaches, they should equally apply to the existing approach. Regardless, I completely agree that whatever replacement we come up with will need to be extensively tested and polished before we push it out.

Thresholding on number of voxels

You mention you had data where this wasn't appropriate - what was the issue there? I can't think of a situation where it wouldn't be the right thing to do... We want the sharpest response supported by the data - not just the average over those voxels that appear single-fibre. I plucked the 300 number out of the air, but the rationale was to avoid being overly influenced by voxels that were artificially sharper due to noise (hence a largish number), but still reflected the sharpest subset of the data. I also note that dwi2response often ends up with way more voxels included, but I'm not convinced that's necessarily a good thing in terms of ensuring that the voxels included genuinely have the least dispersion, and hence the best estimate of the intrinsic response.

In any event, that is again something that can trivially be changed if needed - it's just one line in the script. I've had one data set that produced a funny-looking response, and inspecting the SF mask quickly showed the problem: there was a prominent fat artefact, and many of the voxels ended up in there. Easy enough fix (erode the mask...), but I'm not sure there's a great deal we can do about this when the data are artefacted in such a way as to exhibit high anisotropy within the initial brain mask.

Not using ArgParse

Dude, I had no time to figure it out! I just wanted to cobble a quick proof of concept together, expecting a proper version to be scripted in Python later. But even then, I don't think it's a problem to use bash scripts if that's what people are comfortable with - it may not be as powerful as Python, but you can go a surprisingly long way with it... What's missing is the whole argument-parsing interface, which is why it will need to be ported to Python eventually: too many options for each stage...

Anyway, more on #426...

dchristiaens commented 8 years ago

My thoughts on Rob's suggestions:

nadluru commented 8 years ago

Dear MRTrix3 development team members,

Are the msdwi2response, msdwi2fod commands reasonably ready for use? If so can I just checkout the estimate_response branch instead of the master for those two commands?

Thanks so much! Sincerely, Nagesh

bjeurissen commented 8 years ago

They should be stable. I would use the updated_syntax branch if you want to try the multi-shell stuff.

jdtournier commented 8 years ago

Well, not so sure about the estimate_response branch - we're still debating that one (see pull request #426). I think it's good enough to use as-is, so feel free to use it. It's already based on the updated_syntax branch, so that should be fine to use in combination with msdwi2response. Bear in mind that both of these are very likely to change over the coming few months/years as we gain more experience with them, but if you're happy with that, then by all means give them a go. And if you have any feedback, positive or negative, we'd be more than happy to hear it - best to sort out any issues now rather than when they make it to master...

nadluru commented 8 years ago

Got it. Thanks so much Ben and Donald! I will try the updated_syntax branch and play with the ms* commands and share my experience.

Sincerely, Nagesh

nadluru commented 8 years ago

@jdtournier I compiled the updated_syntax branch as well as the estimate_response branch, but was not able to find the msdwi2response command. They both seem to compile only msdwi2fod. Am I missing anything? Thanks so much for your help!

thijsdhollander commented 8 years ago

Hi @nadluru ! The msdwi2response is actually a script (not a compiled command); so you'll find it in the scripts directory.

nadluru commented 8 years ago

Aha!! Thanks so much Thijs!!

Lestropie commented 8 years ago

OK. Pushed some bytes. All in updated_syntax.

dwi2response is now a script; the binary and headers have disappeared. It encapsulates a number of possible mechanisms for response function estimation, gives some command-line options for each, and is hopefully accessible enough for people to have a go at their own approaches if they so choose (though I'd like it if it could auto-detect new algorithms and handle them appropriately).

Still not 100% happy with the command-line parsing: any 'standard options' have to be provided before the name of the desired algorithm. If anyone thinks they can get argparse to behave, you're more than welcome to try.
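For what it's worth, one way to get argparse to accept shared "standard options" after the algorithm name is the parents= mechanism: define the shared options once on a help-less parent parser and attach it to every algorithm sub-parser. A minimal sketch (the option and algorithm names here are illustrative, not necessarily the script's actual interface):

```python
import argparse

# Shared "standard options", defined once; attached to each algorithm
# sub-parser via parents=[...] so they are accepted *after* the algorithm name.
common = argparse.ArgumentParser(add_help=False)
common.add_argument('-nthreads', type=int, default=0)

parser = argparse.ArgumentParser(prog='dwi2response')
sub = parser.add_subparsers(dest='algorithm', required=True)

tournier = sub.add_parser('tournier', parents=[common])
tournier.add_argument('input')
tournier.add_argument('output')
tournier.add_argument('-number', type=int, default=300)  # SF voxel count

tax = sub.add_parser('tax', parents=[common])
tax.add_argument('input')
tax.add_argument('output')

# A standard option given after the algorithm name now parses fine:
args = parser.parse_args(['tournier', 'dwi.mif', 'wm.txt', '-nthreads', '4'])
print(args.algorithm, args.nthreads, args.number)  # tournier 4 300
```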

jdtournier commented 8 years ago

@Lestropie - looks awesome. That will definitely make it clear that response function estimation is not a simple problem... :wink:

I don't think I'll manage to look through this in any detail any time soon. It looks pretty much spot on from a quick look through, so I think we just keep it in there, and kill pull request #426 as you suggest. We'll soon find out whether there are any issues with it...

Just wondering what the default algorithm is, by the way...? Or do users have to choose one approach explicitly?

Lestropie commented 8 years ago

Just wondering what the default algorithm is, by the way...? Or do users have to choose one approach explicitly?

Forced explicit. There's two layers in the help page too: once you specify an algorithm, the help printout includes the arguments/options for that algorithm.

jdtournier commented 8 years ago

OK, so the obvious question users are going to ask is: which one should I use? What's our recommendation...?

Lestropie commented 8 years ago

:see_no_evil: :speak_no_evil: :hear_no_evil:

jdtournier commented 8 years ago

Yeah, I figured...

I guess at the very least some hefty wiki documentation will be needed to cover this. I vote we recommend my approach (obviously) on the wiki for single-shell unless anyone has any objections...? And obviously, recommend the msmt_5tt approach for MSMT-CSD. We could also make these recommendations in the script's help page (there is one, right?), but I agree, I don't think we should have a default as such, seeing as it can be used for various purposes.

Thoughts?

dchristiaens commented 8 years ago

I don't think we need a default either, although it would make things easier for new users... If we were to set a default in the script, I'd say the CSD 2007 and MT-CSD 2014 setups are the most neutral and straightforward. As to our recommendation on the wiki, it can't hurt to pick a newer approach. Fine with Donald's suggestion.

thijsdhollander commented 8 years ago

Definitely in favour of no defaults on this one. I'd personally even be careful/conservative with recommendations, to be honest... If anything, we just need to clearly define what the offered strategies do (and what the prerequisites for them are: e.g. the 5tt approach for MT responses clearly requires a very accurate T1-to-dMRI registration; users need to know that this is crucial for, e.g., a reasonable GM response). So to me, it seems more about describing the pros and cons of each strategy. It'd be weird if we straight-out recommend one strategy, yet also provide other (seemingly suboptimal) ones... no?

If users do come and ask about it on the brand new community... all the better, I'd say, because that way many more opinions and experiences will be contributed by a bigger audience.

jdtournier commented 8 years ago

I also think there should be no default, if only because we shouldn't be second-guessing our users as to whether they might want a single- or a multi-shell response, but I really don't think it would be wise to leave users without a clear recommendation. We are supposed to be the experts here; if we can't provide some clear guidance as to where to start, it's just going to cause frustration. I'm all for emphasizing that it's not as trivial as it sounds, and providing all the options we can should people want to try things out, but we have to give users a clear indication of what we think will work best in most cases. I really don't see your average neuropsych student or neurosurgeon wading through a detailed discussion on these issues so that they can make an informed decision. Even then, the amount of specialist knowledge required to make such an informed decision is enormous. For most users who aren't already experts in diffusion MRI, there has to be an obvious recommendation.

There's not much of a choice for the multi-shell stuff currently, so really it boils down to the single-shell case. There's three options here: fa, tax, and tournier. So:

So if anyone were to ask me which of these to try first, there's no doubt as to what I'd recommend. I appreciate everyone may have different opinions on the matter, but surely we can reach some consensus as to what we're going to state on the DWI tutorial on the wiki? This is after all going to be our implicit recommendation, and what most people end up using, no matter how much we explain the pros and cons of the various methods. Most users will take our first recommendation and run with that unless it clearly doesn't work for them. At that point they may bite the bullet and read through what we might have written on the topic. But I really feel we shouldn't be requiring our users to make their own decisions about something so technical. Sure, provide all the options for those who are interested, but most people just want to get their tracks out...

Lestropie commented 8 years ago

Agree that msmt_5tt and tournier will be the 'recommended' defaults. Though personally I'd shy away from having that stated in the script itself: instead direct to the wiki entry (which I need to write), and have a tl;dr sub-section at the bottom of the page. Users need to at least appreciate that it's not an unambiguous process.

I think we can all agree that the Tax approach has not worked out great for us, given the issues that have cropped up repeatedly on the forum, and all the discussion in this thread. But I do think it needs to be offered as an option, if only to allow users who might have analysed half their data with it to keep doing so with the updated version of MRtrix.

I wouldn't recommend that people use the old dwi2response binary for half of their cohort, and this script algorithm for the second half. There will definitely be differences in the results; in particular my approach tried to reject CSF partial-volume voxels (where the AFD is smaller, but it's still 'single-fibre' and can pass the peak ratio test) and voxels with substantially larger AFD due to T2 effects.

This would be easily fixed using mrthreshold's -top option, to select the 300 most anisotropic voxels, rather than just those above a given threshold. Actually, thinking about it, technically that is what we should do, since that's actually what's stated in the 2007 paper...

Did the 0.2 documentation / 2004 paper use an FA threshold? Either way, shouldn't be too hard to make both options available; the 300 voxels can be the default if you want.

jdtournier commented 8 years ago

Agree that msmt_5tt and tournier will be the 'recommended' defaults.

:+1:

direct to the wiki entry (which I need to write), and have a tl;dr sub-section at the bottom of the page

Fine by me. My guess is most people will blindly follow the DWI tutorial the first time they give MRtrix a spin, so they'd have no trouble figuring out where to get the recommendations from.

I wouldn't recommend that people use the old dwi2response binary for half of their cohort, and this script algorithm for the second half.

Yes, I expected that might be the case... I guess we'd need to acknowledge that explicitly somewhere then. Mind you, there will be so many changes when updated_syntax gets merged, we'd probably need to recommend to people to not upgrade if they're mid-study. And maybe provide instructions for how to install both versions side by side, and how to switch between them...?

Did the 0.2 documentation / 2004 paper use an FA threshold?

Yes, it did. That's predominantly because I didn't have the -top option in mrthreshold at the time. But the instructions do mention that the recommended threshold of 0.7 is a guide only, and might need adjusting on a case-by-case basis. The paper however did explicitly state I used the top 300 voxels...

thijsdhollander commented 8 years ago

Putting it like

some clear guidance as to where to start

as @jdtournier said is something I can definitely agree with. It's ok to point to a strategy that "generally" works, as long as it comes with a (secondary) list of identified limitations or specific scenarios where it works less well (or not at all). :+1:

Lestropie commented 8 years ago

Have made the fa algorithm change, and written a new response function estimation wiki page (in the updated_syntax branch of the wiki repo).

Lestropie commented 8 years ago

OK, hopefully this is all dealt with now.