AthenaEPI / dmipy

The open source toolbox for reproducible diffusion MRI-based microstructure estimation

Small bug fix in signal_decay_metric #67

Closed · weningerleon closed this issue 5 years ago

weningerleon commented 5 years ago

The mask in the signal decay metric is calculated as all voxels with b0 > 0. However, it can happen that the diffusion-weighted signal is zero even though b0 > 0. In line 221 (`ratio = np.log(mean_b0[mask, None] / mean_dwi_shells[mask])`), this leads to a ratio of infinity, and the program breaks a few lines further down. This behavior can be seen, for example, in subject 118730 of the HCP dataset.

A possible solution would be the proposed mask, which also takes non-zero b-value voxels into account.
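
For context, a minimal runnable sketch of the kind of mask change described here; the toy arrays, shapes, and variable names are illustrative assumptions, not dmipy's actual internals:

```python
import numpy as np

# Toy stand-ins for dmipy's internals: mean b0 per voxel, and the
# per-shell mean DWI signal with shape (n_voxels, n_shells).
mean_b0 = np.array([100.0, 80.0, 0.0, 90.0])
mean_dwi_shells = np.array([[40.0, 20.0],
                            [30.0, 0.0],    # b0 > 0, but one shell is zero
                            [10.0, 5.0],
                            [0.0, 25.0]])   # same problem in another shell

# Original mask: only requires a positive mean b0 signal, so voxels
# with a zero DWI shell slip through and produce inf in the log-ratio.
old_mask = mean_b0 > 0

# Proposed mask: additionally require a positive mean signal in every
# shell, so the log-ratio below stays finite.
mask = np.logical_and(mean_b0 > 0, np.all(mean_dwi_shells > 0, axis=-1))
ratio = np.log(mean_b0[mask, None] / mean_dwi_shells[mask])  # all finite
```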

rutgerfick commented 5 years ago

Hi @weningerleon , excellent suggestion, I totally didn't consider this when writing it.

The line you suggested works and should be merged, but to improve readability, could you please break it into two lines so that the maximum line length stays below 80 characters? (See PEP 8, or Google's Python style guide.)

weningerleon commented 5 years ago

Thanks for the fast feedback! I added a line break in order to comply with PEP8.

thijsdhollander commented 5 years ago

Hi @rutgerfick and other(s),

Just wanted to quickly stop by, say "hi" and thank you for maintaining these implementations so well! It's very nice to see some of my stuff being relatively feasible to reproduce independently.

I just spotted this, and remembered that I encountered the same thing at some point when initially coming up with the algorithm. As the algorithm isn't after a full segmentation, but just after the "best" voxels per tissue class, it can afford to be very picky; any voxels that behave oddly in any possible way can simply be excluded from consideration. The ones with e.g. zero signal are good examples. After computing the signal decay metric, I just throw out anything that has produced a non-finite value, for each non-zero shell's signal decay metric individually as well as for the overall/total signal decay metric.

I also cap the final signal decay metric at a value of 10 (not removing values higher than that, but setting them to 10). It's highly unlikely in general for the ratio to reach exp(10); but should it happen nonetheless, such extreme outliers might later upset the automated thresholding in the algorithm. Note that when the signal decay metric reaches such high values, it's already indicating, for relatively standard b-values, a diffusivity way beyond free water; so it's certainly CSF, just with noise on the non-b-zero signal throwing the ratio way off. Even though the ratio is way off (but not infinite), still including the voxel, and capping the metric at 10 so the thresholding doesn't get upset, is totally fine: together with a bunch of other CSF voxels, it'll still make for a very good CSF response down the track.
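
A rough sketch of these two safeguards (drop non-finite values, then cap at 10), assuming the per-shell metrics are stacked in a 2-D array; the names, shapes, and values are illustrative:

```python
import numpy as np

# Per-shell signal decay metrics, shape (n_voxels, n_shells); the total
# metric is their combination (a plain mean here, for illustration).
sdm_per_shell = np.array([[1.2, 1.5],
                          [0.8, np.inf],   # zero DWI signal in one shell
                          [2.0, 1.9],
                          [35.0, 40.0]])   # extreme (but finite) outlier
sdm_total = sdm_per_shell.mean(axis=-1)

# 1) Exclude any voxel with a non-finite value in any shell's metric
#    or in the total metric.
keep = np.all(np.isfinite(sdm_per_shell), axis=-1) & np.isfinite(sdm_total)

# 2) Cap (rather than remove) the remaining values at 10, so extreme
#    outliers cannot upset the automated thresholding later on.
sdm_total = np.clip(sdm_total[keep], None, 10)  # -> [1.35, 1.95, 10.0]
```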

But in conclusion: throwing out non-finite values (as the solution here essentially does) is totally fine, and means that the rest of the algorithm won't run into unexpected nasty surprises.

Thanks again for all your efforts!

Cheers, Thijs

weningerleon commented 5 years ago

Hi @thijsdhollander,

Thank you for your reply!

As you said, the solution basically throws out non-finite values. However, it does not implement the capping you suggested, and I'm not sure that would really bring the desired effect. Really high values indicate a diffusivity way beyond free water diffusivity. Finally, at the end of the algorithm, the voxels with the highest SDM are selected as CSF voxels (this is basically what you also wrote). Thus, including these will lead to an overestimation of CSF-diffusivity. In the subsequent CSF/GM/WM segmentation, an overestimation of CSF-diffusivity should then result in an overestimation of the CSF-compartment. What do you think of removing them instead of capping them?

rutgerfick commented 5 years ago

Hi @thijsdhollander ,

I'm happy to see you stopping by! For the independent reproducibility of your work you have entirely yourself to thank: you wrote your ISMRM abstract so clearly I could basically use it directly like a cooking recipe ;-)

I actually wanted to thank you for that specifically. It's because of your abstract that I thought about extending dmipy to spherical harmonics models, which eventually led to the implementation of generalized multi-tissue models, enabling implementations like multi-tissue NODDI, which hinge on the S0 responses estimated by your work. I have many questions about the applicability of SS3T as well, but perhaps we'll find the chance to discuss that another day :-)

Regarding the clipping, I think that the cut-off of 10 that @thijsdhollander proposes is just a heuristic, only meant to not affect the optimal selection of CSF voxels afterwards. If we just remove voxels above this value, we could encounter the case where a noisy dataset has all or most CSF voxels removed (even if they represent CSF in this particular dataset). I think the clipping solution is therefore safer for avoiding unexpected situations (and it also avoids adding another tuneable parameter). We could implement the clipping by just adding the line `SDM = np.clip(SDM, 0, 10)` with a comment referencing the conversation in this merge request #67.

Please pitch in @thijsdhollander if this was also your reasoning for preferring one approach over the other, or @weningerleon if you think there is a better or hybrid solution.

rutgerfick commented 5 years ago

Hi @weningerleon, I'm merging this so at least people can use this function without errors, we can potentially open a second MR for the clipping.

thijsdhollander commented 5 years ago

> I'm happy to see you stopping by! For the independent reproducibility of your work you have entirely yourself to thank: you wrote your ISMRM abstract so clearly I could basically use it directly like a cooking recipe ;-)

Thanks! :-)

I've seen / read some bits of your notebooks at some point: definitely very interesting material. The experiments you showed to investigate the general behaviour and mechanism of SS3T-CSD are actually not unlike the ones I've been doing myself (and some similar ones I'm hoping to show once I get them written up decently at some point). There are a few more "hidden" subtleties about it, but generally, the way you've been going at implementing and experimenting with these things shows a lot of insight.

> Regarding the clipping, I think that the cut-off of 10 that @thijsdhollander proposes is just a heuristic, only meant to not affect the optimal selection of CSF voxels afterwards. If we just remove voxels above this value, we could encounter the case where a noisy dataset has all or most CSF voxels removed (even if they represent CSF in this particular dataset).

Yep, spot on. I'm ultimately driven by making things as robust as they can be, which means thinking all the time about these unusual scenarios that might happen (or actually stumbling upon datasets where such things happen unexpectedly). The obvious cases are eliminating the possibilities for "inf"s and "nan"s to appear, of course. The other challenge in the response function selection algorithm is that it uses the automated thresholding step on a lot of occasions. This is great, because it's generally very robust. But it's also a challenge, because extreme outliers may mess things up seriously.

This is in fact the main reason why the "signal decay metric" came into existence in its current form to begin with. In general, this metric gives you pretty much the same contrast (which is all we need for the algorithm) as something like a mean diffusivity (MD) or ADC map (it's physically of course not the same, but that doesn't matter for the purposes of this algorithm). However, averaged ADC maps or MD maps based on a tensor fit have one important difference in the order of the mathematical operations used to compute them: they typically first take the ratio and log of signals, per individual image or image pair, and only average (or fit the tensor) at the end. The unstable thing in there is that ratio; and even worse is the fact that DWIs are generally very noisy. Hence the signal decay metric essentially reverses the core mathematical operations: first average per b-value, then take ratios and logs (and then weight a bit and combine for all b-values, but that's more of an afterthought at that point). That averaging per b-value should already make it much "safer" to compute the ratio (and log) to begin with.
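
To make that reordering concrete, here is a toy comparison of the two orders of operations in plain numpy (the signals are made up; this is not dmipy's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# One voxel: a few high-SNR b=0 volumes and many noisy DWIs at one b-value.
b0 = 100.0 + rng.normal(0.0, 5.0, size=3)
dwi = 5.0 + rng.normal(0.0, 5.0, size=30)   # noisy; can dip to (or below) zero

# "ADC-like" order: ratio and log per volume first, average last.
# A single near-zero or negative DWI sample can produce inf/nan and
# wreck the average.
per_volume = np.log(b0.mean() / dwi)
unstable = per_volume.mean()

# Signal decay metric order: average per b-value first, then ratio and log.
# The shell average is far less likely to sit near zero.
stable = np.log(b0.mean() / dwi.mean())
```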

But then it can still go wrong: negative values might appear in the data (preprocessing with e.g. FSL's eddy and topup can introduce negative values in some voxels), as well as zeros, or generally (accidentally) extremely small values for the DWI data. Hopefully the averaging lowers these chances (and thus saves a lot of voxels, so they can still be considered for the response functions), but weird cases still happen. Extremely low values of a DWI average can make the b-zero to b-non-zero ratio very high. Even then, the log will of course still provide a major next layer of protection (which is why the log is even still included in the signal decay metric): for the log to become an outlier, the ratio has to be quite "outrageous". So the chances of this happening are hopefully very slim (but you never know).

When it does happen though, capping at a (completely heuristic indeed) value of 10 is a very safe balance: note that exp(10) ≈ 22026.47. So as we're excluding all inf values after the ratio anyway, we're talking about including (and capping) voxels here where the average DWI intensity is more than 22026 times smaller than the b=0 intensity, but still higher than 0. The sheer act of capping here doesn't mean much more than protecting the algorithm against an extreme outlier (which matters a lot to the algorithm, of course). So, lots of reasoning to conclude that it's very safe. ;-)

Checking whether that brings any risk for the subsequent multi-tissue CSD, let's see where @weningerleon's arguments might break down:

> Really high values indicate a diffusivity way beyond free water diffusivity.

That's definitely still true.

> Finally, at the end of the algorithm, the voxels with the highest SDM are selected as CSF voxels (this is basically what you also wrote).

Yep, completely accurate. You could wonder why the algorithm even goes through so many steps. Part of that is of course also identifying the other (GM / single-fibre WM) voxels. But for all tissues, including CSF, most of those steps eventually serve to find out what's a reasonable number of voxels. The automated thresholding on the SDM at various points (and a few other thresholds / steps in between) does most of the work. But in the end, yep, for CSF it's all about high-SDM voxels. So yes, if the capping actually happens (but chances are very slim: see above) for some of the many voxels that get selected, those voxels are indeed selected. Let's say, however incredibly unlikely, that many or even all voxels that end up selected are of this nature (this will literally never happen, but it's still the "worst case" imaginable).

> Thus, including these will lead to an overestimation of CSF-diffusivity.

Yep. Just to make sure we're on the same page: the diffusivity of the CSF response function of course (but I think that's what you're also talking about here).

> In the subsequent CSF/GM/WM segmentation, an overestimation of CSF-diffusivity should then result in an overestimation of the CSF-compartment.

Nope, this is where there's a slight mistake in the reasoning. It's actually the opposite: if the CSF response function's diffusivity is overestimated, the CSF signal fraction (should you optionally actively normalise the compartments to sum to one after multi-tissue CSD) will almost certainly be underestimated; but only really ever-so-slightly! The absolute compartment size might depend, since lowering some signal in the response will require a slightly higher compartment size to compensate. This is a crucial insight on two fronts: realising the direction of the ever-so-slight bias, but even more realising that it'll be really tiny too.

Let's figure this out with a simple example: a scenario with only 2 tissues (say WM and CSF) and only 2 b-values (b=0 and some other high b-value; it's only for the higher b-values anyway that the risk of very low DWI intensities becomes thinkable). Let's also ignore the angular domain; i.e. simply treat both WM and CSF as isotropic, with a single intensity for b=0 and another single intensity for the other b-value. WM would have a (very) low diffusivity, and CSF a very high one. For just 2 (isotropic) tissues and 2 b-values, we have a simple square 2x2 system to fit to 2 measurements, so we have a single exact solution if the model can fit the data. This WM-CSF model will be able to fit (with non-negative compartments, that is) any 2-measurement data whose diffusivity lies between the WM and CSF response functions' diffusivities; beyond that range on either end we'll have a residual (again, with non-negative compartments), and within the range the fit is exact.

Let's say you have an actual CSF voxel, so its signal has the same contrast (diffusivity) as the CSF response. We'll get a certain CSF compartment fit, and the WM compartment will be zero. Now think about that same (actual) CSF signal, but imagine the CSF response function's diffusivity is overestimated. This in fact extends the range of signals that can be fit. The "actual" CSF voxel's fit for these 2 response functions will be one that needs to engage the WM compartment at least a bit, because the CSF response function's diffusivity is overestimated, and only the WM compartment in the fit has a lower diffusivity "to compensate" in the fit. Consequence: CSF is underestimated, and WM is overestimated. That's the "direction" of the bias explained.

With respect to the magnitude of it though, you can get a feeling for this when realising we're fitting the intensities directly; i.e. we're not first taking some ratio and a log and then fitting diffusivities or something (akin to what an unweighted LLS DTI fit would typically do, for instance). Also realise that the b=0 signal for CSF is much larger than the DWI intensity for CSF, and that when overestimating the diffusivity of CSF via manipulating the DWI intensity for CSF, we can only lower that intensity so much (and still stay above zero). Its magnitude will still be very small compared to the b=0 intensity's magnitude. Here's a WM (blue) and CSF (red) response at b-values 0 and 2800 from an actual dataset to put that into (visual) perspective:

[Image: response_example, the WM (blue) and CSF (red) responses at b = 0 and b = 2800]

Note that here the CSF response diffusivity was actually still underestimated: about 0.0015 mm^2/s. That's not unusual at a high b-value like this, due to Rician bias (the signal hitting the noise floor). Using a voxel with an SDM of 10 or more means virtually moving the end of that red line down a bit, but of course not beyond 0: i.e. the response barely changes; although yes, ultimately when approaching 0 really closely, the diffusivity will explode, even up to infinity should the intensity hit zero. But to the overall response, and the set of responses in general, this means almost nothing. ;-) In a weird twist of sorts, you'd actually be happy to encounter such voxels (even though you won't), as they might accidentally compensate the Rician bias in the CSF response ever so slightly. But most importantly in the end: the "contrast" in the CSF response is mostly determined by a large b=0 intensity and a non-b=0 intensity that will always be very low in comparison. The b=0 intensity, luckily, is very robust (as it's high SNR, in particular for CSF), and even more so if there are a few b=0 images. And finally, for most datasets we're estimating this response from a few hundred voxels in the end.
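
To put a number on both the direction and the magnitude of that bias, here is a small numerical sketch of the 2-tissue, 2-b-value thought experiment above; the diffusivities and b-values are illustrative:

```python
import numpy as np

b = np.array([0.0, 2800.0])      # b-values in s/mm^2
d_wm, d_csf = 0.0007, 0.003      # illustrative diffusivities in mm^2/s

def response(d):
    """Isotropic mono-exponential response, normalised to 1 at b=0."""
    return np.exp(-b * d)

signal = response(d_csf)         # a "true" pure-CSF voxel

# Fit with the correct CSF response: the exact 2x2 solution is
# pure CSF (f_wm = 0, f_csf = 1).
A = np.column_stack([response(d_wm), response(d_csf)])
print(np.linalg.solve(A, signal))          # -> [0.0, 1.0]

# Fit again with an overestimated CSF diffusivity: the WM compartment
# has to engage a tiny bit "to compensate", so CSF is slightly
# underestimated and WM slightly overestimated.
A_over = np.column_stack([response(d_wm), response(0.0035)])
print(np.linalg.solve(A_over, signal))     # -> roughly [0.001, 0.999]
```

With the overestimated CSF diffusivity, the CSF fraction drops only from 1.0 to roughly 0.999: the bias indeed goes in the underestimation direction described above, and it is tiny.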

So a long explanation (sorry for all the reading I've caused here ;-)) to conclude that it's quite safe to have the SDM cap at 10; but getting an algorithm like this to be robust in all circumstances is ultimately a challenge of imagining what can go wrong in extreme (and extremely unlikely) cases.

rutgerfick commented 5 years ago

Wow @thijsdhollander, what a wonderful explanation of the intricacies of your algorithm. It's these sorts of discussions that are important for enabling high-quality, reproducible research. I'll incorporate these details of the tissue response estimation in a notebook when I have some time.

@weningerleon, would you mind making a small second MR that adds the clipping in #68? Perhaps you can also add a warning when the clipping actually occurs, so it's not completely silent.
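
For what it's worth, a minimal sketch of what that clipping-with-warning step could look like; the function name and its placement are hypothetical, not dmipy's actual API:

```python
import warnings
import numpy as np

def cap_signal_decay_metric(sdm, cap=10.0):
    """Cap the signal decay metric at `cap`, warning when clipping occurs."""
    n_capped = int(np.count_nonzero(sdm > cap))
    if n_capped:
        warnings.warn(
            "Signal decay metric capped at {} in {} voxels; see the "
            "discussion in dmipy PR #67.".format(cap, n_capped))
    return np.clip(sdm, 0.0, cap)
```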

thijsdhollander commented 5 years ago

No worries; it's always good to spread some of these insights!

weningerleon commented 5 years ago

@thijsdhollander, wow, thank you so much for this explanation! I'm quite new to diffusion imaging, and apparently I still had a mistake in my reasoning there. Participating in this discussion really helped me!

@rutgerfick, sure, I'll make a second MR with the clipping :) However, I'm on holiday at the moment, so it will take a few days until I can make the changes. I hope that's fine?

rutgerfick commented 5 years ago

No problem @weningerleon :-) Enjoy!