No lmax=8 cap in dwi2fod msmt_csd

thijsdhollander commented 7 years ago

@bjeurissen @jdtournier: with the introduction of the lmax=10 default for responses, I just stumbled upon the fact that dwi2fod msmt_csd is actually not capping the lmax at 8, but seems to go by the (max?) lmax of the response function(s) provided...? I suppose this may not be the behaviour we want (as a default at least)? When tag_0.3.16 gets merged, most users that don't touch -lmax anywhere, would automatically end up with an lmax=10 MSMT-CSD.

Lestropie commented 7 years ago

I thought that was done in #862, so already updated the docs accordingly.

thijsdhollander commented 7 years ago

Nope, dwi2fod was never touched in that one. I always assumed it was by default capped at lmax=8, even if the data and/or responses "allowed" for lmax > 8, in line with the NMR Biomed findings.

jdtournier commented 7 years ago

Yep, you're right. The standard CSD will default to l_max=8 no matter what the response l_max is, while multi-tissue CSD still defaults to whatever the response l_max is. We should fix that to make them consistent, and I assume that since that's what's already been stated in the docs, everyone is happy to set the default to l_max=8?

thijsdhollander commented 7 years ago

Yes, certainly. I also noticed at the moment it even seems to ignore the number of directions in the data. In principle, we're playing with 3 pieces of (potential) input information here, and there has to be an operator of sorts to combine them:

the number of directions (images) in the data, that translates into an lmax that it "supports" without further constraints/information; e.g., 30 directions supports lmax=6 without super-resolving stuff
the lmax of the response function(s)
a "constant" preferred max lmax (e.g. 8)

If I'm correct (correct me otherwise): currently the "plain" CSD goes for the minimum of these 3, right?

Also, again if I'm correct, currently MSMT-CSD just goes for the lmax of the response functions, and ignores the other two. I've just run one with lmax=10 responses on a 30 direction (+ 1 b-zero) dataset, and ended up with lmax=10 WM FODs.

I fully agree lmax=8 should definitely be the default "constant" cap, even if lmax > 8 responses are provided.

The remaining question is then: how to use (or not use) the number of images? It's not as trivial as with single-shell CSD: some directions may be the same across shells (so it may be harder to argue they contribute unique angular information in all scenarios...?), and the b-zero images contribute no additional angular information at all.

jdtournier commented 7 years ago

If I'm correct (correct me otherwise): currently the "plain" CSD goes for the minimum of these 3, right?

Yes, that's the logic in a nutshell.

Also, again if I'm correct, currently MSMT-CSD just goes for the lmax of the response functions, and ignores the other two.

Yep, it certainly seems like it to me too.

The remaining question is then: how to use (or not use) the number of images? It's not as trivial as with single-shell CSD: some directions may be the same across shells (so it may be harder to argue they contribute unique angular information in all scenarios...?), and the b-zero images contribute no additional angular information at all.

Good point. Although I'd argue it is as trivial as the single-shell case, since I don't think you can really ever argue that the different shells complement each other's angular resolution (at least not in the general sense - it may apply in MT-CSD and other models since we link across shells via the response function). It sounds similar to the argument made for the Caruyer multi-shell direction optimisation, and I don't agree with it. The way I see it, it's a bit like saying that the multiple rows on a 2D-FT (cf. shells) contribute more resolution along the x-axis (cf. angular domain) if you offset/jitter their sampling positions along x (cf. orientations) so that each row (cf. shell) samples a slightly different set of x positions (cf. orientations). If the signal is band-limited and fully sampled, then this would provide strictly no additional information for the 2D-FT case - and I don't think it would add any extra information for the orientation case either.

So I think it would be fine to use the same logic as for the single-shell CSD case, but using the maximum number of directions, either in the highest b-value shell, or in any single shell (the latter being potentially slightly less robust in the unlikely case that the dataset contains more b=0 or low b volumes than in the outer shells).

thijsdhollander commented 7 years ago

So beforehand already: I do agree with the final proposal of...

...using the maximum number of directions, either in the highest b-value shell, or in any single shell (the latter being potentially slightly less robust in the unlikely case that the dataset contains more b=0 or low b volumes than in the outer shells).

...as a good solution to have something robust (in the conservative sense) in there. Maybe the max number of directions in any single shell that is not the b=0 one, to avoid that latter case up to a certain extent. Or indeed maybe just the number of directions in the highest b-value shell to be even more safe / conservative. :+1:

Apart from that, I'm also not generally going with the Caruyer argument, but for different reasons I'd say. We do still have...

it may apply in MT-CSD and other models since we link across shells via the response function

...which certainly does let independent samples contribute to 1 final FOD. There I feel the 2D-FT analogy is not entirely analogous enough with this case: we're looking to reconstruct angular information on only 1 sphere (the FOD's domain) in the end. So it's still an n-D to (n-1)-D transformation (whereas 2D-FT is 2D to 2D). The difference being that in our case all transformations are different (the response (de)convolution transformation for each shell's different response); so the information of each shell can indeed not be regarded as equivalent (the lower b-value shells' data are lower contrast; their response is smoother; that individual deconvolution would be more ill-posed; hence these shells contribute less angular information; the extreme case being b=0 that contributes no angular information). So the analogy of sorts would be a set of 1D-FTs with different data (and different numbers of input data points) but also a different "FT", i.e., another "T" altogether (with other conditioning properties), yet all data-and-"T"-combinations with the same intended outcome. (in our case: the FOD)

But since the combined contributions of these shells can't indeed just be reasoned about as the sum of the numbers of directions, it's still not a good idea to (Caruyer-wise) sacrifice in any way an as homogeneous distribution as possible for each individual shell, so as to obtain an overall homogeneous angular coverage.

Well, all stuff for a dead moment at ISMRM (if those exist). :smile:

But for this issue: as mentioned, I agree with a simple conservative solution as the default. That's definitely the most transparent to most "casual" users too. Anyone who wants to go beyond and has good reasons to, can use the -lmax option and should be clever enough to find it.

jdtournier commented 7 years ago

Glad we agree on the best default here. But on second thoughts, that's probably still not quite enough. We implemented a method to improve robustness to poorly conditioned DW schemes based on the condition number of the amplitude->SH transform a while back. Neither versions of CSD currently use that, but I think they should... It should be trivial to wrap a M = DWI::compute_SH2amp_mapping (...) with a lmax_data = Math::SH::LforN (M.cols()) to get a more robust estimate. We could run this for each b≠0 shell and pick the highest estimated l_max from that lot. What do you reckon?

On the topic of different shells contributing angular resolution: my point is that if the signal is band limited (and it is) and sufficiently sampled in each shell (not necessarily always true, but pretty close in practice), then it doesn't matter how the directions in one shell are distributed relative to those in the other - provided they are optimally distributed within each shell - the Nyquist criterion is satisfied in each shell. The analogy with 2D-FT is useful to show that shifting rows relative to each other only introduces a phase ramp in the reciprocal domain, but does not otherwise introduce additional information (it's the reason why spatial super-resolution doesn't work in-plane in MRI, only through-plane). Only when the signal is under-sampled would it be possible to exploit this idea, provided the model used allows linking across shells (for example with MT-CSD). And even then it's a hard sell, it relies on the signal within e.g. the lower shell contributing frequency information to help constrain the fit in the higher shells, which seems a stretch given that they will typically contain lower l_max components.

But not that it's a big deal, I think we all agree that the concept isn't ideal...

thijsdhollander commented 7 years ago

Ok, forgot about this for a while, but started thinking about it again this week. The scenario described somewhere in a previous post in this thread is actually not completely true, even for the current dwi2fod csd algorithm. I'll reiterate and correct here, so we're clear on the current state of things. Apart from the explicit -lmax option, which overrides any defaults, there are currently 3 mechanisms that determine a default. Not all of them are used for msmt_csd, but not all of them are used as well for current CSD.

Overview (of current reality):

The number of directions in the DWI dataset, that translates into a supported lmax (e.g. 30 dirs supports lmax=6): used by csd; not used by msmt_csd
lmax of the response function: NOT used by csd; used by msmt_csd per individual tissue type
hardcoded max lmax=8: used by csd; not used by msmt_csd

CSD uses the minimum of what mechanisms 1 and 3 suggest, MSMT-CSD uses mechanism 2. Note that CSD does not use mechanism 2 anywhere, which I only just noticed now after doing a quick test with a dataset that supported lmax=8, yet with a response function that only had lmax=4. The result was still an FOD image with 45 coefficients, aka lmax=8.

It's also clear that MSMT-CSD always needs at least mechanism 2 as part of its list of mechanisms: in the common scenario of WM-GM-CSF, the latter two tissue types have an lmax=0 response, which is the default mechanism of telling the algorithm we are also only requesting an lmax=0 FOD.

So to at least make things consistent between CSD and MSMT-CSD, mechanism 1 and 3 need to be added to MSMT-CSD, and mechanism 2 should probably also be added to CSD. As to adding mechanism 1 to MSMT-CSD, I'd vote to go with the number of directions in the highest b-valued shell to calculate the data-supported lmax. It's simple and clear, and probably that shell is the one which will be mostly responsible for providing the necessary angular frequencies for the higher lmax'es, such as 8.

These changes are, I think, the least we should certainly require to cap the default lmax appropriately. Next, there is the question of whether to go further and use the condition number of the SH transform. In theory, this makes sense. However, after thinking now quite long about it, the scenario that bugs me the most, should a default based on the condition number be implemented, is that this has the potential to lead to FOD images in a population that have different lmax. It may sound less likely, but it's entirely possible that a certain set of directions may find itself close to the threshold based on the SH transform condition number. However, in our population analyses, we always start off of course by performing motion correction, which essentially nudges the gradient directions a bit back and forth. I can imagine a study with 45 (or maybe 46) direction datasets (which is already the theoretical limit for lmax=8) that due to motion correction in individuals leads to some having a default suggestion of lmax=6. That, in an automated pipeline, is a nightmare. The obvious solution is to explicitly set the -lmax, but then the benefit of the whole default mechanism is essentially discarded.

Still, there is value in what the condition number may suggest to a user (especially if they have very badly distributed directions). So I just came up with this idea: why don't we go as the default just with the above 3 criteria (which are easy to understand, communicate, and will never lead to unexpected behaviour for a range of more casual users out there), but we do print out a warning should the condition number suggest a stricter (lower) lmax. In the aforementioned doom scenario, by default everything would run with lmax=8 (which will probably be more than safe enough in that scenario), yet some subjects' processing will print out this warning. If needed, that can then potentially trigger the concern of the user, upon which they may wisely decide to go with the safest (lowest) lmax suggestion provided by the warning, so they end up running potentially again with -lmax 6 if they wish so. But at least by default, they don't end up with mixed lmax'es. This will make our (at least my) software-support and -teaching life a lot easier. (and I and my invisible friend OCD can sleep well at night :relieved: )

Apart from this, I tried to just look at the code, and I can spot bits of all the above logics, yet it's a bit spread around (and I don't want to risk spending a lot of time on essentially potentially breaking it a few times)... Who's currently the most familiar with these (and of course has the time that we all don't have :grin: )? I would've thought @jdtournier and @bjeurissen , yet these bits of code have been refactored quite a bit (pushed more towards the back-end) during the lifetime of tag_0.3.16...

thijsdhollander commented 7 years ago

(Forgot in the above, but another reason a condition-number based automated default bugs me slightly, is that the threshold chosen for it also remains a bit arbitrary... even though I do admit there may be sensible choices. But it'd be better to accompany my above suggested warning message with the actual condition number.)

jdtournier commented 7 years ago

Quick response, since I'm on leave currently 😎 ...

I also think a hard-coded default of l_max = 8 or maximum in response function (whichever is lower) for both CSD & MSMT-CSD is probably the simplest / safest. It will potentially trigger super-resolution when the number of directions falls below the required minimum, but as far as CSD / MSMT-CSD are concerned, it makes no difference whatsoever. And I reckon when processing datasets with a low number of directions, it still makes sense to estimate the FOD to l_max = 8, any lower and the tractography starts to look really messy...

So that would simplify the decision process somewhat. As to providing a warning about the condition number, I think the best place for that is actually at the response function estimation stage (which I think was already the case, at least on master?). Not sure how that is handled within the new amp2response...?

So I'd be happy to ditch method 1 (based on number of directions in DWI). I think that matters for the response function estimation, and for linear SD, but for CSD it doesn't really make sense. And it's fragile, unless we use the condition number approach, which itself is somewhat subjective in the thresholds used, as pointed out by @thijsdhollander.

thijsdhollander commented 7 years ago

I've added an lmax=8 cap (default extra cap when -lmax is not specified) for MSMT-CSD on tag_0.3.16, but didn't change anything further currently. That essentially fixes MSMT-CSD with the upcoming merge in mind: it's now the minimum (per tissue type) of the provided response and lmax=8; unless of course -lmax is specified, then that one overrides everything. Simple and clear, and avoids a tricky definition based on the data, which is, as we've discussed above, very hard to define, particularly for the multi-shell case and in all imaginable scenarios... and even if defined, depends on an arbitrary threshold somewhere somehow.

For the basic CSD, nothing changed currently. I'm not sure you'd want to get rid of the "data-driven" lmax cap there entirely...? In that scenario, I would argue it can actually still make sense as an additional safe default cap. I.e., if you've got only 30 directions or something, an automated lmax=6 still sounds reasonable (and can still on purpose be overridden via e.g. -lmax 8 if you explicitly intend super-resolution). On the other hand, I agree that FODs lower than lmax=8 probably serve little purpose for a lot of applications (iFOD2 in particular is indeed a good example)... I'll leave this up to you and/or other people... there's good arguments either way.

Note a warning about e.g. the condition number currently makes little sense any more at the dwi2response stage. The new amp2response has an lmax=10 flat default: that's only 6 ZHS coefficients to estimate, and typically from at least several tens of voxels (at once; so a very, very, very safely over-defined system with now even extra non-negativity and monotonicity constraints on top of that). Even amazingly crappy data supports that. :smile: (I've actually tried this: still looks like an awesome lmax=10 response from 12 direction data!)

So currently (current state on tag_0.3.16 that is) a default pipeline (no -lmax provided) looks like this:

CSD: dwi2response tournier gives an lmax=10 WM response, dwi2fod csd provided with that response gives an lmax=8 FOD, unless the data supports less (e.g. 30 directions results in lmax=6 FOD)
MSMT-CSD: dwi2response dhollander (or msmt_5tt; though wouldn't recommend it myself anymore) gives an lmax=10 multi-shell WM response and lmax=0 multi-shell GM and CSF responses, dwi2fod msmt_csd provided with that triplet of responses gives an lmax=8 WM FOD and lmax=0 GM and CSF "(FO)Ds". Fiddling with data-driven defaults here is not a good idea: this affects more than just the angular quality (smoothness) of the WM FOD. It also changes what the WM compartment can handle/model, and hence the balance between it and the GM and CSF compartments too (I've got good cases to convince myself of that). If you've got only e.g. 30 directions on the outer shell, you should probably question what kind of weird acquisition decisions you've just made for a multi-shell protocol. :grin:

jdtournier commented 7 years ago

For the basic CSD, nothing changed currently. I'm not sure you'd want to get rid of the "data-driven" lmax cap there entirely...?

I do want to get rid of it. I think it's too fragile - for example, there's no checks that the e.g. 36 DWI volumes aren't 3 repeats of the same set of 12 directions. And I do think an l_max=8 FOD is more useful generally. I'm also wary of the situation you alluded to previously, whereby users might have culled a variable number of corrupted volumes from their datasets, and might therefore end up with FODs reconstructed with different l_max values - it's not an unlikely scenario.

The new amp2response has an lmax=10 flat default: that's only 6 ZHS coefficients to estimate, and typically from at least several tens of voxels (at once; so a very, very, very safely over-defined system with now even extra non-negativity and monotonicity constraints on top of that). Even amazingly crappy data supports that.

Fully agree :+1:

thijsdhollander commented 7 years ago

I agree with all points. I actually hadn't fully appreciated the culling of corrupted volumes scenario. What I reasoned somewhere above was only a scenario like e.g. motion correction, where the condition number would then not be consistent across subjects in a group (study). But a cap simply driven by numbers of directions I still saw as a safe option (even though it doesn't cover everything, but it felt "safer than nothing"). However, removing corrupted volumes is indeed a realistic scenario, and there even a cap driven by numbers of directions/volumes would indeed end up being inconsistent. That said, I fully agree now with getting rid of it. :+1:

I won't have the time to look into it before ISMRM any more... if anyone feels like fixing it up, don't hesitate. Essentially:

MSMT-CSD is fine now. Don't touch it. :stuck_out_tongue:
CSD currently has an lmax=8 cap, as well as a directions-number driven cap. And -lmax to override anything of course. What needs changing to make it consistent with MSMT-CSD:
- Remove the directions-number driven cap
- Introduce a cap based on the lmax of the provided response function
- Make sure it interacts correctly with the lmax=8 cap (the lower of the latter two applies)
- Make sure -lmax overrides it all
- Be careful e.g. about the scenario where a provided -lmax setting is higher than the lmax of the response.

Inspiration for all of this can actually be found in the MSMT-CSD code. The mechanism should essentially become the same, with the difference that MSMT-CSD applies it per tissue type's response or -lmax setting.

jdtournier commented 7 years ago

OK, just had a go at changing the l_max handling - I think it does what it should now...

Only slight concern is this statement:

Be careful e.g. about the scenario where a provided -lmax setting is higher than the lmax of the response.

What is the problem here...? That simply invokes super-resolution, just as before. The response vector gets padded with zeros, there's nothing to worry about as far as I can tell...? In any case, the command runs fine in this regime, and produces the expected output...

thijsdhollander commented 7 years ago

The response vector gets padded with zeros, there's nothing to worry about as far as I can tell...?

That basically. 😄 So all is well. 👍

I've briefly looked at the commit, it looks all good to me!

MRtrix3 / mrtrix3

No lmax=8 cap in dwi2fod msmt_csd #899