Open jdtournier opened 1 month ago
clang-tidy review says "All clean, LGTM! :+1:"
I've tested this on the Workshop data (single subject Siemens). On these data too, the denoised data and noise level maps are nearly identical. My timings on these data are similar to yours:
The JacobiSVD in double precision should have the best numerical stability of all tested methods. I therefore calculated the error of the current implementation and the BDC-SVD implementation w.r.t. this baseline. For the noise level, the mean absolute and relative errors are:
This means that the result of BDC-SVD is one order of magnitude more accurate than the current EVD-based implementation in single precision. The EVD in double precision leads to an absolute error that is exactly 0, but this isn't a fair comparison because the input data are in single precision. These results confirm the theory I seem to remember from my numerical linear algebra class ages ago :wink:
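The kind of comparison described above can be sketched in a few lines. This is a NumPy stand-in, not the actual C++/Eigen code: the `noise_level` helper and the synthetic low-rank-plus-noise data are invented for illustration, with a double-precision run serving as the baseline for a single-precision run on the same data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical MP-PCA-style noise level estimate from the trailing
# singular values of a data matrix (illustration only -- the actual
# implementation lives in MRtrix3's dwidenoise, in C++/Eigen).
def noise_level(X, rank):
    s = np.linalg.svd(X, compute_uv=False)
    m, n = X.shape
    # crude estimate: RMS of the singular values below the signal rank
    return np.sqrt(np.sum(s[rank:] ** 2) / (m * n))

# synthetic low-rank data plus Gaussian noise, stored in single
# precision as scanner data would be
m, n, rank, sigma = 60, 50, 5, 0.05
X = (rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
     + sigma * rng.standard_normal((m, n))).astype(np.float32)

baseline = noise_level(X.astype(np.float64), rank)  # double-precision reference
est32 = noise_level(X, rank)                        # single-precision run

abs_err = abs(est32 - baseline)
rel_err = abs_err / baseline
print(f"sigma estimate: {est32:.6f}, abs err: {abs_err:.2e}, rel err: {rel_err:.2e}")
```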
Based on this, I'd be in favour of switching to BDC-SVD. I think the longer run time is a fair price to pay for accuracy in single precision, and it's not that much longer than using the current implementation in double precision. I'd then keep float32 as the default, except perhaps when the input data already have datatype float64.
Given the relatively small differences in code, it would not be too difficult to template by decomposition method and provide a command-line option.
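For illustration, the dispatch could look something like this. In the real code this would be a C++ template parameter selecting between Eigen's decompositions; the sketch below is a hypothetical NumPy analogue with invented names, and NumPy exposes only a single SVD driver where Eigen distinguishes `BDCSVD` and `JacobiSVD`:

```python
import numpy as np

# Hypothetical sketch of dispatching on the decomposition method
# (the real change would be a C++ template parameter in MRtrix3;
# all names here are invented for illustration).
def singular_values(X, method="bdcsvd"):
    if method == "evd":
        # current approach: eigendecompose the Gram matrix X'X
        evals = np.linalg.eigvalsh(X.T @ X)[::-1]
        return np.sqrt(np.clip(evals, 0, None))
    if method in ("bdcsvd", "jacobi"):
        # NumPy has a single SVD driver; in Eigen these would map to
        # BDCSVD and JacobiSVD respectively
        return np.linalg.svd(X, compute_uv=False)
    raise ValueError(f"unknown method: {method}")

rng = np.random.default_rng(3)
X = rng.standard_normal((30, 20))
for method in ("evd", "bdcsvd"):
    print(method, singular_values(X, method)[:3])
```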
I've made a change to the code to avoid calculating the full (thin) matrix V. Only the central column of S.V' is needed to recombine the data, so we can take advantage of that to save some computation time. I've tested this for cases M
This runs ~20% faster:
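The reason V can be skipped entirely can be sketched as follows (NumPy stand-in for the C++/Eigen code): since X = U S V', column c of X satisfies S v_c = U' x_c, so the rank-r reconstruction of the central column is just U_r (U_r' x_c), with no need to form V or S.V' in full.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustration of why only the central column of S.V' is needed to
# recombine the data: it can be recovered from U and the central
# column of X alone.
m, n, r = 40, 30, 6
X = rng.standard_normal((m, n))
c = n // 2  # index of the central column

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# full route: form S.V' explicitly and take its central column
full_col = U[:, :r] @ (s[:r, None] * Vt[:r, :])[:, c]

# shortcut: project the central column onto the leading left
# singular vectors, never touching V
short_col = U[:, :r] @ (U[:, :r].T @ X[:, c])

print(np.max(np.abs(full_col - short_col)))
```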
This changes the mode of operation from our current fast but less numerically stable eigenvalue-decomposition approach to the bidiagonal divide & conquer SVD implementation now provided with Eigen 3.4. It seems to work in my limited testing, producing results that are indistinguishable (though not identical) from the current implementation on my test data (BATMAN tutorial data).
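The numerical difference between the two routes can be illustrated in NumPy (a stand-in for the C++/Eigen code): the eigenvalue-decomposition approach works on the Gram matrix X'X, whose eigenvalues are the squared singular values of X, and squaring the condition number is what costs accuracy in single precision, whereas the SVD operates on X directly.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 40)).astype(np.float32)

# EVD route: eigenvalues of X'X, sorted descending, then square roots
evals = np.linalg.eigvalsh(X.T @ X)[::-1]
s_from_evd = np.sqrt(np.clip(evals, 0, None))

# SVD route: singular values of X directly
s_from_svd = np.linalg.svd(X, compute_uv=False)

# in single precision the two agree only approximately
print(np.max(np.abs(s_from_evd - s_from_svd)))
```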
However, this comes with a considerable additional computation cost. On my system (Arch Linux, AMD Ryzen 9 3950X 16-Core Processor, 64GB RAM), these are the runtimes:
Thoughts welcome.