Open jdtournier opened 1 month ago
clang-tidy review says "All clean, LGTM! :+1:"
I've tested this on the Workshop data (single subject Siemens). On these data too, the denoised data and noise level maps are nearly identical. My timings on these data are similar to yours:
The JacobiSVD in double precision should have the best numerical stability of all tested methods. I therefore calculated the error of the current implementation and the BDC-SVD implementation w.r.t. this baseline. For the noise level, the mean absolute and relative errors are:
This means that the result of BDC-SVD is one order of magnitude more accurate than the current EVD-based implementation in single precision. The EVD in double precision leads to an absolute error that is exactly 0, but this isn't a fair comparison because the input data are in single precision. These results confirm the theory I seem to remember from my numerical linear algebra class ages ago :wink:
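The kind of comparison described above can be sketched in a few lines. This is a NumPy stand-in, not the actual C++/Eigen code: the `noise_level` helper and the synthetic low-rank-plus-noise data are invented for illustration, with a double-precision run serving as the baseline for a single-precision run on the same data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical MP-PCA-style noise level estimate from the trailing
# singular values of a data matrix (illustration only -- the actual
# implementation lives in MRtrix3's dwidenoise, in C++/Eigen).
def noise_level(X, rank):
    s = np.linalg.svd(X, compute_uv=False)
    m, n = X.shape
    # crude estimate: RMS of the singular values below the signal rank
    return np.sqrt(np.sum(s[rank:] ** 2) / (m * n))

# synthetic low-rank data plus Gaussian noise, stored in single
# precision as scanner data would be
m, n, rank, sigma = 60, 50, 5, 0.05
X = (rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
     + sigma * rng.standard_normal((m, n))).astype(np.float32)

baseline = noise_level(X.astype(np.float64), rank)  # double-precision reference
est32 = noise_level(X, rank)                        # single-precision run

abs_err = abs(est32 - baseline)
rel_err = abs_err / baseline
print(f"sigma estimate: {est32:.6f}, abs err: {abs_err:.2e}, rel err: {rel_err:.2e}")
```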
Based on this, I'd be in favour of switching to BDC-SVD. I think the longer run time is a fair price to pay for accuracy in single precision, and it's not that much longer than using the current implementation in double precision. I'd then keep float32 as the default, except perhaps when the input data already have datatype float64.
Given the relatively small differences in code, it would not be too difficult to template by decomposition method and provide a command-line option.
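For illustration, the dispatch could look something like this. In the real code this would be a C++ template parameter selecting between Eigen's decompositions; the sketch below is a hypothetical NumPy analogue with invented names, and NumPy exposes only a single SVD driver where Eigen distinguishes `BDCSVD` and `JacobiSVD`:

```python
import numpy as np

# Hypothetical sketch of dispatching on the decomposition method
# (the real change would be a C++ template parameter in MRtrix3;
# all names here are invented for illustration).
def singular_values(X, method="bdcsvd"):
    if method == "evd":
        # current approach: eigendecompose the Gram matrix X'X
        evals = np.linalg.eigvalsh(X.T @ X)[::-1]
        return np.sqrt(np.clip(evals, 0, None))
    if method in ("bdcsvd", "jacobi"):
        # NumPy has a single SVD driver; in Eigen these would map to
        # BDCSVD and JacobiSVD respectively
        return np.linalg.svd(X, compute_uv=False)
    raise ValueError(f"unknown method: {method}")

rng = np.random.default_rng(3)
X = rng.standard_normal((30, 20))
for method in ("evd", "bdcsvd"):
    print(method, singular_values(X, method)[:3])
```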
I've made a change to the code to avoid calculating the full (thin) matrix V. Only the central column of S.V' is needed to recombine the data, so we can take advantage of that to save some computation time. I've tested this for cases M
This runs ~20% faster:
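The reason V can be skipped entirely can be sketched as follows (NumPy stand-in for the C++/Eigen code): since X = U S V', column c of X satisfies S v_c = U' x_c, so the rank-r reconstruction of the central column is just U_r (U_r' x_c), with no need to form V or S.V' in full.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustration of why only the central column of S.V' is needed to
# recombine the data: it can be recovered from U and the central
# column of X alone.
m, n, r = 40, 30, 6
X = rng.standard_normal((m, n))
c = n // 2  # index of the central column

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# full route: form S.V' explicitly and take its central column
full_col = U[:, :r] @ (s[:r, None] * Vt[:r, :])[:, c]

# shortcut: project the central column onto the leading left
# singular vectors, never touching V
short_col = U[:, :r] @ (U[:, :r].T @ X[:, c])

print(np.max(np.abs(full_col - short_col)))
```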
This changes the mode of operation from our current fast but less numerically stable eigenvalue-decomposition approach to the bidiagonal divide & conquer SVD implementation now provided with Eigen 3.4. It seems to work in my limited testing, producing results that are indistinguishable (though not identical) from the current implementation on my test data (BATMAN tutorial data).
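The numerical difference between the two routes can be illustrated in NumPy (a stand-in for the C++/Eigen code): the eigenvalue-decomposition approach works on the Gram matrix X'X, whose eigenvalues are the squared singular values of X, and squaring the condition number is what costs accuracy in single precision, whereas the SVD operates on X directly.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 40)).astype(np.float32)

# EVD route: eigenvalues of X'X, sorted descending, then square roots
evals = np.linalg.eigvalsh(X.T @ X)[::-1]
s_from_evd = np.sqrt(np.clip(evals, 0, None))

# SVD route: singular values of X directly
s_from_svd = np.linalg.svd(X, compute_uv=False)

# in single precision the two agree only approximately
print(np.max(np.abs(s_from_evd - s_from_svd)))
```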
However, this comes with a considerable additional computation cost. On my system (Arch Linux, AMD Ryzen 9 3950X 16-Core Processor, 64GB RAM), these are the runtimes:
Thoughts welcome.