ANTsX / ANTs

Advanced Normalization Tools (ANTs)
Apache License 2.0
1.19k stars 380 forks source link

DenoiseImage is unreasonably slow #454

Closed gdevenyi closed 6 years ago

gdevenyi commented 7 years ago

Running:

DenoiseImage -d 3 -i input.nii.gz -n Rician -o denoise.nii.gz --verbose

Seems to take an excessive amount of time.

In the meantime I can do:

> nii2mnc input.nii.gz input.mnc
> minc_anlm --rician input.mnc denoise.mnc
> mnc2nii denoise.mnc denoise.nii.gz

While DenoiseImage makes about 5% progress, sharing the same CPUs.

All the MINC commands are single threaded to top it off.

I think there may be a very significant bottleneck somewhere.

Square difference map between DenoiseImage and minc_anlm: image

ntustison commented 7 years ago

Perhaps you should do some investigation. There are a number of factors which could potentially be the cause, e.g.,

Please let us know what you find.

gdevenyi commented 7 years ago

Answers:

Since search radius is different, test was re-run, marginal increase in speed, but still massively slow in comparison.

MINC-based script timings/resource usage:

    Command being timed: "./test.sh 20170313_100217T1ws002a1001.nii.gz 20170313_100217T1ws002a1001.convert.denoise.nii"
    User time (seconds): 218.33
    System time (seconds): 2.24
    Percent of CPU this job got: 83%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 4:24.61
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 1655544
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 2
    Minor (reclaiming a frame) page faults: 461868
    Voluntary context switches: 2803
    Involuntary context switches: 34796
    Swaps: 0
    File system inputs: 648
    File system outputs: 326072
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

DenoiseImage

    Command being timed: "DenoiseImage -d 3 -r 2 -i 20170313_100217T1ws002a1001.nii.gz -n Rician -o denoise.nii.gz --verbose"
    User time (seconds): 23607.77
    System time (seconds): 12.36
    Percent of CPU this job got: 607%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:04:45
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 986676
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 247920
    Voluntary context switches: 4252
    Involuntary context switches: 3585308
    Swaps: 0
    File system inputs: 0
    File system outputs: 171128
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
ntustison commented 7 years ago

There were a couple issues but I don't know if that accounts for what you're seeing. If not, perhaps it's some ITK overhead or something particular to your system.

vfonov commented 7 years ago

minc_anlm uses patch preselection based on the patch mean and variation, see: https://github.com/BIC-MNI/EZminc/blob/master/mincnlm/minc_anlm.cpp#L604 That speeds things up.

vfonov commented 7 years ago

You can also check the speed on reimplementaton using ITK called itk_minc_nonlocal_filter - it doesn't use preselection by default, source code is here: https://github.com/vfonov/patch_morphology/tree/master/src (it's included in minc-toolkit-v2)

ntustison commented 7 years ago

Thanks @vfonov. I believe we have the same check. Perhaps the parameters differ causing variation in pass-through.

ntustison commented 7 years ago

Perfect. I hadn't seen that ITK reimplementation.

vfonov commented 7 years ago

it was an experiment.

gauvinalexandre commented 7 years ago

I concur with @gdevenyi findings. I have build ANTs yesterday, tag v2.2.0, which is before the issues solving @ntustison mentioned, in release mode with ITK and VTK fetched by CMAKE.

By curiosity, I launched the denoising script on my 2mm isotropic, 64 b=1000 directions Brain DWI (which is very small compared to what I want to denoise in the future, 1mm iso, 100 directions...). After 20 hours of computation time on my 8 last generation i7 cores, I reached a 50% progression with this command:

DenoiseImage -i dwi.nii.gz -d 4 -o dwi_ants_denoised.nii.gz -n Rician -x dwi_brain_mask.nii.gz -v 1
Running for 4-dimensional images.
*********10*********20*********30*********40*********50

I thought it would be nice to let you know. I can't use your script for now, unfortunately, but I'll stay in tune for further advancements! If those issues get resolve, I could try again building the master. Thanks!

ntustison commented 7 years ago

Thanks. I'll take a look. My guess is that it has something to do with ITK overhead.

vfonov commented 7 years ago

Hello,

I can't claim that I completely understand the code, but looking at https://github.com/stnava/ANTs/blob/master/Utilities/itkAdaptiveNonLocalMeansDenoisingImageFilter.hxx#L363 makes me think that you implemented Blockwise algorithm from Jose with maximum block overlap (making it much slower then my implemnetation), i.e http://www.irisa.fr/vista/Papers/2008_TMI_Coupe.pdf with block stride of 1.

Also, in case of 4D images, it looks like filter is treating 4th dimension in the same way as spatial dimensions, which might be reasonable for fMRI scans, but probably not for DTI scans, where volumes are acquired with different gradient directions in arbitrary order. The time of execution here is ~ S P V B. where S is volume of search area, P - volume of patch and V - volume of the input file, B - is the block volume in the 4D case it would take ~ s^4 p^4 ( x y z t ) * p^4 time. where s- search radius and p - patch radius.

gauvinalexandre commented 7 years ago

On another note, i set the itk nb of processing threads environment variable to 4 (half my nb of virtual cores) as suggested on the installation instructions.

Also, there would be no problem sharing my dataset for tests on Dropbox if you need!

gauvinalexandre commented 7 years ago

Interesting @vfonov! Perhaps DWI (not DTI) scans should be treated as 3D data, using the given dimensionality parameter.

hjmjohnson commented 7 years ago

For DWI you may also want to look at a slicer module that I have as a stand alone package:

https://github.com/BRAINSia/JALMMSE

Joint anisotropic LMMSE for DWI denoising   <![CDATA[This module is a reimplementation of the older 'DWI Joint Rician LMMSE Filter'. There are two main differences with this former approach:\n   1) Instead of computing sample moments inside isotropic neighborhoods, we use a non-local means-like scheme to average only those voxels silmilar enough to the central voxel. This similarity is based on three RGB channels corresponding to the projections of the DWI data set in three mutually orthogonal spatial directions.\n   2) The standard deviation of noise in the complex domain, sigma, is estimated here as the mode of the histogram of the (corrected) local variances in the signal area, which is a more robust estimation procedure.\n This module reduces Rician noise on a set of diffusion weighted images. For this, it filters the image in the mean squared error sense using a Rician noise model. The N closest gradient directions to the direction being processed are filtered together to improve the results: the noise-free signal is seen as an n-dimensional vector which has to be estimated with the LMMSE method from a set of corrupted measurements. To that end, the covariance matrix of the noise-free vector and the cross covariance between this signal and the noise have to be estimated, which is done taking into account the image formation process. All the estimations are performed as sample estimates in a 'shaped neighborhood' defined by the weights extracted from the structural similarity of the voxels.\nA complete description of the isotropic algorithm may be found in:\nAntonio Tristan-Vega and Santiago Aja-Fernandez, 'DWI filtering using joint information for DTI and HARDI', Medical Image Analysis, Volume 14, Issue 2, Pages 205-218. 2010.\nThe anisotropic method is further described in:\nAntonio Tristan-Vega, Veronique Brion, Gonzalo Vegas-Sanchez-Ferrero, and Santiago Aja-Fernandez, 'Merging squared-magnitude approaches to DWI denoising: An adaptive Wiener filter tuned to the anatomical contents of the image', In Proceedings of IEEE EMBC 2013.]]>     http://wiki.slicer.org/slicerWiki/index.php/Documentation/4.1/Modules/JointRicianLMMSEImageFilter     Antonio Tristan Vega (University of Valladolid, Spain)   <![CDATA[Work partially funded by grant numbers TEC2010-17982 from the Ministerio de Ciencia y Educacion (Spain) and VA376A11-2, SAN103/VA40/1 from the Junta de Castilla y Leon (Spain).]]>           <![CDATA[Parameters for Joint Rician LMMSE image filter]]>    

Hans

--

From: "notifications@github.com" notifications@github.com Reply-To: stnava/ANTs reply@reply.github.com Date: Thursday, August 3, 2017 at 10:30 AM To: "nipype@noreply.github.com" ANTs@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [stnava/ANTs] DenoiseImage is unreasonably slow (#454)

Interesting @vfonov! Perhaps DWI (not DTI) scans should be treated as 3D data, using the given dimensionality parameter.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

vfonov commented 7 years ago

Yes, I meant DWI.

Pierrick also made extension for DWI: https://sites.google.com/site/pierrickcoupe/softwares/denoising-for-medical-imaging/dwi-denoising

ntustison commented 7 years ago

I can't claim that I completely understand the code, but looking at https://github.com/stnava/ANTs/blob/master/Utilities/itkAdaptiveNonLocalMeansDenoisingImageFilter.hxx#L363 makes me think that you implemented Blockwise algorithm from Jose with maximum block overlap (making it much slower then my implemnetation), i.e http://www.irisa.fr/vista/Papers/2008_TMI_Coupe.pdf with block stride of 1.

@vfonov Ah, thanks. That sounds familiar and why I asked @gdevenyi about subsampling. I think I saw that non-unit stride in Jose's code and implemented the shrink factor option as a hack. Can you show me in the minc code where that's done so I can investigate further? Thanks.

vfonov commented 7 years ago

So, I didn't impelement block mode in my ITK code. In the non-itk version it is done in https://github.com/BIC-MNI/EZminc/blob/master/mincnlm/minc_anlm.cpp#L566 - note that indexes are incremented by 2 - that implements hard-coded block stride of 2 (i.e speedup of 8 for denoising).

ntustison commented 7 years ago

Yep, that's what I was looking for. Thanks.

ntustison commented 7 years ago

@gdevenyi --- What factor speed-up do you see between the two implementations?

gdevenyi commented 7 years ago

@ntustison my bad, I didn't catch that there was a hard-coded subsampling in the code

gdevenyi commented 7 years ago

MINC Based, including convert to-from: Elapsed (wall clock) time (h:mm:ss or m:ss): 0:4:24.61 DenosieImage: Elapsed (wall clock) time (h:mm:ss or m:ss): 1:04:45

vfonov commented 7 years ago

$ time minc_anlm mc_fonv7706.2007-06-14.Z-50_nt1g.mnc.gz test_anlm.mnc Calculating means... Image maximum:709 Calculating variances... Launching: 1 thread(s)... Done...

real 0m36.928s user 0m36.768s sys 0m0.080s

4 Threads: $ time itk_minc_nonlocal_filter --preselect --anlm --flat mc_fonv7706.2007-06-14.Z-50_nt1g.mnc.gz test_itk_anlm.mnc --clob 0% ...............................................................................................................................................................................................................................10% ...................20% ...................30% ...................40% ...................50% .....................60% ...................70% ...................80% ...................90% ....................100% real 0m51.185s user 3m0.368s sys 0m0.264s

$ time DenoiseImage -n gaussian -d 3 -i mc_fonv7706.2007-06-14.Z-50_nt1g.mnc.gz -o test_denoise.mnc

real 4m27.495s user 14m46.420s sys 0m0.652s

ntustison commented 7 years ago

Thanks all. I've made some changes and will commit when I get back from my run.

ntustison commented 7 years ago

https://github.com/stnava/ANTs/commit/3b46c124b4c75626bf8c5535eb596b2521998ccf

abiancardi commented 7 years ago

Hi, I just compiled the updated version of DenoiseImage and I am experiencing a segmentation fault, but only with some images. For instance, the program runs just fine the examples mentioned in the Insight Journal paper, but it fails on the CT scan from the 3DSlicer sample data collection (CT-chest)

Here is an excerpt from a run with gdb where the problem is said to occur at line 297:

Running for 3-dimensional images.

[New Thread 0x2aaab4b21700 (LWP 25480)]
[Thread 0x2aaab4b21700 (LWP 25480) exited]
[New Thread 0x2aaab4b21700 (LWP 25481)]
[Thread 0x2aaab4b21700 (LWP 25481) exited]
[New Thread 0x2aaab4b21700 (LWP 25482)]
[Thread 0x2aaab4b21700 (LWP 25482) exited]
[New Thread 0x2aaab4b21700 (LWP 25483)]
[Thread 0x2aaab4b21700 (LWP 25483) exited]
[New Thread 0x2aaab4b21700 (LWP 25484)]
*********10*********20*******
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aaab4b21700 (LWP 25484)]
itk::AdaptiveNonLocalMeansDenoisingImageFilter<itk::Image<float, 3u>, itk::Image<float, 3u>, itk::Image<unsigned char, 3u> >::ThreadedGenerateData (this=<optimized out>, region=..., threadId=<optimized out>)
    at .../Utilities/itkAdaptiveNonLocalMeansDenoisingImageFilter.hxx:297
297                                      this->m_MeanImage->GetPixel( centerNeighborhoodPatchIndex );

Thanks

ntustison commented 7 years ago

What's the exact command line call? Also, what is $ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS?

abiancardi commented 7 years ago

Command line

DenoiseImage -v 1 -d 3 -o test.mha -i CT-chest.nrrd

Tried with ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS equal to 2, 4, 6 and seg-faulted. Tried again with ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=1 and completed successfully!

ntustison commented 7 years ago

Okay, check it now. https://github.com/stnava/ANTs/commit/15163229396982118cb0d5a6ade08bfc46d03996

abiancardi commented 7 years ago

Fixed! Thanks

gdevenyi commented 6 years ago

Resolved.