CERN / TIGRE

TIGRE: Tomographic Iterative GPU-based Reconstruction Toolbox
BSD 3-Clause "New" or "Revised" License

Artifacts in 3D-TV Denoised Images #594

Open zezisme opened 23 hours ago

zezisme commented 23 hours ago

Hello, I found that when using im3DDenoise to denoise 3D images, the edge slices produce artifacts as the number of iterations increases, and these artifacts seem to come from other slices. Is there any reference document for this algorithm? Can you provide the corresponding mathematical derivation? I want to know whether the issue comes from the mathematical principle or from its implementation.

Actual Behavior

TV_lambda = 200;

(Images: FDK; FDK+TVdenoise, 50 iterations; FDK+TVdenoise, 100 iterations; FDK+TVdenoise, 200 iterations.)

The artifact seems to come from the 267th slice (or near it; the total number of slices is 400). It seems to be related to a ratio of 1/3 ((400-267)/400 ≈ 1/3).

Code to reproduce the problem (If applicable)

x=FDK(rawdata_proj,geo,angles,'filter',filter_type);
% TV denoising
TV_niter = 200; % 50,100,200,..
TV_lambda = 200; % 10,20,.....200
x=im3DDenoise(x,'TV',TV_niter,TV_lambda,'gpuids',GpuIds());

Specifications

AnderBiguri commented 19 hours ago

Interesting. Is the ghost you get the first slice? This was coded 10 years ago, so I don't remember the implementation exactly, but it could be caused by periodic boundary conditions.
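
For illustration only (a minimal sketch, not TIGRE's actual kernel; gradZ, nx, ny, nz are made-up names): with periodic boundary handling, the forward z-difference at the last slice wraps around and reads the first slice, so edge slices can pick up content from the opposite end of the volume, whereas a clamped boundary cannot.

__device__ float voxel(const float* u, long i, long j, long k, long nx, long ny) {
    // Linear index into an nx*ny*nz volume stored slice-major.
    return u[(k * ny + j) * nx + i];
}

__global__ void gradZ(const float* u, float* dz,
                      long nx, long ny, long nz, bool periodic) {
    long i = blockIdx.x * blockDim.x + threadIdx.x;
    long j = blockIdx.y * blockDim.y + threadIdx.y;
    long k = blockIdx.z * blockDim.z + threadIdx.z;
    if (i >= nx || j >= ny || k >= nz) return;

    // Periodic: at k = nz-1 the neighbour is slice 0, so the first slice
    // bleeds into the gradient (and hence the update) of the last slice.
    // Clamped: the neighbour is the slice itself, so nothing leaks across.
    long kp = periodic ? (k + 1) % nz : (k + 1 < nz ? k + 1 : k);
    dz[(k * ny + j) * nx + i] = voxel(u, i, j, kp, nx, ny) - voxel(u, i, j, k, nx, ny);
}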

AnderBiguri commented 19 hours ago

In any case, you can find the math and the relevant paper in my PhD thesis (on my GitHub profile), and the code is in Common/CUDA/tvdenoising.cu. It does not seem like it has periodic boundary conditions. Can you identify which slice this ghost is coming from? Can you also tell me the size of the image and the number of GPUs (including models)? Can you reproduce this in a small image, on a single GPU, using TIGRE's default data?
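
For reference, a hedged sketch of the standard total-variation (ROF-type) objective that denoisers of this kind minimize (the exact formulation and solver used by tvdenoising.cu are the ones given in the thesis referenced above, and may differ in convention):

\[
\min_{u} \; \|\nabla u\|_{1} + \frac{\lambda}{2}\,\|u - f\|_{2}^{2}
\]

where f is the noisy input volume (here the FDK reconstruction), u is the denoised volume, and λ (the role played by TV_lambda / hyper, up to TIGRE's exact convention) trades smoothing against fidelity to f; more iterations simply drive u closer to the minimizer of this objective.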

zezisme commented 17 hours ago

> In any case, you can find the math and the relevant paper in my PhD thesis (on my GitHub profile), and the code is in Common/CUDA/tvdenoising.cu. It does not seem like it has periodic boundary conditions. Can you identify which slice this ghost is coming from? Can you also tell me the size of the image and the number of GPUs (including models)? Can you reproduce this in a small image, on a single GPU, using TIGRE's default data?

CASE 1: input image size = (1529, 1529, 400), and I have only one GPU. The input image (slice 400, the last slice) has no ghost. Using the default parameters (iter = 50; hyper = 15; single GPU), the denoised output shows an obvious ghost in the 400th slice; it is also visible in the 399th and 398th slices, becoming weaker, and it seems to come from the 267th slice.

CASE 2: input image size = (1529, 1529, 100), using the default parameters (iter = 50; hyper = 15; single GPU). The output image (last slice) has no ghost (great!).

CASE 3: input image size = (1529, 1529, 200), using the default parameters (iter = 50; hyper = 15; single GPU). The output image (last slice) has the ghost again, and the ghost seems to come from the 100th slice.

Final test: I found that if the GPU memory is not enough, a warning is issued and the algorithm also uses CPU memory, and then the ghost appears. If the GPU memory is enough, the ghost does not occur.

Conclusion: this bug is most likely caused by a problem with how the data is handled during communication between the CPU and the GPU.
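
As a hedged illustration of this suspected mechanism (the names d_buf, chunk_bytes, host_offset below are made up, not TIGRE's): when the volume no longer fits in GPU memory, it is streamed through a fixed device buffer in chunks, and any part of that buffer not overwritten by the current chunk's copy still holds voxels from the previous chunk unless it is cleared first, which would make stale slices reappear as a ghost.

#include <cuda_runtime.h>

// Sketch of a chunked host<->device pipeline; not TIGRE's actual code.
// host_offset[c] is the offset (in floats) of chunk c in the host volume,
// chunk_bytes[c] its size in bytes, buf_bytes the size of the device buffer.
void process_in_chunks(const float* host_img, float* host_out,
                       size_t buf_bytes, size_t n_chunks,
                       const size_t* chunk_bytes, const size_t* host_offset,
                       cudaStream_t stream) {
    float* d_buf = nullptr;
    cudaMalloc(&d_buf, buf_bytes);

    for (size_t c = 0; c < n_chunks; ++c) {
        // If chunk_bytes[c] < buf_bytes (e.g. the last, smaller chunk), the
        // tail of d_buf still holds slices from chunk c-1. Without this
        // memset, those stale slices are processed together with the current
        // chunk and can show up as a "ghost" of another part of the volume.
        cudaMemsetAsync(d_buf, 0, buf_bytes, stream);
        cudaMemcpyAsync(d_buf, host_img + host_offset[c], chunk_bytes[c],
                        cudaMemcpyHostToDevice, stream);

        // ... run the denoising kernels on d_buf here ...

        cudaMemcpyAsync(host_out + host_offset[c], d_buf, chunk_bytes[c],
                        cudaMemcpyDeviceToHost, stream);
    }
    cudaStreamSynchronize(stream);
    cudaFree(d_buf);
}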

AnderBiguri commented 17 hours ago

Fantastic experiment; indeed, this is why I asked about the GPU size etc. It seems that something is broken in the memory in/out, where either the memory doesn't get appropriately reset or the chunk of memory being copied is wrong. I will investigate.

zezisme commented 17 hours ago

> Fantastic experiment; indeed, this is why I asked about the GPU size etc. It seems that something is broken in the memory in/out, where either the memory doesn't get appropriately reset or the chunk of memory being copied is wrong. I will investigate.

Great! Looking forward to your solution

AnderBiguri commented 17 hours ago

A possibility:

https://github.com/CERN/TIGRE/blob/master/Common/CUDA/tvdenoising.cu#L378-L389

may need to be changed to:

for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    // New line: clear the whole per-GPU buffer so anything beyond the
    // bytes_device copied below holds zeros rather than stale data.
    cudaMemsetAsync(d_src[dev], 0, mem_img_each_GPU, stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_src[dev]+offset_device[dev], src+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+1]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    // All these are async
    // New line: same reset for the working image buffer before the device-to-device copy.
    cudaMemsetAsync(d_u[dev], 0, mem_img_each_GPU, stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_u[dev]+offset_device[dev], d_src[dev]+offset_device[dev], bytes_device[dev]*sizeof(float), cudaMemcpyDeviceToDevice, stream[dev*nStream_device+1]);
    cudaMemsetAsync(d_px[dev], 0, mem_img_each_GPU, stream[dev*nStream_device]);
    cudaMemsetAsync(d_py[dev], 0, mem_img_each_GPU, stream[dev*nStream_device]);
    cudaMemsetAsync(d_pz[dev], 0, mem_img_each_GPU, stream[dev*nStream_device]);
}

I cannot try it now, but if you want to try that, great! If not, give me a bit of time and I'll give it a shot myself.

zezisme commented 1 hour ago

> A possibility: https://github.com/CERN/TIGRE/blob/master/Common/CUDA/tvdenoising.cu#L378-L389 may need to be changed to add the cudaMemsetAsync calls shown above.

I have tested your new code, but the results are still the same as before; the same ghost is still there.