zezisme opened this issue 23 hours ago (Open)
Interesting. Is that ghost coming from the first slice? This was coded 10 years ago, so I don't exactly remember the implementation, but it could be caused by periodic boundary conditions.
In any case, you can find the math and the relevant paper in my PhD thesis (on my GitHub profile), and the code is in Common/CUDA/tvdenoising.cu. It does not seem like it has periodic boundary conditions. Can you identify which slice this ghost is coming from? Can you also tell me the size of the image and the number of GPUs (including models)? Can you reproduce this on a small image, on a single GPU, using TIGRE's default data?
CASE 1: input image size = (1529, 1529, 400), and I have only one GPU. The input image (slice 400, the last slice) has no ghost. Using the default settings (iter=50; hyper=15; single GPU), the output denoised image has an obvious ghost: the 400th slice shows it clearly, and it is still visible in the 399th and 398th slices, becoming weaker. The ghost seems to come from the 267th slice.
CASE 2: input image size = (1529, 1529, 100), using the default settings (iter=50; hyper=15; single GPU). The output image (last slice) has no ghost (great!).
CASE 3: input image size = (1529, 1529, 200), using the default settings (iter=50; hyper=15; single GPU). The output image (last slice) has a ghost again, and the ghost seems to come from the 100th slice.
Final test: I found that if the GPU memory is not enough, a warning is printed and the algorithm also uses CPU memory, and the ghost appears! If the GPU memory is enough, the ghost does not occur.
Conclusion: this bug is most likely caused by a problem with the data transfer during the communication between the CPU and the GPU.
Fantastic experiment; indeed, this is why I asked about GPU size etc. It seems that something is broken in the memory in/out path, where either the memory doesn't get appropriately reset, or the chunk of memory being copied is wrong. I will investigate.
Great! Looking forward to your solution.
A possibility:
https://github.com/CERN/TIGRE/blob/master/Common/CUDA/tvdenoising.cu#L378-L389
may need to be changed to:
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    // New line: clear the whole per-GPU buffer before the partial copy
    cudaMemsetAsync(d_src[dev], 0, mem_img_each_GPU, stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_src[dev]+offset_device[dev], src+offset_host[dev], bytes_device[dev]*sizeof(float), cudaMemcpyHostToDevice, stream[dev*nStream_device+1]);
}
for (dev = 0; dev < deviceCount; dev++){
    cudaSetDevice(gpuids[dev]);
    // All these are async
    // New line: clear d_u as well
    cudaMemsetAsync(d_u[dev], 0, mem_img_each_GPU, stream[dev*nStream_device+1]);
    cudaMemcpyAsync(d_u[dev]+offset_device[dev], d_src[dev]+offset_device[dev], bytes_device[dev]*sizeof(float), cudaMemcpyDeviceToDevice, stream[dev*nStream_device+1]);
    cudaMemsetAsync(d_px[dev], 0, mem_img_each_GPU, stream[dev*nStream_device]);
    cudaMemsetAsync(d_py[dev], 0, mem_img_each_GPU, stream[dev*nStream_device]);
    cudaMemsetAsync(d_pz[dev], 0, mem_img_each_GPU, stream[dev*nStream_device]);
}
I cannot try it now, but if you want to try that, great! If not, give me a bit of time and I'll give it a shot myself.
I have tested your new code, but the result is the same as before; the same ghost is still there.
Hello, I found that when using im3DDenoise to denoise 3D images, as the number of iterations increases, the edge slices produce artifacts, and these artifacts seem to come from other slices. Is there any reference document for this algorithm? Can you provide the corresponding mathematical derivation? I want to know whether the issue comes from the mathematical principle or from its implementation.
Actual Behavior
TV_lambda = 200;
[Images: FDK; FDK+TVdenoise (50 iterations); FDK+TVdenoise (100 iterations); FDK+TVdenoise (200 iterations)]
The artifact seems to come from the 267th slice (or near it; the total number of slices is 400). It seems to be related to a ratio of 1/3 ((400-267)/400 ≈ 1/3).
Code to reproduce the problem (If applicable)
Specifications