Closed: marcnol closed this issue 5 months ago.
Hello!
The error messages are a bit cryptic -- clearly a point for improvement.
However, it looks like there was not enough memory on the GPU. Try with a smaller image to know for sure. Typically dw with --gpu requires as much GPU memory as it needs RAM when --gpu is not specified. You can check the log file for that number; it is reported at the end.
The image can also be processed in tiles; try for example --tilesize 1024, and hopefully that will work. It is also possible to compromise on the boundary handling with --bq 1 or even --bq 0. For some use cases that is ok, but it depends on the later image analysis steps in the pipeline.
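To make the suggestion concrete, a command along these lines is what I have in mind (the file names are placeholders and I am assuming the usual image-then-PSF argument order; adjust --tilesize and --bq to your data):

dw --gpu --tilesize 1024 --bq 1 image.tif psf.tif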
The --gpu option has only been tested with a few Nvidia and AMD cards so far, so there might still be bugs to discover. There is a newer version of vkFFT and I plan to upgrade to it soon; that could potentially resolve some issues.
Thanks for the prompt answer!
I am surprised that it is a memory issue, as the image is 590 MB and the GPU has 11 GB (...we never had trouble deconvolving these images with Huygens, but the algorithm may be different!)
I tried --tilesize with 1024, 512, and now even 128... but in all cases it gets stuck (see the traceback below).
How much memory did the GPU cards you tested have? Ours is an NVIDIA GeForce RTX 2080 Ti.
Thanks again, Erik!
Log for --tilesize 128:
-> Processing tile 29 / 256
Deconvolving using shbcl2 (using inplace)
image: [168x168x65], psf: [181x181x129], job: [348x348x193]
Iteration 50/ 50, Idiv=7.338e-01
-> Processing tile 30 / 256
Deconvolving using shbcl2 (using inplace)
image: [168x168x65], psf: [181x181x129], job: [348x348x193]
Iteration 50/ 50, Idiv=9.189e-01
-> Processing tile 31 / 256
Deconvolving using shbcl2 (using inplace)
image: [168x168x65], psf: [181x181x129], job: [348x348x193]
Iteration 50/ 50, Idiv=9.570e-01
-> Processing tile 32 / 256
Deconvolving using shbcl2 (using inplace)
image: [168x148x65], psf: [181x181x129], job: [348x328x193]
Iteration 50/ 50, Idiv=7.720e-01
-> Processing tile 33 / 256
Deconvolving using shbcl2 (using inplace)
image: [168x148x65], psf: [181x181x129], job: [348x328x193]
.On no... bad new from OpenCL:
errinfo: CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_COPY_BUFFER on NVIDIA GeForce RTX 2080 Ti (Device 0).
Sorry! There was an unrecoverable error!
File: /home/marcnol/Repositories/deconwolf/src/cl_util.c
Function: fimcl_copy at line 301
OpenCl error=CL_MEM_OBJECT_ALLOCATION_FAILURE
CL_MEM_OBJECT_ALLOCATION_FAILURE indicates that there was not
enough memory on the GPU to continue. Try with a smaller image
and look up the option --tilesize
If you are sure that OpenCL works on this machine
and that it is a problem only related to deconwolf,
check open issues or create a new one at
https://github.com/elgw/deconwolf/issues
On no... bad new from OpenCL:
errinfo: CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_COPY_BUFFER on NVIDIA GeForce RTX 2080 Ti (Device 0).
I use a 12 GB card at home (AMD 6700) and have never seen this issue... in the office we do all deconvolution on a 24 GB Nvidia 3080...
Tiles 32 and 33 have the same size and should require the same amount of memory, so this looks strange to me and suggests that I have a bug to hunt down (memory not being released properly).
My guess is that --tilesize 512 should do just fine with your GPU, so there is no reason to go below that (I bet that the first tile is processed without problems).
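For reference, and this is just my reading of the numbers in your log rather than something dw states explicitly, the "job" array appears to be the tile plus the PSF minus one voxel along each axis: 168 + 181 - 1 = 348 and 65 + 129 - 1 = 193, i.e. a [348x348x193] job for a [168x168x65] tile. So even a modest tile expands into a fairly large workspace, which is why the memory needed per tile is larger than the tile itself.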
I will investigate this next week and get back to you. Sorry for the poor first experience!
Cheers, Erik
Hey,
I think I found the problem: at --tilesize 512 it is able to process 10-11 of the 16 tiles it needs to do.
When I monitor GPU usage I see that the deconvolution of each sub-image uses only a small amount of memory, but from cycle to cycle the memory is not cleared out... so it accumulates over time and ends up occupying all the available memory...
See three snapshots of GPU usage at different time points below. Also the log of the execution is attached.
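In case it helps with reproducing, one way to watch this kind of accumulation from the command line (a sketch, not necessarily how I captured the snapshots) is to poll the card's used memory once per second:

nvidia-smi --query-gpu=memory.used --format=csv -l 1

or simply keep nvtop open while dw runs.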
cheers
marcelo
Thanks, that is useful to know.
Hello again,
Here is an update from my side:
It works with my AMD card but memory is still not released properly with an Nvidia card. Still investigating.
What was wrong? The memory release was blocked on Nvidia due to a missing clReleaseEvent. With that in place it looks good on both AMD and Nvidia.
Preventing it in the future:
Thank you so much for spotting the problem and caring to report!
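To make the fix concrete, here is a minimal sketch of the pattern in C (this is not the actual fimcl_copy from src/cl_util.c, just an illustration with a hypothetical helper): every enqueued copy that hands back a cl_event keeps resources pinned until that event is released, so a per-tile loop leaks GPU memory without clReleaseEvent.

#include <CL/cl.h>

/* Hypothetical helper, for illustration only: copy nbytes from src to dst
 * on the device and wait for completion. */
static cl_int copy_buffer_blocking(cl_command_queue queue,
                                   cl_mem src, cl_mem dst, size_t nbytes)
{
    cl_event ev = NULL;
    cl_int err = clEnqueueCopyBuffer(queue, src, dst,
                                     0, 0, nbytes,  /* offsets and size */
                                     0, NULL, &ev); /* no wait list, request an event */
    if (err != CL_SUCCESS) {
        return err;
    }
    err = clWaitForEvents(1, &ev);
    /* The crucial call: without clReleaseEvent the events (and whatever they
     * keep alive) pile up tile after tile until the GPU runs out of memory. */
    clReleaseEvent(ev);
    return err;
}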
Screenshots of nvtop while deconvolving 2048 x 2048 x P images using tiles of size 1024:
I just tried version 0.4.1 on two NVIDIA GPUs and on two Linux distros (Ubuntu-like + Arch) and the bug does not appear anymore. Given the memory of my GPU (11 GB) and the size of my images (590 MB) I had to use a tilesize of 1024. There is no more memory leakage and dw properly deconvolves these images.
Thanks Erik for your prompt response!
Hey Erik, congratulations on the paper coming out! Great news ;)
I have installed the latest version (0.4) on our systems (Arch/Ubuntu); with some tweaking, the compilation with CL worked out fine. See the output of dw --version.
Runs without the --gpu option work great. However, when I try to run files with --gpu I get the following error. Do you know if this is installation related?
thanks again and great work!
BTW, here is the full log file for this run: