clEsperanto / pyclesperanto_prototype

GPU-accelerated bio-image analysis focusing on 3D+t microscopy image data
http://clesperanto.net
BSD 3-Clause "New" or "Revised" License

median_sphere filter fails on large images #316

Open sebherbert opened 1 year ago

sebherbert commented 1 year ago

Hi all,

@spherelife and I are having issues running the pyclesperanto_prototype.median_sphere filter. The same kernel size (6,6,2) works on a small image but not on a larger one. The error is: RuntimeError: clEnqueueReadBuffer failed: OUT_OF_RESOURCES

We can run a smaller kernel on the large image, (2,2,2) for example, but not the (6,6,2) kernel. Surprisingly, after failing on the large image, the filter also fails on the small image (same error) until we restart the kernel.

I was not expecting the image size to play a large role in the processing, but maybe I misunderstood something?

We are using a VM with a Windows 10 machine and a shared NVIDIA RTXA6000-12Q.

I attach

Let me know if something else could be of use for you.

Thanks!

haesleinhuepf commented 1 year ago

Hi @sebherbert ,

how large is the large image?

Best, Robert

spherelife commented 1 year ago

Hi @haesleinhuepf

The small image we tested is 378x363x77 in xyz and ~82 MB. The large image is 1536x1536x134 and ~2.4 GB.

Best, Fei

sebherbert commented 1 year ago

Hi Robert,

Thanks for the fast follow-up!

As @spherelife was saying, the "large" image is 1536x1536x134 (16 bits, if I recall correctly), so nothing completely crazy :) I guess we could try an intermediate image size, if that makes sense?

Best, Sebastien
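For a rough sense of scale, the image dimensions quoted above can be turned into buffer sizes. This is only a back-of-the-envelope sketch: the helper name `image_bytes` is made up for illustration, and the float32 case assumes the GPU holds the data as 32-bit floats, which the thread does not confirm.

```python
from math import prod


def image_bytes(shape, bytes_per_voxel):
    """Return the size in bytes of a dense image with the given shape."""
    return prod(shape) * bytes_per_voxel


# Dimensions as reported in the thread (x, y, z).
small = (378, 363, 77)
large = (1536, 1536, 134)

for name, shape in (("small", small), ("large", large)):
    # uint16 matches the "16 bits" recollection; float32 is an assumption
    # about how the data might be stored on the GPU.
    for label, width in (("uint16", 2), ("float32", 4)):
        size_gb = image_bytes(shape, width) / 1e9
        print(f"{name} {label}: {size_gb:.2f} GB")
```

A filter typically needs at least an input and an output buffer, so the figure that matters is likely some multiple of the single-buffer size.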

haesleinhuepf commented 1 year ago

Hi @sebherbert and @spherelife ,

if you work on a Windows machine, can you try the solution proposed here and extend the kernel timeout in the registry?

Let me know if this helps!

Best, Robert

sebherbert commented 12 months ago

Hi @haesleinhuepf,

Thanks for the reply, and sorry it took us a while to come back to you. In the meantime, @spherelife has tested the same image on a more powerful workstation (A100 card, Linux-based). He reported that it ran smoothly (he even tested a 6x6x6 kernel, which also passed without complaint despite being larger than 1000 voxels), so we'll keep this solution for the moment. So I guess this was either an allocation speed issue or a limited vRAM issue, since the larger card is working?

Thanks again for the support,

Best, Sebastien
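One way to reason about why image size and kernel size interact is to estimate how much work a single filter call does. The sketch below is an illustration only: it assumes the numbers quoted in the thread, such as (6,6,2), are per-axis radii, and it counts the bounding box of the kernel rather than the exact sphere footprint. The helper names `box_voxels` and `neighbour_reads` are hypothetical.

```python
from math import prod


def box_voxels(radii):
    """Voxels in the bounding box of a kernel with the given per-axis radii."""
    return prod(2 * r + 1 for r in radii)


def neighbour_reads(shape, radii):
    """Upper bound on neighbour reads: one kernel box per image voxel."""
    return prod(shape) * box_voxels(radii)


small = (378, 363, 77)
large = (1536, 1536, 134)

for radii in ((2, 2, 2), (6, 6, 2), (6, 6, 6)):
    print(radii, "box voxels:", box_voxels(radii))

# Ratio of estimated work between the large and the small image
# for the kernel that failed.
ratio = neighbour_reads(large, (6, 6, 2)) / neighbour_reads(small, (6, 6, 2))
print(f"large/small work ratio: {ratio:.1f}")
```

If a single GPU call runs longer than the operating system's watchdog allows, the driver may be reset regardless of how much memory is free, which would be consistent with the timeout suggestion earlier in the thread.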

TimMonko commented 1 week ago

I just wanted to comment in case anyone else comes across this issue and is looking for assistance besides 'better GPU'. In fact, I came across this issue because processing some images was working on an RTX 3060 12GB, but not on a Quadro RTX6000 (24GB) workstation GPU. The workstation GPU was throwing a CL_INVALID_COMMAND_QUEUE error on slightly larger images, but was handling images 1/4 the size OK. Either way, the images are at least 20 times smaller than the VRAM, and all of them worked on the 3060.

Anyway, I added the following to the registry (previously no key existed).

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000003c
"TdrDdiDelay"=dword:0000003c

in PowerShell with

New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" -Name TdrDelay -PropertyType DWord -Value 60 -Force
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" -Name TdrDdiDelay -PropertyType DWord -Value 60 -Force

and am now getting no error on the workstation GPU. I have had other workflows in the past with much larger images that also failed with CL_MEM_OBJECT_ALLOCATION_FAILURE, and I will report back on whether this helps there too.
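To double-check that the values above are actually set, they can be read back from Python with the standard-library `winreg` module. This is a minimal sketch, not part of the original fix: `parse_reg_dword` and `read_tdr_values` are hypothetical helper names, and the function simply returns `None` for each value when run on a non-Windows system or when a value is missing.

```python
import sys

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def parse_reg_dword(text):
    """Convert a .reg style value such as 'dword:0000003c' to an int."""
    prefix = "dword:"
    if not text.lower().startswith(prefix):
        raise ValueError(f"not a dword value: {text!r}")
    return int(text[len(prefix):], 16)


def read_tdr_values(names=("TdrDelay", "TdrDdiDelay")):
    """Return {name: value}; a value is None off Windows or if it is unset."""
    result = dict.fromkeys(names)
    if sys.platform != "win32":
        return result
    import winreg  # Windows-only standard-library module

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        for name in names:
            try:
                result[name], _ = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                pass  # value does not exist, keep None
    return result


# The hexadecimal 3c in the .reg snippet is the same 60 passed to PowerShell.
print(parse_reg_dword("dword:0000003c"))
print(read_tdr_values())
```

Changes to these values generally require a reboot before they take effect.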