HenriquesLab / NanoJ-eSRRF

Apache License 2.0

parameter sweep crashes, GPU memory fills up #4

Open pawlowska opened 2 years ago

pawlowska commented 2 years ago

Hi, I tried running eSRRF on a Quadro K2200 GPU. It works in principle (YAY!) but crashes sometimes. For example, yesterday it crashed at step 22/50 of the parameter sweep, meaning it completed 21 steps just fine. I am including the error below. I also noticed that the GPU memory was showing as nearly full, 3.9 GB out of 4 GB. Could it be that the memory somehow does not get released?

(Fiji Is Just) ImageJ 2.3.0/1.53q; Java 1.8.0_172 [64-bit]; Windows 10 10.0; 1763MB of 73552MB (2%)

com.jogamp.opencl.CLException$CLMemObjectAllocationFailureException: can not enqueue 1DRange CLKernel [id: 202273553280 name: kernelResetFramePosition] with gwo: null gws: {1} lws: null cond.: null events: null [error: CL_MEM_OBJECT_ALLOCATION_FAILURE]
    at com.jogamp.opencl.CLException.newException(CLException.java:84)
    at com.jogamp.opencl.CLCommandQueue.putNDRangeKernel(CLCommandQueue.java:1638)
    at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1496)
    at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1466)
    at nanoj.liveSRRF.LiveSRRF_CL.resetFramePosition(LiveSRRF_CL.java:958)
    at nanoj.liveSRRF.gui.ParametersSweep_.run(ParametersSweep_.java:430)
    at ij.IJ.runUserPlugIn(IJ.java:243)
    at ij.IJ.runPlugIn(IJ.java:204)
    at ij.Executer.runCommand(Executer.java:152)
    at ij.Executer.run(Executer.java:70)
    at java.lang.Thread.run(Thread.java:748)

micahschott commented 2 years ago

I also ran into problems today. I'm not really sure how to troubleshoot this.

image

HannahSHeil commented 2 years ago

Thanks for finding this problem! I'll look into it!

ammendes commented 2 years ago

> I also ran into problems today. I'm not really sure how to troubleshoot this.

I think your problem is different from the OP's. Seems like you have a problem with loading OpenCL, either due to a bad installation or missing Java bindings (i.e., JOCL).

HannahSHeil commented 2 years ago

Hi @pawlowska, sorry for taking so long to reply to your request. The memory overflow on NVidia graphics cards has been described before in the CLIJ2 FAQ, in the section "Big Images on NVidia graphics cards": https://clij.github.io/clij2-docs/faq

bheit commented 1 year ago

Just to add some more information to this issue: we've run the parameter sweep on the exact same acquisition image across 5 computers, with odd results.

I'm not sure if that will help track down the issue or not, but this is as extensive a test as we can perform.

HannahSHeil commented 1 year ago

Thank you for sharing this info, @bheit!

HannahSHeil commented 1 year ago

Hi, could you please try the following fix?

1. Go to the folder /.../Fiji.app/jars and remove these files:
   jogl-all-2.4.0-rc-20211011.jar
   jocl-all-2.4.0-rc-20211011.jar
   joal-all-2.4.0-rc-20211011.jar
   gluegen-rt-2.4.0-rc-20210111.jar

2. Go to https://jogamp.org/deployment/archive/rc/v2.5.0-rc-20230523/jar/ and download the following files: jogl-all.jar, jocl.jar, joal.jar, gluegen.jar. Place these four files in the folder /.../Fiji.app/jars

3. Go to the folder /.../Fiji.app/jars/win64 and remove these files:
   gluegen-rt-2.4.0-rc-20210111-natives-windows-amd64.jar
   joal-2.4.0-rc-20210111-natives-windows-amd64.jar
   jocl-2.4.0-rc-20210111-natives-windows-amd64.jar

4. Go to https://jogamp.org/deployment/archive/rc/v2.5.0-rc-20230523/jar/ and download the following files: gluegen-rt-natives-windows-amd64.jar, jocl-natives-windows-amd64.jar, joal-natives-windows-amd64.jar. Place these three files in the folder /.../Fiji.app/jars/win64

5. Restart Fiji.
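Before deleting anything by hand, it can help to check which of the old JogAmp jars are actually present. The following is only a rough sketch, not part of eSRRF or Fiji: the class name `OldJogampJarFinder` and its methods are made up for illustration, and the matching rule (a JogAmp prefix plus the `2.4.0-rc` version string) is an assumption derived from the file names listed in the steps above. It only lists files and never deletes them.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists old JogAmp jars in the given folders.
class OldJogampJarFinder {
    // Prefixes taken from the file names mentioned in the steps above.
    static final String[] PREFIXES = {"jogl-all-", "jocl-", "joal-", "gluegen-rt-"};
    // Assumption: every jar to be replaced carries this version string.
    static final String OLD_VERSION = "2.4.0-rc";

    // True if the file name looks like one of the old JogAmp jars.
    static boolean isOldJogampJar(String fileName) {
        if (!fileName.endsWith(".jar") || !fileName.contains(OLD_VERSION)) {
            return false;
        }
        for (String prefix : PREFIXES) {
            if (fileName.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Returns the sorted names of matching jars directly inside dir.
    static List<String> findOldJars(Path dir) throws IOException {
        List<String> hits = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path p : stream) {
                String name = p.getFileName().toString();
                if (isOldJogampJar(name)) {
                    hits.add(name);
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java OldJogampJarFinder.java <jars folder> [<jars/win64 folder>]");
            return;
        }
        for (String folder : args) {
            for (String name : findOldJars(Paths.get(folder))) {
                System.out.println(folder + ": " + name);
            }
        }
    }
}
```

Pass the paths of your own `jars` and `jars/win64` folders as arguments. An empty result would suggest the old files are already gone.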

HannahSHeil commented 1 year ago

If you are working on Windows 10/11 and you would like to use other NanoJ plugins as well, you might also need to install the OpenCL™ and OpenGL® Compatibility Pack (https://apps.microsoft.com/store/detail/opencl%E2%84%A2-and-opengl%C2%AE-compatibility-pack/9NQPSL29BFFF?hl=en-us&gl=us&activetab=pivot%3Aoverviewtab)

In that case, please make sure to select the following settings: Advanced settings > Processing device > Default device

bheit commented 1 month ago

I have done everything recommended above (installed the Windows OpenGL/CL pack, replaced the jar files, and edited the registry) and the parameter sweep still fails. It also fails at the exact same place (frame 58/60), regardless of the size of the base file used.

The error I get is:

(Fiji Is Just) ImageJ 2.14.0/1.54f; Java 1.8.0_322 [64-bit]; Windows 10 10.0; 1122MB of 24347MB (4%)

com.jogamp.opencl.CLException$CLDeviceNotAvailableException: can not create CL context [error: CL_DEVICE_NOT_AVAILABLE]
    at com.jogamp.opencl.CLException.checkForError(CLException.java:67)
    at com.jogamp.opencl.CLContext.createContextFromType(CLContext.java:204)
    at com.jogamp.opencl.CLContext.create(CLContext.java:172)
    at com.jogamp.opencl.CLContext.create(CLContext.java:155)
    at nanoj.liveSRRF.LiveSRRF_CL.initialise(LiveSRRF_CL.java:179)
    at nanoj.liveSRRF.gui.ParametersSweep_.run(ParametersSweep_.java:452)
    at ij.IJ.runUserPlugIn(IJ.java:244)
    at ij.IJ.runPlugIn(IJ.java:210)
    at ij.Executer.runCommand(Executer.java:152)
    at ij.Executer.run(Executer.java:70)
    at java.lang.Thread.run(Thread.java:750)

bheit commented 1 month ago

Fixed it - for some reason the parameter sweep was using the Windows renderer instead of the NVIDIA 3060 for processing. I switched the device from 'default' to 'NVIDIA' under the "Advanced" options and it ran correctly (and 5x as fast!)
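For anyone curious why 'default' can land on the wrong device, here is a minimal sketch of the idea of picking a device by name instead of taking whatever comes first. It is not how eSRRF selects its device; the class `DevicePicker`, the method `pickByName`, and the device names in `main` are all hypothetical, and in a real plugin the names would come from the OpenCL binding rather than a hard-coded list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of name-based device selection.
class DevicePicker {
    // Returns the index of the first device whose name contains the
    // keyword (case-insensitive), or -1 if no device matches.
    static int pickByName(List<String> deviceNames, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        for (int i = 0; i < deviceNames.size(); i++) {
            if (deviceNames.get(i).toLowerCase(Locale.ROOT).contains(needle)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed example names; the actual strings depend on the drivers installed.
        List<String> names = Arrays.asList(
                "Microsoft Basic Render Driver",
                "NVIDIA GeForce RTX 3060");
        int idx = pickByName(names, "nvidia");
        if (idx >= 0) {
            System.out.println("Using: " + names.get(idx));
        } else {
            System.out.println("No matching device; falling back to default");
        }
    }
}
```

With a list like this, a "first device" default would pick the software renderer, while a name match picks the GPU. In the plugin itself the equivalent step is simply choosing 'NVIDIA' under the "Advanced" options.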