clEsperanto / pyclesperanto_prototype

GPU-accelerated bio-image analysis focusing on 3D+t microscopy image data
http://clesperanto.net
BSD 3-Clause "New" or "Revised" License
208 stars, 44 forks

Build program failure #329

Open: somas193 opened this issue 8 months ago

somas193 commented 8 months ago

I am using pyclesperanto_prototype 0.24.2 in a workflow that runs in a Singularity container on a Linux machine. The workflow applies several image-processing operations, and the program already fails at the very first one (top_hat_box). The details are given below:

Traceback (most recent call last):
  File "/beegfs/ws/0/soku668b-data/segmentation/tribolium-clesperanto/tribolium-clesperanto_cluster_1.py", line 75, in <module>
    background_subtracted = cle.top_hat_box(image, radius_x=5, radius_y=5)
  File "/app/env/lib/python3.9/site-packages/pyclesperanto_prototype/_tier0/_plugin_function.py", line 71, in worker_function
    return function(*bound.args, **bound.kwargs)
  File "/app/env/lib/python3.9/site-packages/pyclesperanto_prototype/_tier2/_top_hat_box.py", line 43, in top_hat_box
    minimum_box(source, temp1, radius_x, radius_y, radius_z)
  File "/app/env/lib/python3.9/site-packages/pyclesperanto_prototype/_tier0/_plugin_function.py", line 71, in worker_function
    return function(*bound.args, **bound.kwargs)
  File "/app/env/lib/python3.9/site-packages/pyclesperanto_prototype/_tier1/_minimum_box.py", line 40, in minimum_box
    execute_separable_kernel(
  File "/app/env/lib/python3.9/site-packages/pyclesperanto_prototype/_tier1/_execute_separable_kernel.py", line 31, in execute_separable_kernel
    execute(anchor, opencl_kernel_filename, kernel_name, src.shape, parameters)
  File "/app/env/lib/python3.9/site-packages/pyclesperanto_prototype/_tier0/_execute.py", line 35, in execute
    return Backend.get_instance().get().execute(anchor, opencl_kernel_filename, kernel_name, global_size, parameters, prog, constants, image_size_independent_kernel_compilation, device)
  File "/app/env/lib/python3.9/site-packages/pyclesperanto_prototype/_tier0/_opencl_backend.py", line 41, in execute
    return execute(anchor, opencl_kernel_filename, kernel_name, global_size, parameters, prog, constants, image_size_independent_kernel_compilation, device)
  File "/app/env/lib/python3.9/site-packages/pyclesperanto_prototype/_tier0/_opencl_execute.py", line 311, in execute
    prog = device.program_from_source("\n".join(defines))
  File "/app/env/lib/python3.9/site-packages/pyclesperanto_prototype/_tier0/_device.py", line 28, in program_from_source
    return OCLProgram(src_str=source, dev=self)
  File "/app/env/lib/python3.9/site-packages/pyclesperanto_prototype/_tier0/_program.py", line 28, in __init__
    self.build(options=build_options)
  File "/app/env/lib/python3.9/site-packages/pyopencl/__init__.py", line 535, in build
    self._prg, was_cached = self._build_and_catch_errors(
  File "/app/env/lib/python3.9/site-packages/pyopencl/__init__.py", line 583, in _build_and_catch_errors
    raise err
pyopencl._cl.RuntimeError: clBuildProgram failed: BUILD_PROGRAM_FAILURE - clBuildProgram failed: BUILD_PROGRAM_FAILURE - clBuildProgram failed: BUILD_PROGRAM_FAILURE

Build on <pyopencl.Device 'NVIDIA A100-SXM4-40GB' on 'NVIDIA CUDA' at 0x562dd9782f20>:
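
The traceback shows top_hat_box starting with minimum_box. A box top-hat is conventionally the input minus its grey-scale opening, where the opening is a box minimum followed by a box maximum. While the GPU path fails to build, a CPU reference of the same operation can help test the rest of the workflow and later validate GPU results. The sketch below is a minimal 2D NumPy version; top_hat_box_cpu is a hypothetical helper (not part of pyclesperanto), and it assumes edge-replicating borders, so pixels near the image border may differ from clesperanto's output.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _box_filter(image, radius_y, radius_x, reduce):
    """Apply a (2*ry+1) x (2*rx+1) box min/max filter with edge-replicating borders."""
    # Assumption: borders are handled by repeating the edge pixels.
    padded = np.pad(image, ((radius_y, radius_y), (radius_x, radius_x)), mode="edge")
    # One window per output pixel: shape (H, W, 2*ry+1, 2*rx+1).
    windows = sliding_window_view(padded, (2 * radius_y + 1, 2 * radius_x + 1))
    return reduce(windows, axis=(-2, -1))


def top_hat_box_cpu(image, radius_x=5, radius_y=5):
    """White top-hat of a 2D image: image minus its opening (box minimum, then box maximum)."""
    image = np.asarray(image, dtype=np.float32)
    eroded = _box_filter(image, radius_y, radius_x, np.min)
    opened = _box_filter(eroded, radius_y, radius_x, np.max)
    return image - opened
```

Bright structures smaller than the box survive the subtraction, while the smoothly varying background is removed. The sliding-window view keeps the code short but evaluates every window explicitly, so it is only meant for cross-checking small images, not for production use.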
haesleinhuepf commented 8 months ago

Hi @somas193,

> on a Linux machine

I presume that's on the new cluster?

What's installed in your Singularity container? I suspect driver issues here. Does the same code work in a Jupyter notebook on the cluster?

CC @thawn
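
One way to separate a driver problem from a pyclesperanto problem is to compile a trivial OpenCL kernel directly with pyopencl, once inside the Singularity container and once in a Jupyter notebook on the cluster. The sketch below assumes only pyopencl; probe_opencl_builds and TRIVIAL_KERNEL are hypothetical names, not part of either library. If even this kernel fails with BUILD_PROGRAM_FAILURE, the OpenCL driver setup is the more likely culprit.

```python
try:
    import pyopencl as cl
except ImportError:  # pyopencl is not installed in this environment
    cl = None

# A deliberately trivial kernel: if this fails to build, pyclesperanto's own
# kernels are probably not the cause.
TRIVIAL_KERNEL = """
__kernel void copy_buffer(__global const float *src, __global float *dst)
{
    const size_t i = get_global_id(0);
    dst[i] = src[i];
}
"""


def probe_opencl_builds(source=TRIVIAL_KERNEL):
    """Try to build `source` on every visible OpenCL device.

    Returns a list of (device description, built_ok, error_message) tuples.
    The list is empty if pyopencl is missing or no OpenCL platform is found.
    """
    results = []
    if cl is None:
        return results
    try:
        platforms = cl.get_platforms()
    except cl.Error:
        # Typically PLATFORM_NOT_FOUND_KHR: the ICD loader found no vendor driver.
        return results
    for platform in platforms:
        try:
            devices = platform.get_devices()
        except cl.Error:
            continue
        for device in devices:
            label = f"{device.name} on {platform.name}"
            try:
                context = cl.Context([device])
                cl.Program(context, source).build()
                results.append((label, True, ""))
            except cl.Error as err:
                # pyopencl appends the compiler's build log to the error text.
                results.append((label, False, str(err)))
    return results


if __name__ == "__main__":
    for label, ok, message in probe_opencl_builds():
        print("OK  " if ok else "FAIL", label)
        if message:
            print(message)
```

An empty result inside the container but not on the host would point at the OpenCL driver or ICD files not being visible in the container, rather than at the kernel source.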

somas193 commented 8 months ago

@haesleinhuepf yes, it's on the new cluster. The Singularity container is similar to the one used before and has CUDA 11.3, pyclesperanto_prototype 0.24.2, pyopencl 2023.1.4 and other software installed. The cluster has CUDA 12.3, but I was able to run TensorFlow-based applications with GPU support in the container on the cluster. I haven't tested yet whether my code works in a Jupyter notebook on the new cluster.
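
A working TensorFlow setup shows that CUDA works, but pyclesperanto goes through OpenCL, which on NVIDIA hardware is provided by the host driver's OpenCL library rather than by the CUDA toolkit inside the container. Comparing the OpenCL platform and driver versions seen inside the container with those seen on the host may therefore be more telling than comparing CUDA versions. A minimal sketch, assuming the standard-library importlib.metadata and pyopencl's platform/device attributes; environment_report is a hypothetical helper.

```python
from importlib import metadata

try:
    import pyopencl as cl
except ImportError:  # pyopencl is not installed in this environment
    cl = None


def environment_report(packages=("pyclesperanto_prototype", "pyopencl", "numpy")):
    """Collect Python package versions plus the OpenCL platform/driver versions in view."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    if cl is None:
        return report
    try:
        platforms = cl.get_platforms()
    except cl.Error as err:
        report["opencl_platforms"] = f"none found ({err})"
        return report
    for p_index, platform in enumerate(platforms):
        report[f"platform[{p_index}]"] = f"{platform.name}, {platform.version}"
        try:
            devices = platform.get_devices()
        except cl.Error:
            devices = []
        for d_index, device in enumerate(devices):
            # driver_version comes from the host driver, not the container's CUDA toolkit.
            report[f"platform[{p_index}].device[{d_index}]"] = (
                f"{device.name}, {device.version}, driver {device.driver_version}"
            )
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```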

thawn commented 8 months ago

When you started the container, did you add the command-line argument -B /etc/OpenCL/ ?
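
On Linux, the OpenCL ICD loader finds vendor drivers through *.icd files in /etc/OpenCL/vendors, each naming a driver library (for NVIDIA usually libnvidia-opencl.so.1). Binding -B /etc/OpenCL/ only makes those files visible; the library they name must also be loadable inside the container, which for NVIDIA GPUs is typically arranged by Singularity's --nv flag. The sketch below, run inside the container, checks both steps. list_opencl_icds and icd_library_loads are hypothetical helpers, and the vendor directory is assumed to be the default one.

```python
import ctypes
from pathlib import Path

# Assumption: the default vendor directory used by the ICD loader on Linux.
DEFAULT_VENDOR_DIR = "/etc/OpenCL/vendors"


def list_opencl_icds(vendor_dir=DEFAULT_VENDOR_DIR):
    """Map each *.icd file in `vendor_dir` to the driver library it names.

    Returns an empty dict if the directory does not exist, which inside a
    container usually means the host's /etc/OpenCL was not bound in.
    """
    directory = Path(vendor_dir)
    if not directory.is_dir():
        return {}
    icds = {}
    for icd_file in sorted(directory.glob("*.icd")):
        # Each ICD file holds one line: a library name or an absolute path.
        icds[icd_file.name] = icd_file.read_text().strip()
    return icds


def icd_library_loads(library):
    """Return True if the dynamic loader can open `library` in this environment."""
    try:
        ctypes.CDLL(library)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    found = list_opencl_icds()
    if not found:
        print(f"No *.icd files found in {DEFAULT_VENDOR_DIR}")
    for name, library in found.items():
        status = "loads" if icd_library_loads(library) else "CANNOT be loaded"
        print(f"{name} -> {library} ({status})")
```

An ICD file that is present but whose library cannot be loaded would suggest that the bind of /etc/OpenCL/ works while the driver library itself is missing from the container.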

somas193 commented 8 months ago

Yes, I did.