opencl implementation

penguin42 commented 1 year ago

This set implements the correlate and min steps in OpenCL; it's enabled by passing the -O option to deconvolve. It's currently running slower than the CPU path for me, but I have a high-end CPU (Ryzen 3950X) and a low-end GPU (Radeon RX550), so I'd be interested to see what other people get. I'm seeing about 260 LPS (lines per second) on the GPU and 350 on the CPU, although the machine feels like a swamp and runs hot as hell when using the CPU, whereas it stays quite usable when using the GPU.
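To make the "correlate and min" split concrete, here is a minimal pure-Python CPU reference sketch. It assumes, hypothetically, that "correlate" scores a signal window against each candidate pattern (sum of squared differences is used here as a stand-in error metric) and that "min" picks the index of the lowest-error pattern for each window. The function names and the metric are illustrative only and are not taken from the vhs-teletext code.

```python
from typing import List, Sequence


def correlate(window: Sequence[float], patterns: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical 'correlate' step: one error value per pattern.

    Sum of squared differences is an assumed metric, not the project's.
    """
    errors = []
    for pattern in patterns:
        if len(pattern) != len(window):
            raise ValueError("pattern and window lengths differ")
        errors.append(sum((w - p) ** 2 for w, p in zip(window, pattern)))
    return errors


def min_index(errors: Sequence[float]) -> int:
    """Hypothetical 'min' step: index of the lowest error (first one wins ties)."""
    if not errors:
        raise ValueError("no errors to compare")
    best = 0
    for i in range(1, len(errors)):
        if errors[i] < errors[best]:
            best = i
    return best


def decode_windows(windows: Sequence[Sequence[float]],
                   patterns: Sequence[Sequence[float]]) -> List[int]:
    """Run correlate then min for every window, one after another on the CPU."""
    return [min_index(correlate(w, patterns)) for w in windows]


if __name__ == "__main__":
    patterns = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
    windows = [[0.1, 0, 0.9, 1], [1, 0.8, 0.1, 0]]
    print(decode_windows(windows, patterns))  # prints [0, 1]
```

Each window is independent of the others, which is what makes this kind of step a natural fit for a GPU: an OpenCL version could, for example, assign one work-item per window instead of looping.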

On Fedora 37 I'm seeing a compiler warning during startup:

warning: argument unused during compilation: '-I /usr/lib64/python3.11/site-packages/pyopencl/cl' [-Wunused-command-line-argument]

That looks like a distro bug; as far as I can tell, it only appears because I'm not using any includes.

penguin42 commented 1 year ago

Interesting: replacing ROCm (AMD's OpenCL setup) with Mesa's implementation went from 250 to 420 LPS, much better.

itewreed commented 1 year ago

I cloned the repository from https://github.com/penguin42/vhs-teletext/tree/dgopencl and switched to the dgopencl branch. Then I reinstalled it using pip3 install -e .[CUDA,spellcheck,viewer], but I get the following error when trying the OpenCL option:

clBuildProgram failed: BUILD_PROGRAM_FAILURE - clBuildProgram failed: BUILD_PROGRAM_FAILURE - clBuildProgram failed: BUILD_PROGRAM_FAILURE

Build on <pyopencl.Device 'NVIDIA GeForce GTX 1660' on 'NVIDIA CUDA' at 0x55aa80c86a80>:

<kernel>:18:19: error: overloaded function 'min' must have the 'overloadable' attribute
    __kernel void min(global float *input, global int *indexes, int npatterns)
                  ^
cl_kernel.h:3830:26: note: previous overload of function is here
ulong16 __OVERLOADABLE__ min(ulong16 x, ulong y); 
                         ^

(options: -I /home/itewreed/anaconda3/lib/python3.8/site-packages/pyopencl/cl)
(source saved as /tmp/tmpbkuwevpe.cl)
OpenCL init failed. Using slow CPU method instead.

penguin42 commented 1 year ago

Thanks for testing it.

OK, that looks like a simple name clash to me; can you try renaming my function (and the place where it's looked up in the init function) to, say, minerr and see if it's happier?
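The clash comes from OpenCL C already providing an overloadable built-in called min, so a kernel with that exact name collides with it. As a hedged sketch, a small pre-build check could flag kernel names that shadow common OpenCL C built-ins before the source is handed to the compiler. The built-in set below is deliberately partial, and the helper name clashing_kernel_names is hypothetical.

```python
import re
from typing import List

# A partial, illustrative set of OpenCL C built-in function names.
OPENCL_BUILTINS = {
    "min", "max", "clamp", "abs", "fabs", "sqrt", "exp", "log",
    "sin", "cos", "mad", "fma", "dot", "cross", "length",
    "normalize", "select", "sign", "step", "mix",
}

# Matches "__kernel void name(" as well as the "kernel void name(" spelling.
KERNEL_RE = re.compile(r"\b(?:__kernel|kernel)\s+void\s+(\w+)\s*\(")


def clashing_kernel_names(source: str) -> List[str]:
    """Return the kernel names in `source` that collide with known built-ins."""
    return [name for name in KERNEL_RE.findall(source) if name in OPENCL_BUILTINS]


if __name__ == "__main__":
    src = "__kernel void min(global float *input, global int *indexes, int npatterns) {}"
    print(clashing_kernel_names(src))  # prints ['min']
    print(clashing_kernel_names(src.replace("void min(", "void minerr(")))  # prints []
```

With the kernel renamed to minerr the check comes back empty, and the Python side then has to look the kernel up under the new name as well.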

itewreed commented 1 year ago

That helped. In patternopencl.py I changed __kernel void min(global float *input, global int *indexes, int npatterns) to __kernel void minerr(global float *input, global int *indexes, int npatterns), and in the init function I changed self.kernel_min = self.prg.min to self.kernel_min = self.prg.minerr. After that it runs and decodes via OpenCL. My GTX 1660 gives 450 L/s. The system is Ubuntu 22.04.

penguin42 commented 1 year ago

Thanks @itewreed, I've just updated it with the name change.

ali1234 commented 1 year ago

Merged.

I have changed the way the CLI options work a bit. OpenCL is now always tried unless -C/--force-cpu is specified, and -O/--prefer-opencl means OpenCL is tried before CUDA. CUDA is still the default when both are available, as it still seems a bit faster.
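The selection rules described here can be summarised as a small pure function. This is a hedged sketch rather than the actual vhs-teletext code: the function names and backend labels are hypothetical, and the `available` argument stands in for whatever import and initialisation checks the real code performs. Only the flag semantics (-C/--force-cpu, -O/--prefer-opencl, CUDA first by default) come from the comment.

```python
from typing import Iterable, List


def backend_order(force_cpu: bool = False, prefer_opencl: bool = False) -> List[str]:
    """Order in which backends are tried.

    - force_cpu (-C/--force-cpu): skip the GPU backends entirely.
    - prefer_opencl (-O/--prefer-opencl): try OpenCL before CUDA.
    - otherwise: CUDA first, then OpenCL.
    The CPU method is always the final fallback.
    """
    if force_cpu:
        return ["cpu"]
    gpu = ["opencl", "cuda"] if prefer_opencl else ["cuda", "opencl"]
    return gpu + ["cpu"]


def select_backend(available: Iterable[str], force_cpu: bool = False,
                   prefer_opencl: bool = False) -> str:
    """Pick the first backend, in priority order, that is actually usable."""
    usable = set(available) | {"cpu"}  # assume the CPU method always works
    for name in backend_order(force_cpu, prefer_opencl):
        if name in usable:
            return name
    return "cpu"  # not reached: "cpu" is always in the order and always usable


if __name__ == "__main__":
    print(select_backend({"cuda", "opencl"}))                      # prints cuda
    print(select_backend({"cuda", "opencl"}, prefer_opencl=True))  # prints opencl
    print(select_backend({"opencl"}))                              # prints opencl
```

Keeping the ordering separate from the availability check makes each rule easy to test on its own without a GPU present.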

penguin42 commented 1 year ago

Thanks! Feel free to tag me in any OpenCL bugs that come up. With the changed option flags, I'm noticing that when I run it with no flags, a line of 'CUDA' is printed:

PYOPENCL_COMPILER_OUTPUT=1 teletext deconvolve -O dgdata/capture-20230201-e.vbi > z

No module named 'pycuda'
No module named 'pycuda'
No module named 'pycuda'
No module named 'pycuda'
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 32448/32448 [00:37<00:00, 875.06L/s, R:31%, M:|▂ ▄█▅▁|]
CUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDACUDA[dg@dalek vhs-teletext]$
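One way to stop messages like these repeating, assuming they come from an optional backend being probed more than once within the same process, is to cache the probe so a failure is reported only the first time. This is a hypothetical sketch built on the standard library's importlib and functools.lru_cache; probe_module is an illustrative name, it does not reflect how vhs-teletext actually detects pycuda, and it would not help if the repeats come from separate worker processes.

```python
import functools
import importlib
import sys
from types import ModuleType
from typing import Optional


@functools.lru_cache(maxsize=None)
def probe_module(name: str) -> Optional[ModuleType]:
    """Import an optional module once per process; report failure only once."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # lru_cache remembers the None result, so this prints once per name.
        print(exc, file=sys.stderr)
        return None


if __name__ == "__main__":
    for _ in range(4):
        probe_module("pycuda")  # prints "No module named 'pycuda'" at most once
```

Callers would then check `probe_module("pycuda") is not None` wherever they need to know whether the backend exists, without triggering a fresh import attempt each time.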

ali1234 / vhs-teletext

opencl implementation #75