callym closed this 1 year ago.
Good question. Futhark used to work fine on Intel GPUs (it was the main development platform), but it's been a while and we don't have any Intel GPUs in our regular test systems. Try compiling the program like this:
$ futhark opencl fut/example.fut
and then running it with
$ echo '[1,2,3,4] [2,3,4,1]' | fut/example -D
The -D option will cause copious debug output and perhaps reveal where the problem is (it's unlikely to be in the clFinish()).
One thing I remember is that Intel GPUs did not support double precision (and judging by that clinfo output, still don't). We have code to handle that, but it might have bitrotted. Still, that should cause a different error.
Okay, so with this I get absolutely no output, and the program seems to hang until I press Ctrl+C. Using any other backend apart from OpenCL gives me output like:
❯ echo '[1,2,3,4] [2,3,4,1]' | fut/example -D
Allocating 16 bytes for arr->mem in default space (then allocated: 16 bytes).
Allocating 16 bytes for arr->mem in default space (then allocated: 32 bytes) (new peak).
Unreferencing block arr->mem (allocated as arr->mem) in default space: 0 references remaining.
16 bytes freed (now allocated: 16 bytes)
Unreferencing block arr->mem (allocated as arr->mem) in default space: 0 references remaining.
16 bytes freed (now allocated: 0 bytes)
24i32
Could the initial failure with futhark test come from the test timing out?
I also noticed I was using 0.22.7, but trying with 0.24.1 gives the same problem.
That is very mysterious. Does it hang while using CPU time, or is it just stuck? I don't understand how that can possibly happen, since even initialising OpenCL should cause lots of debug output.
Does OpenCL work for other programs on this system?
I think the hanging is caused by a bug in Intel's OpenCL loader. I followed the steps here: https://www.reddit.com/r/archlinux/comments/124pgc1/how_to_disable_oneapi_opencl_cpu_backend/
And now I get the following output:
❯ echo '[1,2,3,4] [2,3,4,1]' | ./example -D
Using platform: Intel(R) OpenCL HD Graphics
Using device: Intel(R) Graphics [0x46a6]
Lockstep width: 1
Default group size: 256
Default number of groups: 384
OpenCL compiler options: -DLOCKSTEP_WIDTH=1 -Dmax_group_size=512 -Dbuiltinzhreplicate_i32zigroup_sizze_6272=256 -Dmainzisegred_group_sizze_6245=256 -Dmainzisegred_num_groups_6247=384
Creating OpenCL program...
Building OpenCL program...
Created kernel builtin#replicate_i32.replicate_6268.
Created kernel main.segred_nonseg_6253.
Allocating 40 bytes for counters_mem_6261 in space 'device' (then allocated: 40 bytes).
Actually allocating the desired block.
Launching builtin#replicate_i32.replicate_6268 with global work size [256] and local work size [256]; local memory: 0 bytes.
kernel builtin#replicate_i32.replicate_6268 runtime: 1898us
Allocating 16 bytes for arr->mem in space 'device' (then allocated: 56 bytes) (new peak).
Actually allocating the desired block.
Allocating 16 bytes for arr->mem in space 'device' (then allocated: 72 bytes) (new peak).
Actually allocating the desired block.
Allocating 4 bytes for mem_6258 in space 'device' (then allocated: 76 bytes) (new peak).
Actually allocating the desired block.
Allocating 4 bytes for segred_tmp_mem_6283 in space 'device' (then allocated: 80 bytes) (new peak).
Actually allocating the desired block.
# SegRed
Launching main.segred_nonseg_6253 with global work size [256] and local work size [256]; local memory: 1025 bytes.
./example: example.c:8078: OpenCL call
clFinish(ctx->queue)
failed with error code -5 (Out of resources)
I also get an invalid pointer exception when trying to run LuxMark on OpenCL, so I think this is an Intel issue rather than a Futhark issue (unfortunate for me, fortunate for you, I guess!).
Condolences. Sounds like this isn't our fault.
Hi, I'm trying to get OpenCL working on an Intel 12th Gen processor, and I get the following output:
Is there any way of finding out what sort of resource it's running out of? I've got plenty of RAM, so it should be fine (and the code I'm trying out shouldn't use much memory anyway).
I've tried Blender Cycles with OpenCL and it seemed to work, but didn't run any extensive tests.
Running with --backend {c,ispc,multicore} works fine.
The fut/example.fut code is:
The output for clinfo is below: