dbeurle / neon

A finite element code

Isolate OpenCL and complete device selection support #113

Closed dbeurle closed 5 years ago

dbeurle commented 5 years ago

OpenCL is difficult to get working inside Docker on the CI, so it will unfortunately be removed from the CI until I can figure out how to set it up.

Tasks:

Sample input snippets

Native CPU implementation:

"linear_solver" : {
    "type" : "iterative"
}

OpenCL with CPU backend:

"linear_solver" : {
    "type" : "iterative",
    "device" : "cpu",
    "backend" : {
        "type" : "opencl",
        "platform" : 0,
        "device" : 0
    }
}

OpenCL with GPU backend:

"linear_solver" : {
    "type" : "iterative",
    "device" : "gpu",
    "backend" : {
        "type" : "opencl",
        "platform" : 0,
        "device" : 0
    }
}

CUDA backend:

"linear_solver" : {
    "type" : "iterative",
    "device" : "gpu",
    "backend" : {
        "type" : "cuda",
        "device" : 0
    }
}

@shadialameddin What do you think of this? It avoids a lot of automatic device detection, since the user knows best which device to choose, and we can simply do the error checking when the class is instantiated.
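As a rough illustration of "error checking on class instantiation", here is a minimal C++17 sketch. The names `solver_settings`, `backend_type` and `iterative_solver` are hypothetical, not neon's actual classes. It assumes the JSON above has already been parsed into the struct. The rules it enforces are read off the sample snippets: CUDA implies `"device" : "gpu"`, OpenCL needs a platform and a device index. The rule that the native implementation only targets the CPU is an additional assumption.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical mirror of the parsed "linear_solver" block.
enum class backend_type { native, opencl, cuda };

struct solver_settings
{
    std::string type = "iterative";
    std::string device = "cpu";              // "cpu" or "gpu"; absent in the native snippet
    backend_type backend = backend_type::native;
    std::optional<int> platform;             // "backend" : { "platform" } (OpenCL only)
    std::optional<int> device_index;         // "backend" : { "device" }
};

// Hypothetical solver: every check runs in the constructor, so an invalid
// configuration is rejected before any device resources are acquired.
class iterative_solver
{
public:
    explicit iterative_solver(solver_settings const& s) : settings(s)
    {
        if (s.type != "iterative")
        {
            throw std::domain_error("linear_solver: unsupported type \"" + s.type + "\"");
        }
        if (s.device != "cpu" && s.device != "gpu")
        {
            throw std::domain_error("linear_solver: \"device\" must be \"cpu\" or \"gpu\"");
        }
        switch (s.backend)
        {
            case backend_type::native:
                // Assumption: the native implementation is CPU-only.
                if (s.device != "cpu")
                    throw std::domain_error("linear_solver: the native backend only runs on the cpu");
                break;
            case backend_type::opencl:
                if (!s.platform || !s.device_index)
                    throw std::domain_error("linear_solver: opencl needs \"platform\" and \"device\"");
                break;
            case backend_type::cuda:
                if (s.device != "gpu")
                    throw std::domain_error("linear_solver: cuda requires \"device\" : \"gpu\"");
                if (!s.device_index)
                    throw std::domain_error("linear_solver: cuda needs a \"device\" index");
                break;
        }
        if ((s.platform && *s.platform < 0) || (s.device_index && *s.device_index < 0))
        {
            throw std::domain_error("linear_solver: platform and device indices must be non-negative");
        }
    }

    solver_settings const& configuration() const { return settings; }

private:
    solver_settings settings;
};
```

Throwing from the constructor means a half-configured solver object can never exist, which matches the idea that the user picks the device and the class only validates the choice.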

shadisharba commented 5 years ago

Yeah, I like the idea. Giving the user control over the devices is nice. If the device fails or something goes wrong, the class will throw an error, right?

dbeurle commented 5 years ago

Right, we'll double-check the inputs and maybe list the available devices if the selection is wrong. Great, I'll implement this and write the documentation. I'll ping you once it's in development; it would be great if you could try it out.
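Listing the available devices when the selection is wrong could look roughly like the C++ sketch below. `device_info` and `select_device` are hypothetical names. To keep the sketch free of any OpenCL dependency, the device list is passed in. In an OpenCL build it would presumably be filled by enumerating platforms and devices (for example with `clGetPlatformIDs` and `clGetDeviceIDs`).

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of one compute device discovered at start-up.
struct device_info
{
    int platform;
    int device;
    std::string name;
};

// Return the requested device, or throw an error whose message lists
// every device the user could have selected instead.
device_info const& select_device(std::vector<device_info> const& available,
                                 int platform,
                                 int device)
{
    for (auto const& d : available)
    {
        if (d.platform == platform && d.device == device) return d;
    }

    std::ostringstream message;
    message << "linear_solver: no device " << device << " on platform " << platform
            << ". Available devices:\n";
    for (auto const& d : available)
    {
        message << "  platform " << d.platform << ", device " << d.device << ": " << d.name
                << '\n';
    }
    if (available.empty()) message << "  (none found)\n";
    throw std::out_of_range(message.str());
}
```

Putting the full list in the exception message lets the user correct the `"platform"` and `"device"` entries in the input file without a separate query tool.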

dbeurle commented 5 years ago

@shadialameddin Could you please test the OpenCL routines on the feature branch? I get a segfault on my machines for some reason.

shadisharba commented 5 years ago

It works with "cpu", but I get the same segfault when switching to "gpu". I also got "X server found. dri2 connection failed!". I was able to get rid of that message with sudo dnf remove beignet, but that didn't solve the segfault.

Thread 1 "neonfe" received signal SIGSEGV, Segmentation fault.
0x00007ffff7d79c09 in clRetainMemObject () from /lib64/libOpenCL.so.1