clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL
Apache License 2.0

Autogemm kernels should be per-context #165

Closed hughperkins closed 8 years ago

hughperkins commented 8 years ago

Autogemm kernels are global, rather than per-context.

You can see this e.g. in build/include/AutoGemmIncludes/AutoGemmClKernels.cpp

I reckon this means that if you create two contexts, e.g. for two different GPUs, and run GEMM on each, then the program will segfault.

Of course, you can work around this by using e.g. MPI: one process per GPU, but that has its own issues, e.g. IPC overhead. Given that it shouldn't be hard to make the global variables per-context, I reckon making the kernels and so on per-context is the correct solution to this.
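To illustrate what "per-context" could look like, here is a minimal C++ sketch. It is not the clBLAS code: `ContextHandle`, `Kernel` and `KernelCache` are hypothetical names, and a plain pointer stands in for `cl_context` so the example compiles without the OpenCL headers. The idea is just to key each kernel by the context it was built for, instead of keeping one global kernel object per kernel name.

```cpp
#include <map>
#include <mutex>
#include <string>

// Stand-in for cl_context so the sketch builds without OpenCL headers (assumption).
using ContextHandle = const void*;

// Stand-in for a built cl_kernel (assumption).
struct Kernel {
    ContextHandle context;  // context this kernel was built for
    std::string name;       // kernel name, e.g. "sgemm"
};

// Hypothetical cache: one kernel per (context, kernel name),
// rather than one global kernel per kernel name.
class KernelCache {
public:
    // Returns the kernel for this context, building it on first use.
    Kernel& get(ContextHandle context, const std::string& name) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto& perContext = kernels_[context];
        auto it = perContext.find(name);
        if (it == perContext.end()) {
            // With real OpenCL, this is where clCreateProgramWithSource,
            // clBuildProgram and clCreateKernel would be called for `context`.
            it = perContext.emplace(name, Kernel{context, name}).first;
            ++builds_;
        }
        return it->second;
    }

    // Number of kernels built so far (for illustration only).
    int buildCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return builds_;
    }

private:
    mutable std::mutex mutex_;
    std::map<ContextHandle, std::map<std::string, Kernel>> kernels_;
    int builds_ = 0;
};
```

With something like this, two contexts asking for the same GEMM kernel each get their own kernel object, and a second request from the same context reuses the one already built.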

guacamoleo commented 8 years ago

You are correct. The current architecture of offline-compiling kernels allows kernels to be compiled for only one type of GPU, and it was easiest to have the architecture of online compilation follow that of offline compilation. We're currently discussing if and how we could support offline compilation for multiple devices (and the online compilation architecture will follow suit) for later releases. -David

hughperkins commented 8 years ago

For online compilation, I've been using Lua. The cool thing about Lua is it's very lightweight. You can also use Jinja2-like syntax in Lua templates if you want (I know you're not using Jinja2 yet with Python either, but anyway the option is there for Lua too). I'm quite happy with using Lua for online compilation: it's pretty fast and easy to use.

hughperkins commented 8 years ago

I'm not sure this issue should be closed, by the way: I think there should be some way of running clBLAS on multiple devices, even if it means disabling offline compilation, for example, or enabling it only for one device.

I'm not terribly keen on optimizations that are both hardware-specific and compile-time, by the way. The fact that e.g. OpenBLAS doesn't need these makes me rather partial to it compared to ATLAS, for example.
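A rough sketch of the "enable it only for one device" option, again in C++ and again hypothetical (`KernelSource` and `chooseKernelSource` are made-up names, not clBLAS API): use the offline-compiled binary only when the device matches the one it was compiled for, and fall back to online compilation for every other device. With real OpenCL, the device name would come from `clGetDeviceInfo` with `CL_DEVICE_NAME`.

```cpp
#include <string>

// Where a kernel for a given device comes from (hypothetical).
enum class KernelSource { OfflineBinary, OnlineCompile };

// Hypothetical policy: offline binaries exist for at most one device type,
// named by `offlineDevice` (empty means offline compilation is disabled).
// Every other device compiles its kernels at run time.
KernelSource chooseKernelSource(const std::string& deviceName,
                                const std::string& offlineDevice) {
    if (!offlineDevice.empty() && deviceName == offlineDevice) {
        return KernelSource::OfflineBinary;
    }
    return KernelSource::OnlineCompile;
}
```

Passing an empty `offlineDevice` corresponds to disabling offline compilation altogether, so every device goes through the online path.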

hughperkins commented 8 years ago

Not sure why this issue was closed, but it has been fixed by Pavan in https://github.com/clMathLibraries/clBLAS/issues/197

pavanky commented 8 years ago

@hughperkins I have not tested this with offline-compiled kernels, but I tried to be as careful as I could with it.

hughperkins commented 8 years ago

I don't use offline-compiled kernels, but this issue is for offline-generated, online-compiled kernels. Just to confirm, your fix fixes the issue for offline-generated, online-compiled kernels, is that right?

pavanky commented 8 years ago

Yes, the PR fixed the issue with Autogemm kernels. So I think you are good to go.

hughperkins commented 8 years ago

cool :-) Thanks! :-)