clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL
Apache License 2.0

What if maxWorkGroupSize < 64 #64

Open hominhquan opened 9 years ago

hominhquan commented 9 years ago

Hello, I'm working on porting clBLAS to my company's accelerator. Our OpenCL library only supports a maximum of 16 work-items per work group, so I fall into the unimplemented case in the kernel generator (solution_seq_make.cpp:getDefaultStepGranulation()) where maxWorkGroupSize < 64 is not supported. I would like to implement this but don't know how to do it, or what algorithm is behind it. Can anyone help or explain it to me? Thanks in advance. Quan

tmagomedov commented 9 years ago

There are a few places in the getDefaultStepGranulation() function where a work group size of 64 = 8*8 is assumed to be supported. Instead, you can set the work group size in wgX and wgY to minimal values:

You can check for the (maxWorkGroupSize < 64) case and set

minSuppWgX = floor(sqrt(maxWorkGroupSize));
minSuppWgY = maxWorkGroupSize / minSuppWgX;

And replace each

wgX = 8;
wgY = 8;

in the mentioned cases with

wgX = minSuppWgX;
wgY = minSuppWgY;
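
The fallback described above can be sketched as a small standalone helper. This is only an illustration: WgShape, minSupportedWgShape() and defaultWgShape() are hypothetical names that do not exist in clBLAS, and the integer loop is swapped in for floor(sqrt(...)) to avoid any floating-point rounding.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type, not part of clBLAS: a 2D work-group shape.
struct WgShape {
    size_t x;
    size_t y;
};

// Most square shape whose product does not exceed maxWorkGroupSize.
inline WgShape minSupportedWgShape(size_t maxWorkGroupSize)
{
    assert(maxWorkGroupSize >= 1);
    size_t x = 1;
    while ((x + 1) * (x + 1) <= maxWorkGroupSize) {
        ++x;                      // x ends as floor(sqrt(maxWorkGroupSize))
    }
    WgShape s;
    s.x = x;
    s.y = maxWorkGroupSize / x;   // integer division, so x * y <= maxWorkGroupSize
    return s;
}

// Keep the usual 8x8 shape when the device allows it, otherwise fall back.
inline WgShape defaultWgShape(size_t maxWorkGroupSize)
{
    if (maxWorkGroupSize >= 64) {
        WgShape s = {8, 8};
        return s;
    }
    return minSupportedWgShape(maxWorkGroupSize);
}
```

For a device limit of 16 this gives a 4 x 4 shape. Note that the result does not always use the full limit: for 32 it gives 5 x 6 = 30 work-items.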

There is also code in lines 1533-1545 where wgX and wgY are set to constants. There, a reduction of wgX and wgY like the while((wgY * wgX) > maxWorkGroupSize) loop from line 1614 should be applied as well.
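
A reduction loop of that kind might look roughly like the following. Only the loop condition is taken from the post; the body (halving the larger dimension) and the clampWgShape() name are assumptions made for illustration, so the actual loop in solution_seq_make.cpp may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of clBLAS: shrink wgX and wgY in place
// until their product fits into maxWorkGroupSize.
inline void clampWgShape(size_t &wgX, size_t &wgY, size_t maxWorkGroupSize)
{
    assert(maxWorkGroupSize >= 1 && wgX >= 1 && wgY >= 1);
    while ((wgY * wgX) > maxWorkGroupSize) {
        // The product exceeds a limit of at least 1, so the larger
        // dimension is at least 2 and halving it never reaches zero.
        if (wgY >= wgX) {
            wgY /= 2;
        } else {
            wgX /= 2;
        }
    }
}
```

Starting from wgX = wgY = 8 with a limit of 16, the loop halves wgY and then wgX, ending at 4 x 4.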

Just put some values in wgX and wgY and make sure pgran->wgSize[0]*pgran->wgSize[1] is no greater than maxWorkGroupSize for your device, and everything should be fine. Feel free to ask questions, I am familiar with this library's code. Timur.