Open hominhquan opened 9 years ago
There are a few places in the getDefaultStepGranulation() function where a maximum supported work-group size of 64 = 8*8 is assumed.
To set the work-group sizes wgX and wgY to minimal values instead, you can check for the (maxWorkGroupSize < 64) case and set
minSuppWgX = floor(sqrt(maxWorkGroupSize));
minSuppWgY = maxWorkGroupSize / minSuppWgX;
Then replace each
wgX = 8;
wgY = 8;
in those places with
wgX = minSuppWgX;
wgY = minSuppWgY;
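As a minimal sketch of that computation, assuming a device that reports at least one work-item per work-group: the helper name minSupportedWgShape and the WgShape struct are hypothetical and do not exist in clBLAS, they only isolate the arithmetic so it can be checked on its own.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical container for the two work-group dimensions (not a clBLAS type).
struct WgShape {
    std::size_t x;
    std::size_t y;
};

// Hypothetical helper: returns the 8x8 default when the device allows it,
// otherwise a near-square shape whose product does not exceed maxWorkGroupSize.
WgShape minSupportedWgShape(std::size_t maxWorkGroupSize)
{
    // Assumption: the device supports at least one work-item per work-group.
    assert(maxWorkGroupSize >= 1);

    if (maxWorkGroupSize >= 64) {
        return {8, 8};
    }

    // minSuppWgX = floor(sqrt(maxWorkGroupSize))
    const std::size_t x = static_cast<std::size_t>(
        std::floor(std::sqrt(static_cast<double>(maxWorkGroupSize))));
    // minSuppWgY = maxWorkGroupSize / minSuppWgX (integer division)
    const std::size_t y = maxWorkGroupSize / x;
    return {x, y};
}
```

For a limit of 16 this gives 4 x 4. For limits that are not perfect squares the shape may not be a power of two (32 gives 5 x 6); whether the rest of the generator accepts such shapes is not something this sketch checks.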
There is also code in lines 1533-1545 where wgX and wgY are set to constants. There, a reduction of wgX and wgY such as the while((wgY * wgX) > maxWorkGroupSize) loop from line 1614 should be applied as well.
Just put some values in wgX and wgY and make sure that pgran->wgSize[0]*pgran->wgSize[1] does not exceed maxWorkGroupSize for your device, and everything should be fine.
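The reduction and the final size check can be sketched roughly as follows. This is not the loop from the library: the Granulation struct is a hypothetical stand-in that only carries the two wgSize entries, and halving the larger dimension on each step is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the granulation structure; only the two
// work-group dimensions discussed above are modelled here.
struct Granulation {
    std::size_t wgSize[2];
};

// Shrinks the work-group shape until wgSize[0] * wgSize[1] fits the device
// limit. Halving the larger dimension is an assumed policy, not clBLAS code.
void clampWorkGroup(Granulation &g, std::size_t maxWorkGroupSize)
{
    // Assumptions: the limit and both starting dimensions are at least 1.
    assert(maxWorkGroupSize >= 1);
    assert(g.wgSize[0] >= 1 && g.wgSize[1] >= 1);

    std::size_t &wgX = g.wgSize[0];
    std::size_t &wgY = g.wgSize[1];

    while ((wgY * wgX) > maxWorkGroupSize) {
        // The larger dimension is at least 2 here, so it never drops to 0.
        if (wgY >= wgX) {
            wgY /= 2;
        } else {
            wgX /= 2;
        }
    }

    // Final check: the product must not exceed the device limit.
    assert(g.wgSize[0] * g.wgSize[1] <= maxWorkGroupSize);
}
```

Starting from the 8 x 8 default with a device limit of 16, the loop ends at 4 x 4.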
Feel free to ask questions; I am familiar with this library's code.
Timur.
Hello, I'm working on porting clBLAS to my company's accelerator. Our OpenCL library only supports a maximum of 16 work-items per work-group, so I fall into the unimplemented case in the kernel generator (solution_seq_make.cpp: getDefaultStepGranulation()), where maxWorkGroupSize < 64 is not supported. I would like to implement this, but I don't know how to do it or what algorithm lies behind it. Can anyone help or explain? Thanks in advance. Quan