Open dammad opened 9 years ago
@dammad This library was developed by AMD and they are doing all they can to support platforms that they may or may not have access to. Since this library is out in the open, I think it is up to the other hardware vendors (or users like you and me) to see if they can contribute code that will improve the performance.
P.S. I do not work for AMD.
Indeed. I posted because I was wondering whether anyone had seen this before or had any insights.
Dammad.
In reply to the comment above by Pavan Yalamanchili (Aug 23, 2015, 4:03 PM).
Hi @dammad, we've seen CL_INVALID_WORK_GROUP_SIZE on various OpenCL implementations, especially with respect to CPU devices. Some runtimes require a workgroup to be a certain shape, and we have not designed clBLAS to launch kernels with arbitrary workgroup shapes. Can you query your device for CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_DEVICE_MAX_WORK_ITEM_SIZES?
Thanks for the info. My platform reports 1 and (1,1,1) for those. I’m looking to see if I can modify the code to work with those sizes.
Alan.
In reply to the comment above by Kent Knox (August 23, 2015, 8:25:40 PM).
@dammad I believe our kernels are written to assume a workgroup that is at least 64 work-items wide. See #64 and #25.
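To make the mismatch concrete, here is a minimal sketch in C of the kind of check an OpenCL runtime applies to a requested local work size. It is not clBLAS code: the `device_limits` struct and `local_size_fits` function are hypothetical names, and the struct simply stands in for the values that `clGetDeviceInfo()` returns for CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_DEVICE_MAX_WORK_ITEM_SIZES. A real runtime may also apply a stricter per-kernel limit (CL_KERNEL_WORK_GROUP_SIZE), which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the two device limits discussed above.
 * In real host code these would be filled in by clGetDeviceInfo(). */
typedef struct {
    size_t max_work_group_size;    /* CL_DEVICE_MAX_WORK_GROUP_SIZE */
    size_t max_work_item_sizes[3]; /* CL_DEVICE_MAX_WORK_ITEM_SIZES */
} device_limits;

/* Returns 1 if the 3-D local work size fits within the device limits,
 * 0 otherwise. Each dimension must not exceed its per-dimension limit,
 * and the product of all dimensions must not exceed the group limit. */
static int local_size_fits(const device_limits *dev, const size_t local[3])
{
    assert(dev != NULL && local != NULL);

    size_t total = 1;
    for (int d = 0; d < 3; ++d) {
        if (local[d] == 0 || local[d] > dev->max_work_item_sizes[d])
            return 0;
        /* Division-based guard: rejects the size before the running
         * product can exceed the limit or overflow size_t. */
        if (total > dev->max_work_group_size / local[d])
            return 0;
        total *= local[d];
    }
    return 1;
}
```

With the limits reported earlier in this thread, 1 and (1,1,1), a local size of (64,1,1) is rejected and only (1,1,1) passes, which would be consistent with the CL_INVALID_WORK_GROUP_SIZE failure if the kernels are indeed launched 64 wide.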
clBLAS-tune fails with the following error on the Adreno 320 GPU:
./clBLAS-tune --gemm --float --fast
GEMM is being tuned, progress: 0.12% clEnqueueNDRangeKernel() failed with -54(CL_INVALID_WORK_GROUP_SIZE)
Performance is also terrible: perhaps 100 MFLOPS on a platform whose theoretical peak is perhaps 90 GFLOPS.