Syncleus / aparapi

The New Official Aparapi: a framework for executing native Java and Scala code on the GPU.
http://aparapi.com
Apache License 2.0

Kernel overall local size #171

Open forreg16 opened 1 year ago

forreg16 commented 1 year ago

Good afternoon.

When I run my Java code on an NVIDIA GeForce RTX 3060 Ti graphics card, I get this error:

Kernel overall local size: 1000 exceeds maximum kernel allowed local size of: 256 failed

The same code runs fine on an Intel HD Graphics 630 or an AMD Radeon R7 450 graphics card. If I pass a number less than 256 in this part of the code, it also works on the NVIDIA GeForce RTX 3060 Ti:

Range range = needDevice.createRange(255);
kernel.execute(range);

The NVIDIA GeForce RTX 3060 Ti is newer than the Intel HD Graphics 630 and the AMD Radeon R7 450, yet the largest value it accepts for createRange is smaller than on the older cards. What could be the problem?

trayanmomkov commented 1 year ago

Hey @forreg16, I have the same problem. My card is an RTX 3070 and I run it on Linux. The problem happens because the maximum group size is hardcoded to 256:

public static final int MAX_OPENCL_GROUP_SIZE = 256;

I don't know why this is the maximum; I don't have experience with OpenCL. I hope the maintainers of the project will answer here. Maybe our only option is to change the value and recompile the library, but I don't know whether there are any instructions for doing that.
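One plausible reading of the two numbers in the error, offered as an assumption and not something a maintainer has confirmed in this thread: when createRange is given only a global size, the local size may be derived as the largest divisor of that size that fits under the limit the device reports, and the result is then checked against the hardcoded 256. The sketch below is plain Java and does not call Aparapi. The helper name largestDivisorAtMost and the device limits 1024 and 256 are hypothetical values chosen for illustration.

```java
public class LocalSizeSketch {

    // Hypothetical helper: the largest divisor of globalSize that does not exceed cap.
    static int largestDivisorAtMost(int globalSize, int cap) {
        if (globalSize <= 0 || cap <= 0) {
            throw new IllegalArgumentException("globalSize and cap must be positive");
        }
        // Walk down from the largest allowed candidate until one divides evenly.
        for (int candidate = Math.min(globalSize, cap); candidate > 1; candidate--) {
            if (globalSize % candidate == 0) {
                return candidate;
            }
        }
        return 1; // 1 divides every positive size
    }

    public static void main(String[] args) {
        int globalSize = 1000; // the size from the first error message

        // Assumed limit of 1024 for the newer card: the derived local size is 1000,
        // which is above the hardcoded 256.
        System.out.println(largestDivisorAtMost(globalSize, 1024)); // prints 1000

        // Assumed limit of 256 for the older cards: the derived local size is 250,
        // which passes the check.
        System.out.println(largestDivisorAtMost(globalSize, 256)); // prints 250
    }
}
```

Under these assumptions, the newer card fails only because it advertises a larger limit than the library accepts, not because it is less capable.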

forreg16 commented 1 year ago

Hey @trayanmomkov, try this version of the code. With it, I can set size to more than 256.

Range range = needDevice.createRange2D(size, 1); 
kernel.execute(range);

You can see more details here: https://stackoverflow.com/questions/75365328/error-exceeds-maximum-kernel-allowed-local-size

trayanmomkov commented 1 year ago

But @forreg16, you can achieve that with create(size, localSize), where localSize <= 256 and size % localSize == 0. The real problem is that localSize cannot be greater than 256. On my card, which has 5888 cores, I want a larger localSize to get better performance. In fact, Aparapi automatically chooses a localSize of 640, but when it tries to set it I get the error:

!!!!!!! Kernel overall local size: 640 exceeds maximum kernel allowed local size of: 256 failed (null)
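Until the limit itself is changed, the two conditions mentioned above (localSize <= 256 and size % localSize == 0) can be met for any problem size by padding the global size up to the next multiple of the local size. The following is a minimal sketch in plain Java, not an Aparapi API: the class name, the helper roundUpToMultiple, and the constant ASSUMED_MAX_LOCAL_SIZE are hypothetical, and the value 256 is taken from the error message.

```java
public class PaddedRangeSketch {

    // Taken from the error message in this thread; treated here as an assumption.
    static final int ASSUMED_MAX_LOCAL_SIZE = 256;

    // Hypothetical helper: smallest multiple of localSize that is >= size.
    static int roundUpToMultiple(int size, int localSize) {
        if (size <= 0 || localSize <= 0) {
            throw new IllegalArgumentException("size and localSize must be positive");
        }
        return ((size + localSize - 1) / localSize) * localSize;
    }

    public static void main(String[] args) {
        int size = 1000;                        // number of work items actually needed
        int localSize = ASSUMED_MAX_LOCAL_SIZE; // stays within the hardcoded limit
        int paddedSize = roundUpToMultiple(size, localSize);

        System.out.println(paddedSize);             // prints 1024
        System.out.println(paddedSize % localSize); // prints 0
    }
}
```

The padded size and local size would then be passed as the size and localSize arguments of the range-creation call discussed above. Because the padded range contains more work items than there is data, the kernel body would need to return early for any global id that is greater than or equal to the original size.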