doe300 / VC4CL

OpenCL implementation running on the VideoCore IV GPU of the Raspberry Pi models
MIT License

Can we have a global work size that is a multiple of 16? #101

Open

Anaphory opened 3 years ago

Anaphory commented 3 years ago

Out of toy interest, I am trying to get OpenCL and the tree likelihood computation library BEAGLE to run on a Pi. BEAGLE assumes that work sizes are divisible by 16, because that is handy for nucleotide substitution matrices, and it fails against the 12×12×12 work size limit of VC4CL on the Pi.
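
For context, the 12×12×12 limit is what the OpenCL host API reports for the device. A minimal sketch (plain C, standard OpenCL 1.2 headers, error handling omitted) that just prints those limits:

```c
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    size_t max_group_size = 0;
    size_t max_item_sizes[3] = {0, 0, 0}; /* VC4CL reports 3 work-item dimensions */
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    sizeof(max_group_size), &max_group_size, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_ITEM_SIZES,
                    sizeof(max_item_sizes), max_item_sizes, NULL);

    printf("max work-group size: %zu\n", max_group_size);
    printf("max work-item sizes: %zu x %zu x %zu\n",
           max_item_sizes[0], max_item_sizes[1], max_item_sizes[2]);
    return 0;
}
```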

Unfortunately, I don't know much about low-level programming and hardware (and I really don't understand any of OpenCL, the Pi's GPU architecture, or what the work size actually means, sorry), so the question I ask may be a bit dumb: would it be possible to change the work size?

I have been looking for the source of the magic number here in the repository and found this comment: https://github.com/doe300/VC4CL/blob/842d44463af3d967e216ae4adf172b908139e942/src/vc4cl_config.h#L140-L143. If work-items can in part be executed sequentially, could I be taught to set some of the work size limits to 48 (the LCM of 12 and 16) for a small performance hit? Or is that number embedded too deeply in the code, so that changing it would require a lot of changes in other places, like https://github.com/doe300/VC4CL/blob/a00572f359d8a4c792995fa74ff19ac0b42c7705/src/Kernel.cpp#L339?

doe300 commented 3 years ago

I think you misunderstood the comment: work-groups can be run sequentially, but work-items (the single executions within a work-group) must be run in parallel.

The limit of 12 for the work-group size (the number of work-items in a single work-group) is a hardware/implementation limitation, since we only have 12 cores.
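
To illustrate the distinction, a minimal host-side sketch (plain C against the OpenCL 1.2 API; the helper function, queue, and kernel handles are hypothetical, and error handling is omitted). A global work size of 48, which is divisible by 16, is already possible today as long as the local work-group size is at most 12 and divides it; the resulting four work-groups can then be executed one after another:

```c
#include <CL/cl.h>

/* Hypothetical helper: "queue" and "kernel" are assumed to have been
 * created and set up elsewhere. */
cl_int enqueue_48_items(cl_command_queue queue, cl_kernel kernel)
{
    size_t global_size = 48; /* 48 = lcm(12, 16): divisible by 16 as BEAGLE wants */
    size_t local_size = 12;  /* work-group size, capped at 12 by the hardware */

    /* 48 / 12 = 4 work-groups of 12 work-items each; the groups may be
     * executed sequentially, while the 12 work-items within a group run in parallel. */
    return clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                  &global_size, &local_size,
                                  0, NULL, NULL);
}
```

Whether this helps with BEAGLE depends on whether it only needs the global size to be divisible by 16, or also hard-codes a work-group (local) size of 16, which is above the current limit.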

I am currently working on a very experimental optimization to merge work-items, which would then allow for a work-group size of more than 12. But whether it can be applied depends on the kernels being executed...

Anaphory commented 3 years ago

Yes, there's probably a lot of confusion in my head about these things. Thank you very much for engaging nonetheless!