alantech / alan

Autoscalable Programming Language
https://alan-lang.org
MIT License

Fix an off-by-one error in the workgroup creation caught by the Raspberry Pi #767

Closed. dfellis closed this 2 months ago

dfellis commented 2 months ago

Previously, the workgroup creation would launch 5 parallel tasks for an array of length 4. This worked on most of the drivers because they simply ignore an out-of-bounds memory access. The Raspberry Pi driver, for whatever reason (but good for me), instead takes the index modulo the length of the memory array, so the 5th access wrapped around to the first index, computed 2 * 4 = 8, and stored that into memory.
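To illustrate the failure mode, here is a minimal CPU-side simulation in Python. It is not the actual Alan or driver code: the `run_kernel` helper, the "double every element" kernel, the input values, and the sequential execution order are all assumptions chosen so that the wrapped access reproduces the 2 * 4 = 8 case described above.

```python
def run_kernel(data, invocations, wrap_out_of_bounds):
    """Simulate a hypothetical 'double every element' kernel.

    One invocation per index, run sequentially here for simplicity;
    a real GPU gives no such ordering guarantee.
    """
    buf = list(data)
    n = len(buf)
    for i in range(invocations):
        if i < n:
            buf[i] = 2 * buf[i]
        elif wrap_out_of_bounds:
            # Assumed Pi-like behaviour: the index is taken modulo the buffer length.
            j = i % n
            buf[j] = 2 * buf[j]
        # Otherwise: assumed behaviour of the other drivers, where the
        # out-of-bounds access is silently ignored.
    return buf


data = [2, 3, 4, 5]          # assumed input, length 4
buggy_count = len(data) + 1  # the off-by-one: 5 tasks for 4 elements
fixed_count = len(data)      # one task per element

lenient = run_kernel(data, buggy_count, wrap_out_of_bounds=False)
wrapping = run_kernel(data, buggy_count, wrap_out_of_bounds=True)
fixed = run_kernel(data, fixed_count, wrap_out_of_bounds=True)

print(lenient)   # [4, 6, 8, 10]: the bug is masked
print(wrapping)  # [8, 6, 8, 10]: index 0 is doubled twice
print(fixed)     # [4, 6, 8, 10]: correct on either kind of driver
```

With the extra task, only the wrapping behaviour exposes the bug; with the corrected count, both behaviours give the same result.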

This fixes that, which fixes things on the Pi. It may also improve performance on the other machines, in case the GPU was entering some sort of error path there and then needing to recover from it, but I have zero evidence on that either way.