eyalroz / cuda-kat

CUDA kernel author's tools
BSD 3-Clause "New" or "Revised" License

Account for dimensions exceeding the domain of 32-bit (unsigned) integers #90

Open eyalroz opened 3 years ago

eyalroz commented 3 years ago

A linear CUDA grid can have up to 2^31-1 blocks (in the x dimension), each with up to 1024 threads, for a total of a little under 2^41 threads. Currently, our types and grid_info functions assume that all dimensions fit within 32-bit integers, and that is not the case.
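To make the arithmetic concrete, here is a minimal host-side sketch (plain C++, not code from cuda-kat). The function names `global_index_32` and `global_index_64` are hypothetical, and their parameters stand in for `blockIdx.x`, `blockDim.x` and `threadIdx.x`. It shows that the maximum thread count is 2^41 - 1024, and that a global index computed in 32-bit unsigned arithmetic silently wraps around once the product reaches 2^32.

```cpp
#include <cstdint>

// Limits quoted above for the x dimension of a linear grid.
constexpr std::uint64_t max_blocks_x          = (std::uint64_t{1} << 31) - 1; // 2^31 - 1
constexpr std::uint64_t max_threads_per_block = 1024;                         // 2^10

// Total number of threads in the largest linear grid: 2^41 - 1024.
constexpr std::uint64_t max_threads = max_blocks_x * max_threads_per_block;

static_assert(max_threads == (std::uint64_t{1} << 41) - 1024,
              "a little under 2^41 threads");
static_assert(max_threads > UINT32_MAX,
              "the thread count does not fit in 32 bits");

// Hypothetical helper: global index computed in 32-bit unsigned arithmetic.
// The multiplication wraps modulo 2^32 for large enough grids.
constexpr std::uint32_t global_index_32(
    std::uint32_t block_index, std::uint32_t block_size, std::uint32_t thread_index)
{
    return block_index * block_size + thread_index;
}

// Hypothetical helper: the same computation, widened to 64 bits before multiplying.
constexpr std::uint64_t global_index_64(
    std::uint32_t block_index, std::uint32_t block_size, std::uint32_t thread_index)
{
    return std::uint64_t{block_index} * block_size + thread_index;
}
```

For example, with block index 2^22 and 1024 threads per block, the 32-bit product wraps to zero, so thread 5 of that block would get the same global index as thread 5 of block 0.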

At the same time, it is costly to default to 64-bit values for dimensions when a kernel author knows that the dimensions don't actually exceed 32 bits. (Limiting to 16 bits is less useful, since NVIDIA GPU cores don't operate faster on 16-bit integers.)

So, we need to figure out how to support dimensions wider than 32 bits without forcing them on users by default. Currently, we simply do not support this.
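One possible direction, purely as a sketch and not a decision about cuda-kat's actual API, is to make the index width an opt-in template parameter that defaults to 32 bits. Everything below (`index_width`, `index_type_for`, `linear_position`, `global_thread_index`) is a hypothetical name invented for illustration. `linear_position` stands in for the built-in `blockIdx.x`, `blockDim.x` and `threadIdx.x` values so the example compiles as ordinary host C++.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical selector for the width of computed indices; not part of cuda-kat.
enum class index_width { bits32, bits64 };

// Maps the selector to a concrete unsigned integer type.
template <index_width W>
using index_type_for = typename std::conditional<
    W == index_width::bits32, std::uint32_t, std::uint64_t>::type;

// Host-side stand-in for blockIdx.x, blockDim.x and threadIdx.x.
struct linear_position {
    std::uint32_t block_index;
    std::uint32_t block_size;
    std::uint32_t thread_index;
};

// Defaults to 32-bit arithmetic; callers who expect huge grids opt in to 64 bits.
// The cast happens before the multiplication, so the 64-bit variant cannot wrap
// for any valid linear grid.
template <index_width W = index_width::bits32>
constexpr index_type_for<W> global_thread_index(linear_position p)
{
    return static_cast<index_type_for<W>>(p.block_index) * p.block_size
         + p.thread_index;
}
```

A kernel author who knows the grid is small would call `global_thread_index(p)` and keep 32-bit arithmetic. One who launches more than 2^32 threads would write `global_thread_index<index_width::bits64>(p)`. The trade-off of this design is that the choice is made per call site, so mixing widths within one kernel remains possible and has to be checked by the author.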