The current design is tightly coupled to a grid size of 200 blocks and a block size of 128 threads:
Because of the bit shift in our block-based ray scheduling, the number of threads per block must be a power of two. A multiply has the same throughput as a bit shift and could be used instead to allow arbitrary block sizes.
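To make the point concrete, here is a minimal host-side sketch of the index arithmetic. It is not the actual scheduling code: the helper names `firstRayShift` and `firstRayMul` and their parameters are hypothetical and only illustrate why the shift ties the block size to a power of two while the multiply does not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: index of the first ray handled by a block, shift variant.
// Only valid when threadsPerBlock == (1u << log2ThreadsPerBlock), i.e. a power of two.
inline std::uint32_t firstRayShift(std::uint32_t blockIndex,
                                   std::uint32_t log2ThreadsPerBlock) {
    assert(log2ThreadsPerBlock < 32);  // shifting by >= 32 bits is undefined
    return blockIndex << log2ThreadsPerBlock;
}

// Hypothetical helper: same quantity, multiply variant.
// Works for any block size, including non-powers of two such as 96.
inline std::uint32_t firstRayMul(std::uint32_t blockIndex,
                                 std::uint32_t threadsPerBlock) {
    return blockIndex * threadsPerBlock;
}
```

For a block size of 128, `firstRayShift(5, 7)` and `firstRayMul(5, 128)` give the same index. For a block size of 96 only the multiply variant can be used.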
The mapPrefixSumToPrisms function assumes that a large number of threads is spawned; otherwise not all values of the prefix sum are mapped.
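One common way to remove this kind of dependency is a grid-stride loop, in which every thread processes several elements if fewer threads than elements are launched. The sketch below emulates that pattern on the host to show that every element is visited exactly once for any launch configuration. It is an assumption about how mapPrefixSumToPrisms could be restructured, not its current implementation, and the names `coverageCount` and `allVisitedOnce` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of a grid-stride loop.
// Each of the gridSize * blockSize emulated threads handles the elements
// tid, tid + stride, tid + 2 * stride, ... and records one visit per element.
std::vector<int> coverageCount(std::size_t n, std::size_t gridSize,
                               std::size_t blockSize) {
    assert(gridSize > 0 && blockSize > 0);
    std::vector<int> visits(n, 0);
    const std::size_t stride = gridSize * blockSize;  // total number of threads
    for (std::size_t tid = 0; tid < stride; ++tid) {
        for (std::size_t i = tid; i < n; i += stride) {
            ++visits[i];
        }
    }
    return visits;
}

// True if every element was visited exactly once.
bool allVisitedOnce(const std::vector<int>& visits) {
    return std::all_of(visits.begin(), visits.end(),
                       [](int v) { return v == 1; });
}
```

In a CUDA kernel the inner loop would start at `blockIdx.x * blockDim.x + threadIdx.x` and advance by `gridDim.x * blockDim.x`, so the result no longer depends on how many threads are launched.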
Are there other places in the source code that force a specific grid or block size?
We should eliminate such places as much as we can!