The trait BlockSharedMemDynSizeBytes is designed with a local view in mind: the number of threads and the number of elements per thread are passed to the trait (ref).
If you want to implement a global prefix sum, it could be helpful to know how many blocks are involved.
The example does not run (syntax issues), but it shows a general way to implement it.
IMO there is no reason to limit the trait's knowledge to information about a single block, even if shared memory cannot be accessed by other blocks.
My suggestion: pass the workDiv to the trait instead of the thread extents and the number of elements per thread.
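To make the suggestion concrete, here is a minimal sketch of what such a trait could look like. It is not alpaka's actual API: `WorkDiv`, its members, the `get` function, and `PrefixSumKernel` are simplified, hypothetical 1D stand-ins, and the sizing rule (one slot per element of the block plus one slot per block for the block totals) is only an assumption about what a global prefix sum might need.

```cpp
#include <cstddef>

// Hypothetical, simplified 1D stand-in for a work division (not alpaka's real type).
struct WorkDiv
{
    std::size_t gridBlockExtent;   // number of blocks in the grid
    std::size_t blockThreadExtent; // number of threads per block
    std::size_t threadElemExtent;  // number of elements per thread
};

// Primary trait: by default a kernel requests no dynamic shared memory.
template<typename TKernel>
struct BlockSharedMemDynSizeBytes
{
    static constexpr std::size_t get(TKernel const&, WorkDiv const&)
    {
        return 0;
    }
};

// Hypothetical global prefix sum kernel (body omitted for brevity).
struct PrefixSumKernel
{
};

// Specialization that uses the whole work division, including the block count.
template<>
struct BlockSharedMemDynSizeBytes<PrefixSumKernel>
{
    static constexpr std::size_t get(PrefixSumKernel const&, WorkDiv const& workDiv)
    {
        // Assumed layout: one int per element processed by the block,
        // plus one int per block in the grid to hold the block totals.
        return (workDiv.blockThreadExtent * workDiv.threadElemExtent + workDiv.gridBlockExtent)
            * sizeof(int);
    }
};
```

With a work division of 4 blocks, 128 threads per block, and 2 elements per thread, this sketch would request `(128 * 2 + 4) * sizeof(int)` bytes per block. A kernel that only needs the block-local extents can still read them from the same object, so nothing is lost compared to passing the two extents separately.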