alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration :llama:
https://alpaka.readthedocs.io
Mozilla Public License 2.0

Querying the size of the dynamic shared memory #1648

Open fwyzard opened 2 years ago

fwyzard commented 2 years ago

At least for debugging purposes, it would be useful to be able to query the size of the dynamic shared memory from within device code.

In CUDA this can be done with some inline PTX (see https://stackoverflow.com/questions/40021086/can-i-obtain-the-amount-of-allocated-dynamic-shared-memory-from-within-a-kernel/40021087#40021087):

     __device__ __forceinline__ unsigned dynamic_smem_size() {
       unsigned ret;
       asm volatile("mov.u32 %0, %dynamic_smem_size;" : "=r"(ret));
       return ret;
     }

With ROCm I have no idea :-)

Is this something that Alpaka could wrap, and also provide for the CPU-based backends?

j-stephan commented 2 years ago

No idea, we have to investigate this first. If it is possible, I see no reason not to do it.

psychocoderHPC commented 2 years ago

In general, it should be possible. We could store it as a member of the accelerator.
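To illustrate what storing the size as a member could look like, here is a minimal host-only C++ sketch. `MockAcc`, `dynSharedMemSizeBytes()`, `DebugKernel` and `launchAndQuery()` are hypothetical names made up for this example and are not part of alpaka's API; a real backend would keep the value wherever it already manages its shared memory block.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an accelerator object (NOT alpaka's real class).
// It owns the dynamic shared memory block, so it already knows its size.
class MockAcc {
public:
    explicit MockAcc(std::size_t dynSharedMemBytes)
        : m_dynSharedMem(dynSharedMemBytes) {}

    // Hypothetical query: size in bytes of the dynamic shared memory block.
    std::size_t dynSharedMemSizeBytes() const { return m_dynSharedMem.size(); }

    // Pointer to the start of the block, as a kernel would request it.
    unsigned char* dynSharedMem() { return m_dynSharedMem.data(); }

private:
    std::vector<unsigned char> m_dynSharedMem;
};

// A kernel functor that only reports the size it sees through the accelerator.
struct DebugKernel {
    void operator()(MockAcc const& acc, std::size_t* out) const {
        *out = acc.dynSharedMemSizeBytes();
    }
};

// Hypothetical launch helper: builds the accelerator and runs the kernel once.
inline std::size_t launchAndQuery(std::size_t dynSharedMemBytes) {
    MockAcc acc(dynSharedMemBytes);
    std::size_t seen = 0;
    DebugKernel{}(acc, &seen);
    return seen;
}
```

On a CPU backend this would probably cost one extra member per accelerator object; on GPU backends the value would either be read from the hardware (as with the PTX register above) or have to be passed along at launch.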

The question is whether alpaka should provide such an interface. The size of the dynamic shared memory is set by the user before submitting the kernel, so if this information is needed inside the kernel, the user should pass it as an argument. A user who is fine with the additional overhead of passing one more variable to the device can do so, and by default no other kernel or application is affected.
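A minimal host-only C++ sketch of the "pass it as an argument" approach follows. `LaunchConfig`, `launch()`, `CountFloatsKernel` and `example()` are hypothetical names used only for illustration, not alpaka's launch API. The point is that the host computes the byte count once and uses the same value both for the launch configuration and as a kernel argument.

```cpp
#include <cstddef>

// Hypothetical launch description holding the requested dynamic shared memory size.
struct LaunchConfig {
    std::size_t dynSharedMemBytes;
};

// Kernel functor that receives the size explicitly as an argument and
// returns how many floats fit into the dynamic shared memory block.
struct CountFloatsKernel {
    std::size_t operator()(std::size_t dynSharedMemBytes) const {
        return dynSharedMemBytes / sizeof(float);
    }
};

// Hypothetical launch helper: a real backend would use the configuration to
// allocate shared memory; here it simply forwards the arguments to the kernel.
template <typename Kernel, typename... Args>
auto launch(LaunchConfig const&, Kernel const& kernel, Args... args) {
    return kernel(args...);
}

inline std::size_t example() {
    std::size_t const bytes = 256 * sizeof(float);  // single source of truth
    LaunchConfig const cfg{bytes};                  // used for the launch ...
    return launch(cfg, CountFloatsKernel{}, bytes); // ... and passed to the kernel
}
```

Only kernels that ask for the value pay for the extra argument, which matches the "no effect on other kernels" point above.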

I do not see a problem with having this function enabled only in debug mode, but we should add a static assert in case someone uses the function outside of debug mode. A clear error message will avoid a lot of debugging pain.
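One possible shape for such a debug-only query, as a C++ sketch under stated assumptions: `SKETCH_DEBUG` is a hypothetical switch standing in for whatever debug macro the library would use, and `DebugAcc`, `DependentFalse` and `getDynSharedMemSizeBytes()` are made-up names. The static assert depends on the template parameter, so it only fires when the function is actually instantiated outside of debug mode.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical debug switch for this sketch; defaults to "debug enabled".
#ifndef SKETCH_DEBUG
#define SKETCH_DEBUG 1
#endif

// Always false, but dependent on T, so the static_assert below is only
// evaluated when the function template is instantiated.
template <typename T>
struct DependentFalse : std::false_type {};

// Hypothetical accelerator stand-in that stores the size as a plain member.
struct DebugAcc {
    std::size_t dynSharedMemBytes;
};

template <typename TAcc>
std::size_t getDynSharedMemSizeBytes(TAcc const& acc) {
#if SKETCH_DEBUG
    return acc.dynSharedMemBytes;
#else
    static_assert(DependentFalse<TAcc>::value,
                  "getDynSharedMemSizeBytes() is only available in debug mode; "
                  "pass the size to the kernel as an argument instead.");
    (void)acc;
    return 0;
#endif
}
```

Compiling with `-DSKETCH_DEBUG=0` turns any call to the function into a compile-time error with the message above, while code that never calls it is unaffected.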