frankwswang closed this issue 1 year ago
Could you please show the memory usage of your GPU?
Confirmed as a bug: it launches too many blocks, namely 2^8*4096.
It could be fixed by modifying the function that decides the thread and block counts, for example:
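@inline function CuYao.cudiv(x::Int, y::Int)
    max_threads = 512  # cap on the number of threads per block
    threads_x = min(max_threads, x)
    threads_y = min(max_threads ÷ threads_x, y)  # spend the remaining thread budget on y
    threads = (threads_x, threads_y)
    blocks = ceil.(Int, (x, y) ./ threads)  # enough blocks to cover x and y
    threads, blocks
end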
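To see what this rule produces without a GPU, here is a minimal standalone sketch of the same logic in plain Julia. The name `launch_config` and the `max_threads` keyword are hypothetical and used only for illustration; they are not part of CuYao. The input sizes are also illustrative and are not taken from the report.

```julia
# Standalone sketch of the proposed launch-size rule (no GPU or CuYao needed).
# `launch_config` is a hypothetical name, not a CuYao function.
function launch_config(x::Int, y::Int; max_threads::Int = 512)
    threads_x = min(max_threads, x)
    threads_y = min(max_threads ÷ threads_x, y)
    threads = (threads_x, threads_y)
    # cld is integer ceiling division, so no floating-point round trip is needed
    blocks = cld.((x, y), threads)
    return threads, blocks
end

# Illustrative sizes (assumed): x = 2^8, y = 4096
threads, blocks = launch_config(2^8, 4096)
println("threads = ", threads, ", blocks = ", blocks)
# prints: threads = (256, 2), blocks = (1, 2048)
```

With these inputs each block holds 256 * 2 = 512 threads, and the grid has 1 * 2048 blocks. When `x` is at least `max_threads`, `threads_y` falls to 1, so the second dimension is covered by blocks alone.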
I will look into it and fix this issue once and for all.
Thanks for your report.
This problem no longer seems to exist.