Closed rcoreilly closed 1 year ago
Looks like the struct solution is working. Not sure all the bytes are actually making it from CPU to GPU, but nothing indicates that they are not -- no GPU error messages.
The MinusPhase and PlusPhase kernels get strangely slow with increasing NData, so something is off there. Otherwise Cycle is scaling really nicely, going from 14 -> 44 secs for NData = 16 (16x the data in roughly 3x the time: 16 * 14 / 44 is about 5.1), so that is essentially a 5x speedup overall. Probably better to combine with MPI at some point, but anyway looking good for overall GPU scaling.
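The speedup arithmetic above can be sketched as a small helper. This is only an illustration: `EffectiveSpeedup` is a hypothetical name, and it assumes the 14 secs figure is the time for a single data item (NData = 1), which the numbers above imply but do not state outright.

```go
package main

import "fmt"

// EffectiveSpeedup compares one batched run over nData items against
// running the single-item case nData times in a row.
// Assumption: singleSecs is the wall time at NData = 1.
func EffectiveSpeedup(singleSecs, batchedSecs float64, nData int) float64 {
	return singleSecs * float64(nData) / batchedSecs
}

func main() {
	// Numbers quoted above: 14 secs unbatched, 44 secs at NData = 16.
	fmt.Printf("%.2fx\n", EffectiveSpeedup(14, 44, 16)) // prints 5.09x
}
```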
Yep, getting a 4x speedup for NData = 8 -- a sweeter spot.
It is unclear what the actual size limits are for storage buffers in Vulkan / HLSL. This looks like the most relevant page: https://community.khronos.org/t/allocating-a-buffer-of-more-than-2gb/107460/21
Probably we can use a fixed-size struct and index into that. Will experiment.
It is clear that in HLSL, the index into the buffer is limited to being a uint, so we need to index into something larger than the raw float32s. I got uint64_t compiling by using dxc instead of glslc -- we should have been using this all along, a major side-benefit of this excursion -- but it issued a warning about down-conversion when accessing the buffer.
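To show why the down-conversion warning matters, here is a small Go sketch of what happens if a 64-bit index is narrowed to 32 bits before the buffer access. This is an assumption about what the warning refers to; the exact dxc behavior was not verified here, and `Narrow` and `Survives` are hypothetical helper names.

```go
package main

import "fmt"

// Narrow mimics a 64-to-32-bit down-conversion of an index:
// the upper 32 bits are simply dropped.
func Narrow(idx uint64) uint32 {
	return uint32(idx)
}

// Survives reports whether idx is unchanged after narrowing.
func Survives(idx uint64) bool {
	return uint64(Narrow(idx)) == idx
}

func main() {
	big := uint64(1)<<32 + 5
	fmt.Println(Narrow(big), Survives(big))   // prints 5 false
	fmt.Println(Narrow(1000), Survives(1000)) // prints 1000 true
}
```

If that is what is going on, any index at or above 2^32 would silently wrap to a low element rather than fail, which would fit with seeing no GPU error messages.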