[Open] Rhynden opened this issue 1 year ago
> Why is all the data from the buffers allocated all the time on the GPU? Shouldn't SYCL be able to copy data to the GPU only when necessary and keep the rest on the host or copy it back to the host when not needed anymore?
In theory, yes. In practice, no SYCL implementation does this to my knowledge. It is very hard to know if and when data will be needed again on the GPU, and you really don't want to pay the price for unnecessary data transfers when memory-eviction heuristics guess wrong. For best performance, the most reliable strategy is therefore usually to just use persistent allocations. This is also what enables extensions such as hipSYCL's buffer-USM interoperability: https://github.com/illuhad/hipSYCL/blob/develop/doc/buffer-usm-interop.md
The behavior of the hipSYCL runtime with respect to buffers is described in more detail here: https://github.com/illuhad/hipSYCL/blob/develop/doc/runtime-spec.md (note the section "persistent allocations")
If you want things like oversubscription of GPU memory (it will probably always cost you performance, so I don't recommend it), you can use shared USM allocations. If the backend supports it, these are mapped to memory that migrates automatically between host and device via page-faulting mechanisms. In that case, oversubscription is managed at the driver level, which is probably the better approach if you really need it.
Hi,
I'm curious what is going on with memory management when using sycl::buffer inside a std::vector.
I have a very small example program:
Compiling for an NVIDIA GPU and running this program gives me the following errors:
I'm curious what is going on. From monitoring I can see that the GPU memory slowly fills up and then presumably overflows. Why is all the data from the buffers kept allocated on the GPU the whole time? Shouldn't SYCL be able to copy data to the GPU only when necessary, keep the rest on the host, and copy it back to the host when it is no longer needed?
Kind regards