mahmoodn closed this 2 years ago
When USE_ASYNC_STREAM is defined, we use a GPU buffer allocated with cudaMalloc, and the CPU pulls from that buffer with cudaMemcpyAsync when it is full.
When USE_ASYNC_STREAM is not defined, we use a GPU buffer allocated with cudaMallocManaged, and the CPU reads from it directly when it is full, relying on the UVM driver to migrate pages.
Depending on the system, the driver and the workload, either approach could be better, but for most use cases defining USE_ASYNC_STREAM seems to work best.
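For anyone who wants to compare the two paths on their own system, here is a minimal standalone sketch of the difference. It is not the code from channel.hpp: the buffer size, the fill kernel, the CHECK macro and the pinned host buffer are all assumptions made for illustration. Only cudaMalloc, cudaMemcpyAsync and cudaMallocManaged come from the description above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define USE_ASYNC_STREAM  // comment this out to take the managed-memory path

// Hypothetical buffer size, chosen only for this sketch.
constexpr size_t kBufSize = 1 << 20;

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(1);                                          \
        }                                                          \
    } while (0)

// Stand-in for whatever GPU code fills the buffer.
__global__ void fill(unsigned char* buf, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) buf[i] = (unsigned char)(i & 0xff);
}

int main() {
    unsigned char* dev_buf = nullptr;
    const unsigned int blocks = (unsigned int)((kBufSize + 255) / 256);
#ifdef USE_ASYNC_STREAM
    // Path 1: plain device memory, explicit asynchronous copy to the host.
    unsigned char* host_buf = nullptr;
    cudaStream_t stream;
    CHECK(cudaMalloc(&dev_buf, kBufSize));
    CHECK(cudaMallocHost(&host_buf, kBufSize));  // pinned host memory (assumption)
    CHECK(cudaStreamCreate(&stream));
    fill<<<blocks, 256, 0, stream>>>(dev_buf, kBufSize);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(host_buf, dev_buf, kBufSize,
                          cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));  // wait until the copy has landed
    std::printf("async path: byte 5 = %d\n", host_buf[5]);
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFreeHost(host_buf));
#else
    // Path 2: managed memory; the CPU dereferences the same pointer and
    // the UVM driver migrates the touched pages on demand.
    CHECK(cudaMallocManaged(&dev_buf, kBufSize));
    fill<<<blocks, 256>>>(dev_buf, kBufSize);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());  // the GPU must be done before the CPU reads
    std::printf("managed path: byte 5 = %d\n", dev_buf[5]);
#endif
    CHECK(cudaFree(dev_buf));
    return 0;
}
```

Building it once with the #define and once without (for example with nvcc) and timing both on the target machine is probably the most reliable way to see which path suits a given system, driver and workload.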
Thanks. You mean "defined" is better for most of the workloads? I guess so... But the default is "undefined" in channel.hpp.
This is what we have in version 1.5.3
Not sure if you are looking at an old file or a modified one.
Yes, I am using 1.5.3. I was thinking that in order to define the variable, I have to use #define USE_ASYNC_STREAM 1, but the default has no value. As I checked, even without any value, the #ifdef will be true.
Thanks for the clarification.
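To illustrate the preprocessor point: #ifdef only tests whether a macro is defined, not what it expands to, so a #define with no value still enables the branch. A small sketch follows; HAS_VALUE_ZERO and the three function names are hypothetical and exist only for this example.

```cpp
#include <string>

#define USE_ASYNC_STREAM   // defined, but with no value
#define HAS_VALUE_ZERO 0   // hypothetical macro: defined, with value 0

// #ifdef is true for any defined macro, even one with an empty expansion.
std::string async_stream_mode() {
#ifdef USE_ASYNC_STREAM
    return "defined";
#else
    return "undefined";
#endif
}

// #ifdef ignores the value, so a macro defined as 0 still takes the branch.
std::string zero_macro_ifdef() {
#ifdef HAS_VALUE_ZERO
    return "taken";
#else
    return "skipped";
#endif
}

// #if evaluates the expansion, so a macro defined as 0 skips the branch.
std::string zero_macro_if() {
#if HAS_VALUE_ZERO
    return "taken";
#else
    return "skipped";
#endif
}
```

Called from any main, async_stream_mode() returns "defined", zero_macro_ifdef() returns "taken" and zero_macro_if() returns "skipped". So to turn such a feature off, the #define line has to be removed or commented out; setting the macro to 0 is not enough when the code checks it with #ifdef.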
Hi, I see #define USE_ASYNC_STREAM in channel.hpp. So, I would like to know in what circumstances it is beneficial to enable that?