Closed cwpearson closed 5 years ago
Done for coherence gpu/gpu in 87d69e04e12015e19488be1396ea539cc931caa4
Done for prefetch gpu/gpu in 03abea0a47136b1bdeb9f10b182eb120c02480a6
Done for zero-copy gpu/gpu in 156d014e7b43d809ec3084be66b2b7a673cebead
CUDA C Programming Guide §3.2.6.3: cudaEventRecord() will fail if the input event and input stream are associated with different devices. cudaEventElapsedTime() will fail if the two input events are associated with different devices.
We solved this problem by measuring the host wall time between launching both jobs and synchronizing. We could instead time events in one stream, with those events wrapping that stream's transfer as well as a wait for the other stream to finish. Then we would not end up measuring the cost of two stream synchronizes on the host.
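A rough sketch of the single-stream timing idea, not the code from the commits above. It assumes a bidirectional peer copy between two devices. The function name `time_bidirectional_copy`, the `CHECK` macro, and the buffer names are hypothetical. Both timing events live on `dev0`, so `cudaEventElapsedTime` is legal. The other device's stream is tied in with `cudaStreamWaitEvent`, which is allowed to cross devices. Cleanup on error paths is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: return early with the first CUDA error encountered.
#define CHECK(call)                      \
  do {                                   \
    cudaError_t err_ = (call);           \
    if (err_ != cudaSuccess) return err_; \
  } while (0)

// Times a dev0->dev1 copy and a dev1->dev0 copy that run concurrently.
// Only events recorded in stream0 (on dev0) are used for timing.
cudaError_t time_bidirectional_copy(int dev0, int dev1, size_t bytes, float *ms) {
  void *src0 = nullptr, *dst0 = nullptr, *src1 = nullptr, *dst1 = nullptr;
  cudaStream_t stream0, stream1;
  cudaEvent_t start, stop, otherDone;

  // Resources owned by dev0, including both timing events.
  CHECK(cudaSetDevice(dev0));
  CHECK(cudaMalloc(&src0, bytes));
  CHECK(cudaMalloc(&dst0, bytes));
  CHECK(cudaStreamCreate(&stream0));
  CHECK(cudaEventCreate(&start));
  CHECK(cudaEventCreate(&stop));

  // Resources owned by dev1. otherDone is used only for ordering, not timing.
  CHECK(cudaSetDevice(dev1));
  CHECK(cudaMalloc(&src1, bytes));
  CHECK(cudaMalloc(&dst1, bytes));
  CHECK(cudaStreamCreate(&stream1));
  CHECK(cudaEventCreateWithFlags(&otherDone, cudaEventDisableTiming));

  // Start the clock in stream0 and hold stream1 back until it has started.
  CHECK(cudaSetDevice(dev0));
  CHECK(cudaEventRecord(start, stream0));
  CHECK(cudaStreamWaitEvent(stream1, start, 0));

  // Transfer dev0 -> dev1 in stream0.
  CHECK(cudaMemcpyPeerAsync(dst1, dev1, src0, dev0, bytes, stream0));

  // Transfer dev1 -> dev0 in stream1, then mark its completion.
  CHECK(cudaSetDevice(dev1));
  CHECK(cudaMemcpyPeerAsync(dst0, dev0, src1, dev1, bytes, stream1));
  CHECK(cudaEventRecord(otherDone, stream1));

  // stream0 waits on the device for stream1's copy, then stops the clock.
  CHECK(cudaSetDevice(dev0));
  CHECK(cudaStreamWaitEvent(stream0, otherDone, 0));
  CHECK(cudaEventRecord(stop, stream0));

  // A single host-side wait; start and stop are both dev0 events.
  CHECK(cudaEventSynchronize(stop));
  CHECK(cudaEventElapsedTime(ms, start, stop));

  // Release resources.
  CHECK(cudaEventDestroy(start));
  CHECK(cudaEventDestroy(stop));
  CHECK(cudaStreamDestroy(stream0));
  CHECK(cudaFree(src0));
  CHECK(cudaFree(dst0));
  CHECK(cudaSetDevice(dev1));
  CHECK(cudaEventDestroy(otherDone));
  CHECK(cudaStreamDestroy(stream1));
  CHECK(cudaFree(src1));
  CHECK(cudaFree(dst1));
  return cudaSuccess;
}
```

On a machine with at least two GPUs, calling `time_bidirectional_copy(0, 1, 1 << 20, &ms)` should fill `ms` with the elapsed time in milliseconds. The host waits only once, on `stop`; the cross-stream dependency is resolved on the device, so the measurement should not include the cost of two host-side stream synchronizes.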