Similar to "Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect".
The bandwidth benchmarks already time transfers with CUDA events (`cudaEvent_t`), but we could add an explicit latency measurement, where the transfer size is minimal, alongside a bandwidth measurement, where the transfer size is large.
These could be split into two separate benchmarks so the reporting is easier to understand.
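
A rough sketch of what the split could look like, not the existing benchmark code. It assumes a pinned host-to-device copy as the transfer under test, and the 4-byte and 256 MiB sizes, the iteration count, and the names `time_h2d_ms` and `CUDA_CHECK` are placeholders chosen for illustration. The same event-timed helper is called once with a minimal size to report latency and once with a large size to report bandwidth:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort on any CUDA runtime error.
#define CUDA_CHECK(call)                                          \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                   cudaGetErrorString(err_));                     \
      std::exit(EXIT_FAILURE);                                    \
    }                                                             \
  } while (0)

// Hypothetical helper: average time in milliseconds for one
// host-to-device copy of `bytes`, timed with CUDA events.
static float time_h2d_ms(void *dst, const void *src, size_t bytes, int iters) {
  cudaEvent_t start, stop;
  CUDA_CHECK(cudaEventCreate(&start));
  CUDA_CHECK(cudaEventCreate(&stop));

  // Warm-up copy so one-time setup cost is not included in the timing.
  CUDA_CHECK(cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice));

  // Time `iters` back-to-back copies on the default stream.
  // Note: for tiny sizes this reports the average per-copy cost of
  // pipelined copies, which is only an approximation of latency.
  CUDA_CHECK(cudaEventRecord(start));
  for (int i = 0; i < iters; ++i) {
    CUDA_CHECK(cudaMemcpyAsync(dst, src, bytes, cudaMemcpyHostToDevice));
  }
  CUDA_CHECK(cudaEventRecord(stop));
  CUDA_CHECK(cudaEventSynchronize(stop));

  float ms = 0.0f;
  CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));
  CUDA_CHECK(cudaEventDestroy(start));
  CUDA_CHECK(cudaEventDestroy(stop));
  return ms / iters;
}

int main() {
  const size_t small_bytes = 4;                 // assumed "minimal" transfer
  const size_t large_bytes = size_t{256} << 20; // assumed "large" transfer (256 MiB)
  const int iters = 20;                         // assumed iteration count

  void *host = nullptr;
  void *dev = nullptr;
  CUDA_CHECK(cudaMallocHost(&host, large_bytes)); // pinned host buffer
  CUDA_CHECK(cudaMalloc(&dev, large_bytes));

  // Latency benchmark: minimal transfer size, reported as time.
  const float lat_ms = time_h2d_ms(dev, host, small_bytes, iters);
  std::printf("latency:   %.3f us\n", lat_ms * 1e3);

  // Bandwidth benchmark: large transfer size, reported as bytes per second.
  const float bw_ms = time_h2d_ms(dev, host, large_bytes, iters);
  std::printf("bandwidth: %.3f GB/s\n", (large_bytes / 1e9) / (bw_ms / 1e3));

  CUDA_CHECK(cudaFree(dev));
  CUDA_CHECK(cudaFreeHost(host));
  return 0;
}
```

Reporting the two numbers in different units (microseconds versus GB/s) is the main argument for keeping them as separate benchmarks: a single sweep over sizes mixes both regimes in one table.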