I ran the benchmark on a single node with two NVIDIA GeForce RTX 2080 Ti GPUs. I tested device_to_device_memcpy_write_ce, but the program waived the test.
$ ./nvbandwidth -t device_to_device_memcpy_write_ce
nvbandwidth Version: v0.2
Built from Git version: 6cefdda
NOTE: This tool reports current measured bandwidth on your system.
Additional system-specific tuning may be required to achieve maximal peak bandwidth.
CUDA Runtime Version: 12010
CUDA Driver Version: 12000
Driver Version: 525.125.06
Device 0: NVIDIA GeForce RTX 2080 Ti
Device 1: NVIDIA GeForce RTX 2080 Ti
Waiving device_to_device_memcpy_write_ce.
The host_to_device_bidirectional_memcpy_ce and host_to_device_memcpy_ce tests run fine.