I'm running this tool on DGX A100 and H100 machines. I'm interested in determining the aggregate device-to-host bandwidth, i.e. the total effective bandwidth when all devices send data to the host at the same time.
Is there any test case that provides this? The all_to_host_memcpy tests seem promising, but they appear to measure the device-to-host bandwidth of a single source device amid interference from other devices also sending to the host. Is there a way to determine the aggregate bandwidth of all devices sending to the host?