NVIDIA / nvbandwidth

A tool for bandwidth measurements on NVIDIA GPUs.
Apache License 2.0

[GH200] Unexpected Low Host-to-Device Bandwidth #23

Open vitduck opened 2 months ago

vitduck commented 2 months ago

Hi,

We observed unexpectedly low host-to-device bandwidth on the GH200 Superchip.

  1. Specs

    • GH200 (480GB LPDDR5X + 96GB HBM3)
    • OS: Rocky Linux 9.3 (Blue Onyx)
    • nvbandwidth: v5.0
  2. nvidia-smi topology:

            GPU0    NIC0    NIC1    CPU Affinity    NUMA Affinity   GPU NUMA ID
    GPU0     X      SYS     SYS     0-71            0               1
    NIC0    SYS      X      PIX
    NIC1    SYS     PIX      X
  3. Output: nvbandwidth-gh200.log (attached)

    • In general, SM copies are expected to give higher bandwidth than CE copies, since the latter are limited by the DMA (copy) engines.
    • Bandwidth anomaly:
      • Host-to-device: CE (346 GB/s) vs SM (342 GB/s)
      • According to NVIDIA's reference results, there should be a ~15% difference between them.
      • Are there hardware or kernel settings that could negatively affect the SM-variant memcpy?
    • Bandwidth asymmetry:
      • On GH200, H2D is approximately 15% faster than D2H. Is this due to an intrinsic property of ATS?
  4. ref:
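To put the CE/SM anomaly in numbers, the short sketch below computes the relative gap between the two host-to-device figures reported above, and what the SM figure would be if it were ~15% above CE. The 15% factor is taken from the reference claim in this report; the helper name `percent_gap` is purely illustrative.

```python
def percent_gap(a_gbps: float, b_gbps: float) -> float:
    """Return how much faster a is than b, as a percentage of b."""
    return (a_gbps - b_gbps) / b_gbps * 100.0


ce_h2d = 346.0  # GB/s, host-to-device via copy engine (from the log above)
sm_h2d = 342.0  # GB/s, host-to-device via SM kernel (from the log above)

gap = percent_gap(sm_h2d, ce_h2d)
# Assumption: the ~15% SM-over-CE advantage from the reference results applies here.
expected_sm = ce_h2d * 1.15

print(f"SM vs CE (H2D): {gap:+.1f}%")          # prints "SM vs CE (H2D): -1.2%"
print(f"SM at +15% over CE: {expected_sm:.1f} GB/s")  # prints "... 397.9 GB/s"
```

In other words, SM is measured about 1% *below* CE rather than ~15% above it, which is roughly a 56 GB/s shortfall against the expected ~398 GB/s.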

Regards.

deepakcu commented 2 months ago

Are the GPU clocks locked? To what value? Can you attach the output of nvidia-smi -q?

vitduck commented 2 months ago

@deepakcu

I have attached the output of lscpu and nvidia-smi -q as you requested. I believe everything is running at stock settings.

Below is some additional information.

I also checked dmesg immediately after running nvbandwidth and did not observe any warnings or errors.

Thanks.

deepakcu commented 2 months ago

Can you repeat your test with the clocks locked at max (1980 MHz)?

sudo nvidia-smi --lock-gpu-clocks=1980,1980
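A minimal sketch of the suggested re-test, assuming the nvbandwidth binary is at `./nvbandwidth` and that only the two H2D testcases (`host_to_device_memcpy_ce`, `host_to_device_memcpy_sm`) are of interest. It locks the clocks, runs each testcase, and always restores default clocks with `nvidia-smi --reset-gpu-clocks` afterwards. The function name `run_locked` and the dry-run default are illustrative choices, not part of nvbandwidth.

```python
import shlex
import subprocess


def run_locked(clock_mhz, testcases, nvbandwidth="./nvbandwidth", dry_run=True):
    """Lock GPU clocks, run nvbandwidth testcases, then reset clocks.

    With dry_run=True the commands are only printed, not executed.
    Returns the list of commands in the order they were issued.
    """
    lock = ["sudo", "nvidia-smi", f"--lock-gpu-clocks={clock_mhz},{clock_mhz}"]
    reset = ["sudo", "nvidia-smi", "--reset-gpu-clocks"]
    issued = []

    def step(cmd):
        issued.append(cmd)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

    step(lock)
    try:
        # Assumed path to the nvbandwidth binary; adjust for your build.
        for tc in testcases:
            step([nvbandwidth, "-t", tc])
    finally:
        # Restore default clocks even if a testcase fails.
        step(reset)
    return issued


if __name__ == "__main__":
    run_locked(1980, ["host_to_device_memcpy_ce", "host_to_device_memcpy_sm"])
```

Pass `dry_run=False` to actually execute the commands; the `try`/`finally` keeps the GPU from being left at locked clocks if a run aborts.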