cli99 / llm-analysis

Latency and Memory Analysis of Transformer Models for Training and Inference
Apache License 2.0

[REQUEST] How to get other GPU config #13

Closed: Echozqn closed this issue 11 months ago

Echozqn commented 11 months ago

Is your feature request related to a problem? Please describe. I recently wanted to run the analysis on a T4, but I don't know how to measure the intra_node values.

Describe the solution you'd like Below is the T4 information I looked up; the fields marked XXX (intra_node_bandwidth_in_GB_per_sec, intra_node_min_message_latency, and inter_node_bandwidth_in_GB_per_sec) are the ones I don't know how to obtain.

{
    "name": "T4-pcie-16gb",
    "mem_per_GPU_in_GB": 16,
    "hbm_bandwidth_in_GB_per_sec": 320,
    "intra_node_bandwidth_in_GB_per_sec": XXX,
    "intra_node_min_message_latency": XXX,
    "peak_fp16_TFLOPS": 65,
    "peak_i8_TFLOPS": 130,
    "peak_i4_TFLOPS": 260,
    "inter_node_bandwidth_in_GB_per_sec": XXX
}
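
One rough way to estimate the missing intra-node fields empirically is to time device-to-device copies between two GPUs in the node. The sketch below assumes PyTorch and at least two visible GPUs; it measures a plain peer copy rather than the collective patterns llm-analysis models, so dedicated tools such as NVIDIA's nccl-tests or the CUDA samples' p2pBandwidthLatencyTest will give more representative numbers, and inter_node_bandwidth_in_GB_per_sec needs a multi-node benchmark that this sketch does not cover.

import time
import torch

assert torch.cuda.device_count() >= 2, "needs at least two GPUs in the node"

def avg_copy_time(n_bytes, iters):
    # Average seconds for a GPU0 -> GPU1 device-to-device copy of n_bytes.
    src = torch.empty(n_bytes, dtype=torch.uint8, device="cuda:0")
    dst = torch.empty(n_bytes, dtype=torch.uint8, device="cuda:1")
    for _ in range(3):  # warm up so driver / peer-access setup is not timed
        dst.copy_(src)
    torch.cuda.synchronize(0)
    torch.cuda.synchronize(1)
    start = time.perf_counter()
    for _ in range(iters):
        dst.copy_(src)
        torch.cuda.synchronize(0)  # wait for each copy to finish on both ends
        torch.cuda.synchronize(1)
    return (time.perf_counter() - start) / iters

# A large copy approximates the one-way intra-node bandwidth in GB/s.
big = 1 << 30  # 1 GiB
print(f"intra_node_bandwidth_in_GB_per_sec ~ {big / avg_copy_time(big, 10) / 1e9:.1f}")

# A tiny copy gives a rough upper bound on the minimum message latency in
# seconds (it includes launch overhead, so treat it as an estimate only).
print(f"intra_node_min_message_latency ~ {avg_copy_time(4, 100):.2e}")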
Echozqn commented 11 months ago

Hello, and thank you again for developing the llm-analysis project; it has been very helpful in my work. I noticed that the A100 PCIe version uses PCIe 4.0, which has a bidirectional bandwidth of 64 GB/s, so the one-way bandwidth should be 32 GB/s. Likewise, the A100-SXM should use third-generation NVLink, which has a bidirectional bandwidth of 600 GB/s, so the one-way bandwidth should be 300 GB/s.

With that in mind, I have a few questions; I hope they will not take up too much of your time:

  1. Does the intra_node_bandwidth_in_GB_per_sec parameter refer to the one-way transmission speed?
  2. How did you derive the intra_node_bandwidth_in_GB_per_sec value for the A100-pcie-40gb entry provided in gpu_config? Could it be incorrect?
  3. How should I obtain the two parameters intra_node_min_message_latency and inter_node_bandwidth_in_GB_per_sec?

Looking forward to hearing from you, thank you very much for your time and help.

cli99 commented 11 months ago
  1. Yes
  2. The spec says the A100 PCIe supports an NVLink Bridge for 2 GPUs (https://www.nvidia.com/en-us/data-center/a100/), and I am not sure what bandwidth to use beyond 2 PCIe GPUs. T4 GPUs do not have NVLink, so I think 32 GB/s should be used here.
  3. intra_node_min_message_latency comes from the V100 spec; I just use the same value for all GPU types, which might not be accurate. For inter_node_bandwidth_in_GB_per_sec, I assume 200 GB/s InfiniBand for networking across nodes; the DGX H100 system has 400 GB/s InfiniBand. (A possible filled-in T4 entry is sketched below.)
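
Putting those answers together, a possible T4 entry is sketched below as a Python dict mirroring the JSON fields from the top of this issue. The three interconnect values are the assumptions stated in this reply, not measurements, and the latency number is only a placeholder standing in for the V100-derived value the repo reuses, so check them against the existing configs before relying on the results.

# Hypothetical T4 entry based on the reply above; the commented fields are
# assumptions, not measured values.
t4_gpu_config = {
    "name": "T4-pcie-16gb",
    "mem_per_GPU_in_GB": 16,
    "hbm_bandwidth_in_GB_per_sec": 320,
    "intra_node_bandwidth_in_GB_per_sec": 32,   # one-way PCIe value suggested above; T4 has no NVLink
    "intra_node_min_message_latency": 8e-06,    # placeholder: copy the V100-derived value from an existing config
    "peak_fp16_TFLOPS": 65,
    "peak_i8_TFLOPS": 130,
    "peak_i4_TFLOPS": 260,
    "inter_node_bandwidth_in_GB_per_sec": 200,  # assumes 200 GB/s InfiniBand across nodes, per the reply
}

Written out as JSON in the same format as the snippet at the top of this issue, the entry should be usable like the other GPU configs once the values are confirmed.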
Echozqn commented 11 months ago

I see, thank you very much for your reply.