cli99 / llm-analysis

Latency and Memory Analysis of Transformer Models for Training and Inference
Apache License 2.0

[REQUEST] How to get other GPU config #13

Closed: Echozqn closed this issue 11 months ago

Echozqn commented 11 months ago

Is your feature request related to a problem? Please describe. I recently wanted to run the analysis on a T4, but I don't know how to measure the intra_node values.

Describe the solution you'd like Below is the T4 information I looked up; the fields marked XXX (intra_node_bandwidth_in_GB_per_sec, intra_node_min_message_latency, and inter_node_bandwidth_in_GB_per_sec) are the ones I don't know how to obtain.

{
    "name": "T4-pcie-16gb",
    "mem_per_GPU_in_GB": 16,
    "hbm_bandwidth_in_GB_per_sec": 320,
    "intra_node_bandwidth_in_GB_per_sec": XXX,
    "intra_node_min_message_latency": XXX,
    "peak_fp16_TFLOPS": 65,
    "peak_i8_TFLOPS": 130,
    "peak_i4_TFLOPS": 260,
    "inter_node_bandwidth_in_GB_per_sec": XXX
}
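
One rough way to estimate the missing intra-node fields empirically is to time device-to-device copies between two GPUs in the node. The sketch below assumes PyTorch and at least two visible GPUs; it measures a plain peer copy rather than the collective patterns llm-analysis models, so dedicated tools such as NVIDIA's nccl-tests or the CUDA samples' p2pBandwidthLatencyTest will give more representative numbers, and inter_node_bandwidth_in_GB_per_sec needs a multi-node benchmark that this sketch does not cover.

import time
import torch

assert torch.cuda.device_count() >= 2, "needs at least two GPUs in the node"

def avg_copy_time(n_bytes, iters):
    # Average seconds for a GPU0 -> GPU1 device-to-device copy of n_bytes.
    src = torch.empty(n_bytes, dtype=torch.uint8, device="cuda:0")
    dst = torch.empty(n_bytes, dtype=torch.uint8, device="cuda:1")
    for _ in range(3):  # warm up so driver / peer-access setup is not timed
        dst.copy_(src)
    torch.cuda.synchronize(0)
    torch.cuda.synchronize(1)
    start = time.perf_counter()
    for _ in range(iters):
        dst.copy_(src)
        torch.cuda.synchronize(0)  # wait for each copy to finish on both ends
        torch.cuda.synchronize(1)
    return (time.perf_counter() - start) / iters

# A large copy approximates the one-way intra-node bandwidth in GB/s.
big = 1 << 30  # 1 GiB
print(f"intra_node_bandwidth_in_GB_per_sec ~ {big / avg_copy_time(big, 10) / 1e9:.1f}")

# A tiny copy gives a rough upper bound on the minimum message latency in
# seconds (it includes launch overhead, so treat it as an estimate only).
print(f"intra_node_min_message_latency ~ {avg_copy_time(4, 100):.2e}")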
Echozqn commented 11 months ago

Hello, and thank you again for developing the llm-analysis project; it has been very helpful in my work. I noticed that the A100 PCIe version uses PCIe 4.0, which has a bidirectional bandwidth of 64 GB/s, so the one-way bandwidth should be 32 GB/s. Likewise, the A100-SXM should use third-generation NVLink, which has a bidirectional bandwidth of 600 GB/s, so the one-way bandwidth should be 300 GB/s.

With that in mind, I have a few questions; I hope they will not take up too much of your time:

  1. Does the intra_node_bandwidth_in_GB_per_sec parameter refer to the one-way transmission speed?
  2. How did you derive the intra_node_bandwidth_in_GB_per_sec value for the A100-pcie-40gb entry provided in gpu_config? Could it be incorrect?
  3. How should I obtain the two parameters intra_node_min_message_latency and inter_node_bandwidth_in_GB_per_sec?

Looking forward to hearing from you, thank you very much for your time and help.

cli99 commented 11 months ago
  1. Yes
  2. The spec says the A100 PCIe supports an NVLink Bridge for 2 GPUs (https://www.nvidia.com/en-us/data-center/a100/), and I am not sure what bandwidth to use beyond 2 PCIe GPUs. T4 GPUs do not have NVLink, so I think 32 GB/s should be used here.
  3. intra_node_min_message_latency comes from the V100 spec; I just use the same value for all GPU types, which might not be accurate. For inter_node_bandwidth_in_GB_per_sec, I assume 200 GB/s InfiniBand for networking across nodes; the DGX H100 system has 400 GB/s InfiniBand. (A possible filled-in T4 entry is sketched below.)
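
Putting those answers together, a possible T4 entry is sketched below as a Python dict mirroring the JSON fields from the top of this issue. The three interconnect values are the assumptions stated in this reply, not measurements, and the latency number is only a placeholder standing in for the V100-derived value the repo reuses, so check them against the existing configs before relying on the results.

# Hypothetical T4 entry based on the reply above; the commented fields are
# assumptions, not measured values.
t4_gpu_config = {
    "name": "T4-pcie-16gb",
    "mem_per_GPU_in_GB": 16,
    "hbm_bandwidth_in_GB_per_sec": 320,
    "intra_node_bandwidth_in_GB_per_sec": 32,   # one-way PCIe value suggested above; T4 has no NVLink
    "intra_node_min_message_latency": 8e-06,    # placeholder: copy the V100-derived value from an existing config
    "peak_fp16_TFLOPS": 65,
    "peak_i8_TFLOPS": 130,
    "peak_i4_TFLOPS": 260,
    "inter_node_bandwidth_in_GB_per_sec": 200,  # assumes 200 GB/s InfiniBand across nodes, per the reply
}

Written out as JSON in the same format as the snippet at the top of this issue, the entry should be usable like the other GPU configs once the values are confirmed.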
Echozqn commented 11 months ago

I see, thank you very much for your reply.