Characterizing Power Management Opportunities for LLMs in the Cloud

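The paper measures how power affects LLM workloads (training in particular) by capping GPU power and locking GPU frequency.

Several of the paper's insights are interesting: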

The peak power draw across GPUs in LLM training iterations often reaches or exceeds their thermal design power (TDP). For cluster power design, this means that LLM training clusters need to overprovision GPU power to ensure power safety.

Large power swings are common in LLM training due to alternating computation- and communication-intensive phases across many GPUs. Since current power delivery infrastructure cannot always safely support large-scale power swings, LLM training clusters need specialized power infrastructure and management.

Power capping reduces peak power draw without affecting troughs, making it effective at reducing the magnitude of training power swings. Frequency locking lowers the overall power consumption, making it effective at reclaiming power on demand. Thus, both are useful in improving the power management in LLM training clusters.
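To make the contrast between the two knobs concrete, here is a minimal toy sketch. It is not from the paper: the trace values, the 325 W cap, and the 0.8 scaling factor are all made-up numbers, and modeling frequency locking as a uniform scale-down of the whole trace is a deliberate simplification of "lowers the overall power consumption".

```python
def power_cap(trace, cap_watts):
    """Clip every sample at the cap; samples below the cap are unchanged."""
    return [min(p, cap_watts) for p in trace]


def frequency_lock(trace, scale):
    """Toy model (assumption): locking frequency scales the whole trace down."""
    return [p * scale for p in trace]


def swing(trace):
    """Magnitude of the power swing: peak minus trough."""
    return max(trace) - min(trace)


# Hypothetical per-GPU power samples (watts), alternating between
# compute-heavy peaks and communication-heavy troughs.
trace = [400.0, 150.0, 410.0, 140.0, 405.0, 150.0]

capped = power_cap(trace, 325.0)
locked = frequency_lock(trace, 0.8)

print("swing original:", swing(trace))
print("swing capped:  ", swing(capped))
print("mean original: ", sum(trace) / len(trace))
print("mean locked:   ", sum(locked) / len(locked))
```

In this toy model the cap leaves the troughs untouched and shrinks the swing, while the frequency lock lowers the average draw across the whole trace, which matches the roles the paper assigns to each mechanism.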

LLM inference has distinct power consumption phases corresponding to prompt computation and token generation: prompt phases are brief and typically reach or exceed GPU TDP, whereas token phases are longer and draw less power. For cluster power design, this means that peak power in LLM inference clusters must be provisioned for the prompt phases, but doing so leads to underutilization during token phases; this mismatch must be addressed to improve power efficiency.
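The provisioning mismatch can be illustrated with some simple arithmetic. The sketch below is only illustrative: the phase durations and power levels are hypothetical, not measurements from the paper, and `provisioned_utilization` is a helper name made up for this note.

```python
def provisioned_utilization(phases):
    """Fraction of peak-provisioned power actually used, time-weighted.

    phases: list of (duration_seconds, power_watts) tuples.
    """
    peak = max(power for _, power in phases)
    total_time = sum(duration for duration, _ in phases)
    energy = sum(duration * power for duration, power in phases)
    return energy / (total_time * peak)


# Hypothetical request: a short prompt phase near TDP, followed by a
# longer token-generation phase at lower power.
phases = [(1.0, 400.0), (9.0, 200.0)]

util = provisioned_utilization(phases)
print(f"utilization of provisioned power: {util:.2f}")
```

With these made-up numbers, provisioning for the prompt-phase peak leaves roughly half of the provisioned power unused over the request, which is the underutilization the insight describes.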
