Why does running DeepSeek-V2-Lite-Chat (SFT) on an A800 consume 60 GB of GPU memory?!

juhengzhe commented 4 months ago

The weight files total about 32 GB. Why does the model occupy more than 60 GB of GPU memory once it is actually loaded?

juhengzhe commented 4 months ago

Specifying float16 as the data type when loading the model, instead of loading in full precision, brings memory usage down to under 40 GB.
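The arithmetic behind this can be sketched as follows. This is a rough estimate, not a measurement: it assumes the 32 GB checkpoint is stored at 2 bytes per parameter (which would imply roughly 16e9 parameters), and it counts weights only, ignoring activations, KV cache, and framework overhead. The function name `weight_memory_gb` is made up for illustration.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for the model weights alone, in GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: a 32 GB checkpoint stored in a 16-bit dtype,
# so the parameter count is about 32e9 bytes / 2 bytes per parameter.
NUM_PARAMS = 32e9 / 2

for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_memory_gb(NUM_PARAMS, dtype):.0f} GB")
```

Under these assumptions, loading in float32 doubles the footprint of the weights relative to the checkpoint, which is in line with the 60+ GB reported above. In Hugging Face transformers the load dtype is usually chosen with the `torch_dtype` argument of `from_pretrained`; whether that is how the model was loaded here is not stated in the thread.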

liangfang commented 4 months ago

I noticed this message: The model has a long context length (163840). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value.

That said, I would also like to ask: why does a long context length consume so much GPU memory?

beep-bebop commented 3 months ago

I noticed this message: The model has a long context length (163840). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value.

That said, I would also like to ask: why does a long context length consume so much GPU memory?

My guess is that GPU memory is preallocated for the long context, so usage does not change much during subsequent inference.
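To illustrate why context length matters, here is a minimal sketch of the usual KV cache size estimate for a standard multi-head attention layout. The layer count, head count, and head dimension below are hypothetical placeholders, not DeepSeek-V2-Lite's real configuration, and DeepSeek-V2's MLA attention compresses the cache, so actual numbers will differ. The point is only that the cache grows linearly with sequence length.

```python
def kv_cache_gib(num_layers: int, num_kv_heads: int, head_dim: int,
                 seq_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GiB for ONE sequence of seq_len tokens.

    Each layer stores a key tensor and a value tensor (hence the factor 2),
    each holding num_kv_heads * head_dim elements per token.
    """
    total_bytes = (2 * num_layers * num_kv_heads * head_dim
                   * seq_len * bytes_per_elem)
    return total_bytes / 2**30


# Hypothetical model shape, chosen only for illustration.
HYPOTHETICAL_SHAPE = dict(num_layers=32, num_kv_heads=32, head_dim=128)

for seq_len in (4096, 163840):
    size = kv_cache_gib(seq_len=seq_len, **HYPOTHETICAL_SHAPE)
    print(f"seq_len={seq_len}: {size:.1f} GiB")
```

With this hypothetical shape in 16-bit precision, a single 4096-token sequence needs about 2 GiB of cache, while a 163840-token sequence needs about 80 GiB. The quoted warning appears to come from vLLM, which as far as I know reserves a fixed fraction of GPU memory up front (its `--gpu-memory-utilization` setting) for weights plus KV cache; that would be consistent with the preallocation guess above, and lowering `--max-model-len` reduces how much cache one sequence must be able to fit.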

deepseek-ai / DeepSeek-V2, issue #74