How much GPU memory does the 33B model need, and how do I load it quantized?

deepseek-ai / DeepSeek-Coder

DeepSeek Coder: Let the Code Write Itself

https://coder.deepseek.com/

MIT License


Closed xiaokai01 closed 11 months ago

guoday commented 12 months ago

soloice commented 11 months ago

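Original (unquantized) version: each parameter takes two bytes, so 33B parameters come to 66 GB of weights. Add the KV cache used at runtime, and on a single card you need an 80 GB GPU to run it. With pipeline parallelism or tensor parallelism, the per-GPU requirement drops roughly linearly with the number of GPUs.

Quantized version: see TheBloke's quantized releases. As a rule of thumb, weight memory scales with the bit width, so quantizing from 16 bits to 4 bits cuts the weights to about a quarter, roughly 16.5 GB.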
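The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate of weight memory: it ignores the KV cache, activations, and the small per-block overhead that real quantization formats add. The function names are hypothetical, and 1 GB is taken as 10^9 bytes to match the 66 GB figure.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def per_gpu_weight_memory_gb(n_params: float, bits_per_param: float, n_gpus: int) -> float:
    """Assumes weights split evenly across GPUs (idealized pipeline/tensor parallelism)."""
    return weight_memory_gb(n_params, bits_per_param) / n_gpus


if __name__ == "__main__":
    N_PARAMS = 33e9  # 33B parameters
    for bits in (16, 8, 4):
        total = weight_memory_gb(N_PARAMS, bits)
        split = per_gpu_weight_memory_gb(N_PARAMS, bits, n_gpus=2)
        print(f"{bits:>2}-bit: {total:.1f} GB total, {split:.1f} GB per GPU on 2 GPUs")
```

Actual usage will be higher than these numbers, since the KV cache grows with batch size and context length, so leave headroom when choosing a card.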