[Feature Request]: Question about GPU memory and LLM x MapReduce

OpenBMB / MiniCPM

MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.

Apache License 2.0


[Feature Request]: Question about GPU memory and LLM x MapReduce #208

Closed: wciq1208 closed this issue 1 month ago

wciq1208 commented 2 months ago

Feature request

I am deploying with vLLM using the following command:

vllm serve /hestia/model/MiniCPM3-4B --trust-remote-code --max-model-len 12288 --num-gpu-blocks-override 768 --port 8001 --max-num-seqs 32 --served-model-name minicpm --swap-space 0

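A context length of only 12288 already consumes 22 GB of GPU memory. The README mentions that LLM x MapReduce can process unlimited context with low GPU memory usage. How do I enable it?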

ahkimkoo commented 2 months ago

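Same question. This is more than I expected. Seeing "4B", I took it for granted that 8 GB of GPU memory would be enough. I did not expect 22 GB.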

LDLINGLINGLING commented 2 months ago

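Hello. The long context mentioned there still consumes GPU memory. Without quantization, the 4B model's weights occupy about 8 GB of GPU memory, and a longer context adds further memory usage on top of that. You cannot use an unlimited context within 8 GB of GPU memory.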
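As a rough illustration of the arithmetic behind this, here is a minimal sketch. The weight estimate assumes 16-bit parameters (2 bytes each). The KV-cache formula is the standard one for plain multi-head attention, and the layer count, head count, and head dimension passed to it are hypothetical placeholders, not MiniCPM3-4B's actual configuration. It also ignores activations and any memory the serving framework reserves up front, so it will not reproduce the 22 GB figure reported above.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in decimal GB (2 bytes per parameter for fp16/bf16)."""
    return num_params * bytes_per_param / 1e9


def kv_cache_memory_gb(
    num_tokens: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,
) -> float:
    """KV-cache memory for standard multi-head attention, in decimal GB.

    Each token stores one key and one value vector (hence the factor 2)
    per layer and per KV head, so the total grows linearly with num_tokens.
    """
    bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return num_tokens * bytes_per_token / 1e9


# 4B parameters at 16-bit precision: about 8 GB, matching the figure above.
weights_gb = weight_memory_gb(4e9)

# Hypothetical attention configuration, for illustration only.
cache_gb = kv_cache_memory_gb(
    num_tokens=12288, num_layers=32, num_kv_heads=32, head_dim=128
)

print(f"weights: {weights_gb:.2f} GB, KV cache: {cache_gb:.2f} GB")
```

The point of the sketch is only that the weight cost is fixed while the cache cost scales with the number of tokens held in context, so doubling the context doubles the cache.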

shuo-git commented 2 months ago

Hello. The current code does not yet include the MapReduce feature. The MiniCPM3 x MapReduce code will be open-sourced within a week.

sycamore792 commented 1 month ago

"Hello. The current code does not yet include the MapReduce feature. The MiniCPM3 x MapReduce code will be open-sourced within a week."

Hello, is there any progress on this?

shuo-git commented 1 month ago

Hello, please refer to the open-source repository: https://github.com/thunlp/LLMxMapReduce. A detailed technical report will be released soon.
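For readers unfamiliar with the general idea, a generic map-reduce pattern over a long input can be sketched as below. This is not the LLMxMapReduce implementation and does not use its API; see the repository for the actual method. The function names are hypothetical, and `toy_map` and `toy_reduce` are placeholders standing in for model calls. The sketch only shows why each call sees a bounded amount of text: the input is split into fixed-size chunks, each chunk is processed independently, and the partial results are then combined.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def map_reduce(
    text: str,
    chunk_size: int,
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[List[str]], str],
) -> str:
    """Apply map_fn to every chunk, then combine the partial results with reduce_fn."""
    chunks = split_into_chunks(text, chunk_size)
    partial_results = [map_fn(chunk) for chunk in chunks]  # map stage
    return reduce_fn(partial_results)                      # reduce stage


# Placeholder stages. In a real system each would be a model call
# whose input is limited to one chunk (map) or to the partial results (reduce).
def toy_map(chunk: str) -> str:
    return chunk.strip()[:20]


def toy_reduce(partials: List[str]) -> str:
    return " | ".join(partials)


if __name__ == "__main__":
    long_text = "example sentence. " * 50
    print(map_reduce(long_text, chunk_size=200, map_fn=toy_map, reduce_fn=toy_reduce))
```

Because no single call receives more than `chunk_size` characters, the per-call context length, and with it the per-call cache memory, stays bounded regardless of the total input length. The trade-off is that information spanning chunk boundaries has to be recovered in the reduce stage.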