InternLM / MindSearch

🔍 An LLM-based multi-agent framework for web search (like Perplexity.ai Pro and SearchGPT)
https://mindsearch.netlify.app/
Apache License 2.0

How can I run across multiple GPUs when a local deployment runs out of VRAM? #162

Open qing-tian-meng-ying opened 4 weeks ago

qing-tian-meng-ying commented 4 weeks ago

`llm = LMDeployServer(path='internlm/internlm2_5-7b-chat', model_name='internlm2', meta_template=INTERNLM2_META, top_p=0.8, top_k=1, temperature=0, max_new_tokens=8192, repetition_penalty=1.02, stop_words=['<|im_end|>'])` — is this the place I should modify?
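For reference, a tensor-parallel degree would be added alongside the other keyword arguments of the same call. This is only a sketch: whether the `tp` kwarg actually reaches the engine is exactly what the replies below discuss.

```python
# Hypothetical sketch: the same LMDeployServer call with a tp kwarg added.
# Assumption: 'tp' is forwarded via **kwargs to lmdeploy's serve().
llm = LMDeployServer(
    path='internlm/internlm2_5-7b-chat',
    model_name='internlm2',
    meta_template=INTERNLM2_META,
    tp=2,  # split the model across 2 GPUs (assumes the kwarg is honored)
    top_p=0.8, top_k=1, temperature=0,
    max_new_tokens=8192, repetition_penalty=1.02,
    stop_words=['<|im_end|>'],
)
```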

mengrennwpu commented 3 weeks ago

@qing-tian-meng-ying `LMDeployServer` has a tensor-parallelism parameter `tp` that can be set, but it appears that `serve` in the lmdeploy source (`lmdeploy/api.py`) pops `tp` out of `kwargs`, so the model is later initialized with `tp` back at 1. Patching the source by hand fixes it. The relevant source:

```python
def serve(model_path: str,
          model_name: Optional[str] = None,
          backend: Literal['turbomind', 'pytorch'] = 'turbomind',
          backend_config: Optional[Union[TurbomindEngineConfig,
                                         PytorchEngineConfig]] = None,
          chat_template_config: Optional[ChatTemplateConfig] = None,
          server_name: str = '0.0.0.0',
          server_port: int = 23333,
          log_level: str = 'ERROR',
          api_keys: Optional[Union[List[str], str]] = None,
          ssl: bool = False,
          **kwargs):

    import time
    from multiprocessing import Process

    from lmdeploy.serve.openai.api_client import APIClient
    from lmdeploy.serve.openai.api_server import serve

    if type(backend_config) is not PytorchEngineConfig:
        # set auto backend mode
        backend_config = autoget_backend_config(model_path, backend_config)
    backend = 'pytorch' if type(
        backend_config) is PytorchEngineConfig else 'turbomind'
    if 'tp' in kwargs:
        tp = kwargs['tp']
        kwargs.pop('tp')
```
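The pitfall described above can be reproduced with a minimal, self-contained sketch (the names below are illustrative, not lmdeploy's actual code): a wrapper that pops `tp` from `kwargs` but never writes it into the engine config leaves the config at its default of 1.

```python
from dataclasses import dataclass


@dataclass
class EngineConfig:
    # Tensor-parallelism degree; defaults to a single GPU.
    tp: int = 1


def serve_like(backend_config=None, **kwargs):
    """Mimics a wrapper that pops 'tp' but forgets to apply it."""
    if 'tp' in kwargs:
        tp = kwargs.pop('tp')  # popped here ...
        # ... but never written back, so the config default wins
    return backend_config or EngineConfig()


def serve_fixed(backend_config=None, **kwargs):
    """Same wrapper, but the popped value is applied to the config."""
    config = backend_config or EngineConfig()
    if 'tp' in kwargs:
        config.tp = kwargs.pop('tp')
    return config


print(serve_like(tp=4).tp)   # 1 — the value is silently dropped
print(serve_fixed(tp=4).tp)  # 4
```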

Nuclear6 commented 2 weeks ago

That parameter is most likely the cause. Running `lmdeploy serve api_server /work/internlm2_5-7b-chat --server-port 8089 --tp 1` gives:

RuntimeError: [TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/utils/memory_utils.cu:32

Running `lmdeploy serve api_server /work/internlm2_5-7b-chat --server-port 8089 --tp 4`:

(screenshot)

The `tp` setting in my code that imports the package also looks fine; I'll check again.

Nuclear6 commented 2 weeks ago

Changing line 141 of `python3.11/site-packages/lmdeploy/messages.py` to `tp: int = 4` is enough.

Passing the `tp` parameter directly does not take effect; after the edit, the result on 4× A10 GPUs is:

(screenshot)
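Editing files under `site-packages` works but is fragile (the change is lost on upgrade). A possibly cleaner route, assuming `LMDeployServer` forwards `backend_config` unchanged to lmdeploy's `serve()` (an assumption, not verified in this thread), is to build the engine config explicitly so `tp` never has to travel through `kwargs` at all:

```python
# Assumption: LMDeployServer passes backend_config through to lmdeploy's
# serve(), whose signature (quoted above) does accept it.
from lmdeploy import TurbomindEngineConfig

llm = LMDeployServer(
    path='internlm/internlm2_5-7b-chat',
    model_name='internlm2',
    meta_template=INTERNLM2_META,
    backend_config=TurbomindEngineConfig(tp=4),  # 4-way tensor parallelism
    top_p=0.8, top_k=1, temperature=0,
    max_new_tokens=8192, repetition_penalty=1.02,
    stop_words=['<|im_end|>'],
)
```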