InternLM / MindSearch

🔍 An LLM-based multi-agent framework for web search, similar to Perplexity.ai Pro and SearchGPT
https://mindsearch.netlify.app/
Apache License 2.0

Errors when using a Qwen model; unclear whether it's model-related. Happens with both Docker and source deployment #166

Open · k2o333 opened this issue 3 weeks ago

k2o333 commented 3 weeks ago

==========
== CUDA ==
==========

CUDA Version 12.4.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available. Use the NVIDIA Container Toolkit to start this container with GPU support; see https://docs.nvidia.com/datacenter/cloud-native/ .

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/mindsearch/app.py", line 11, in <module>
    from lagent.schema import AgentStatusCode
  File "/opt/py3/lib/python3.10/site-packages/lagent/__init__.py", line 2, in <module>
    from .actions import *  # noqa: F401, F403
  File "/opt/py3/lib/python3.10/site-packages/lagent/actions/__init__.py", line 3, in <module>
    from .action_executor import ActionExecutor
  File "/opt/py3/lib/python3.10/site-packages/lagent/actions/action_executor.py", line 4, in <module>
    from .base_action import BaseAction
  File "/opt/py3/lib/python3.10/site-packages/lagent/actions/base_action.py", line 16, in <module>
    from griffe.enumerations import DocstringSectionKind
ModuleNotFoundError: No module named 'griffe.enumerations'

findziliao commented 3 weeks ago

The griffe version needs to be downgraded. Add a line to backend.dockerfile: RUN pip install --no-cache-dir -U griffe==0.48.0
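
For reference, a quick way to confirm the pin took effect inside the container is to retry the exact import that crashed app.py (a minimal sketch; it assumes, as the traceback suggests, that griffe.enumerations is absent in newer griffe releases):

```python
# check_griffe.py -- retry the import that failed in the traceback above.
# On newer griffe releases the griffe.enumerations module appears to be
# gone, so this raises ModuleNotFoundError; on the pinned 0.48.0 it
# should succeed.
from importlib.metadata import version

from griffe.enumerations import DocstringSectionKind  # noqa: F401

print(f"griffe {version('griffe')}: griffe.enumerations imports cleanly")
```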

k2o333 commented 3 weeks ago

> The griffe version needs to be downgraded. Add a line to backend.dockerfile: RUN pip install --no-cache-dir -U griffe==0.48.0

Thanks, that fixed the griffe issue. However, after opening the frontend and entering a question, the backend log doesn't change:

INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8002 (Press CTRL+C to quit)

lcolok commented 3 weeks ago

> > The griffe version needs to be downgraded. Add a line to backend.dockerfile: RUN pip install --no-cache-dir -U griffe==0.48.0
>
> Thanks, that fixed the griffe issue. However, after opening the frontend and entering a question, the backend log doesn't change:
>
> INFO: Started server process [1] … INFO: Uvicorn running on http://0.0.0.0:8002 (Press CTRL+C to quit)

Thanks for reporting the issue. Please check whether this is a cross-origin (CORS) problem:

https://github.com/InternLM/MindSearch/blob/main/docker/README_zh-CN.md#%E8%B7%A8%E5%9F%9F%E8%AE%BF%E9%97%AE%E6%B3%A8%E6%84%8F%E4%BA%8B%E9%A1%B9

If convenient, please attach the error messages from your browser's developer console so we can investigate.
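
One way to narrow this down is to call the backend directly with requests, bypassing the browser entirely. CORS is only enforced by browsers, so if this script streams results while the web UI stays silent, a cross-origin misconfiguration is the likely culprit. This is a hedged sketch: the /solve route and the "inputs" payload are assumptions based on mindsearch/app.py, so adjust them to match your checkout.

```python
# probe_backend.py -- call the MindSearch backend directly, bypassing the
# browser (and therefore CORS). The /solve endpoint and "inputs" field are
# assumptions based on mindsearch/app.py; verify against your code.
import requests

resp = requests.post(
    "http://127.0.0.1:8002/solve",
    json={"inputs": "What is MindSearch?"},
    stream=True,
    timeout=300,
)
resp.raise_for_status()
for line in resp.iter_lines():
    if line:
        # Seeing streamed chunks here but nothing in the web UI points at a
        # cross-origin problem rather than a backend problem.
        print(line.decode("utf-8"))
```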

xs818818 commented 3 weeks ago

[TM][WARNING] [LlamaTritonModel] max_context_token_num = 32776.
2024-08-19 21:33:39,395 - lmdeploy - WARNING - get 227 model params
[WARNING] gemm_config.in is not found; using default GEMM algo
HINT: Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT: Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT: Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
INFO: Started server process [2992081]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:23333 (Press CTRL+C to quit)
INFO: 127.0.0.1:49862 - "GET /v1/models HTTP/1.1" 200 OK
Launched the api_server in process 2992081, user can kill the server by:
import os,signal
os.kill(2992081, signal.SIGKILL)
INFO: 127.0.0.1:49870 - "POST /v1/completions HTTP/1.1" 200 OK
terminate called after throwing an instance of 'std::runtime_error'
  what():  [TM][ERROR] Assertion fail: /lmdeploy/src/turbomind/kernels/attention/attention.cu:35

/usr/lib/python3/dist-packages/apport/report.py:13: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import fnmatch, glob, traceback, errno, sys, atexit, imp, stat
Traceback (most recent call last):
  File "/home/xs/.local/lib/python3.8/site-packages/requests/models.py", line 820, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/home/xs/.local/lib/python3.8/site-packages/urllib3/response.py", line 1057, in stream
    yield from self.read_chunked(amt, decode_content=decode_content)
  File "/home/xs/.local/lib/python3.8/site-packages/urllib3/response.py", line 1206, in read_chunked
    self._update_chunk_length()
  File "/home/xs/.local/lib/python3.8/site-packages/urllib3/response.py", line 1136, in _update_chunk_length
    raise ProtocolError("Response ended prematurely") from None
urllib3.exceptions.ProtocolError: Response ended prematurely

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/xs/MindSearch/mindsearch/terminal.py", line 49, in <module>
    for agent_return in agent.stream_chat('上海今天适合穿什么衣服'):
  File "/home/xs/MindSearch/mindsearch/agent/mindsearch_agent.py", line 214, in stream_chat
    for model_state, response, _ in self.llm.stream_chat(
  File "/home/xs/.local/lib/python3.8/site-packages/lagent/llms/lmdeploy_wrapper.py", line 411, in stream_chat
    for text in self.client.completions_v1(
  File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/serve/openai/api_client.py", line 299, in completions_v1
    for chunk in response.iter_lines(chunk_size=8192,
  File "/home/xs/.local/lib/python3.8/site-packages/requests/models.py", line 869, in iter_lines
    for chunk in self.iter_content(
  File "/home/xs/.local/lib/python3.8/site-packages/requests/models.py", line 822, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: Response ended prematurely

This is the problem; I'm using the Qwen interface.

xs818818 commented 3 weeks ago

This happens when running python3 -m mindsearch.app --lang cn --model_format qwen --search_engine BingSearch

xs818818 commented 3 weeks ago

python3 -m mindsearch.terminal

[TM][WARNING] [LlamaTritonModel] max_context_token_num = 32776.
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/serve/openai/api_server.py", line 1285, in serve
    VariableInterface.async_engine = pipeline_class(
  File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/serve/async_engine.py", line 190, in __init__
    self._build_turbomind(model_path=model_path,
  File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/serve/async_engine.py", line 235, in _build_turbomind
    self.engine = tm.TurboMind.from_pretrained(
  File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 340, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
  File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 144, in __init__
    self.model_comm = self._from_hf(model_source=model_source,
  File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 251, in _from_hf
    self._create_weight(model_comm)
  File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 170, in _create_weight
    future.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 163, in _create_weight_func
    model_comm.create_shared_weights(device_id, rank)
RuntimeError: [TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/utils/memory_utils.cu:32

I'm using the Qwen interface, but it still loads a local model by default.
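
For what it's worth, the OOM here comes from the local TurboMind engine loading model weights, which defeats the point of calling a hosted Qwen endpoint. Below is a hedged sketch of routing LLM calls through lagent's OpenAI-compatible wrapper instead, so no weights touch the GPU; the constructor arguments can differ across lagent versions, so treat the parameter names as assumptions and check lagent/llms in your install, and note that the model name, key, and URL are placeholders:

```python
# external_llm_sketch.py -- route LLM calls to a hosted, OpenAI-compatible
# endpoint instead of a local TurboMind engine. Parameter names are
# assumptions against lagent's GPTAPI wrapper; model name, key, and URL
# below are placeholders, not working values.
from lagent.llms import GPTAPI

llm = GPTAPI(
    model_type="qwen-plus",      # placeholder model identifier
    key="YOUR_API_KEY",          # placeholder credential
    openai_api_base="https://api.example.com/v1/chat/completions",  # placeholder
)

# lagent LLM wrappers expose chat() over OpenAI-style message lists.
print(llm.chat([{"role": "user", "content": "上海今天适合穿什么衣服"}]))
```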

k2o333 commented 3 weeks ago

> This happens when running python3 -m mindsearch.app --lang cn --model_format qwen --search_engine BingSearch
>
> python3 -m mindsearch.terminal … RuntimeError: [TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/utils/memory_utils.cu:32 … I'm using the Qwen interface, but it still loads a local model by default.

I started it with Docker, and since I'm using an external model I removed the GPU deployment part. Could that be why nothing happens?

lcolok commented 3 weeks ago

@k2o333 The latest optimization should solve your problem: https://github.com/InternLM/MindSearch/pull/170 . Once the repo maintainers have tested and merged the code, you should be all set.

lcolok commented 3 weeks ago

@xs818818 I haven't managed to get the Qwen model working either; it's probably a logic issue in the mindsearch/agent module. Even when using the SiliconFlow API, the only model I could get to work was internlm/internlm2_5-7b-chat.

mengrennwpu commented 2 weeks ago

> @xs818818 I haven't managed to get the Qwen model working either; it's probably a logic issue in the mindsearch/agent module. Even when using the SiliconFlow API, the only model I could get to work was internlm/internlm2_5-7b-chat.

@lcolok It's normal that other models don't work: internlm/internlm2_5-7b-chat has been fine-tuned specifically for this search/RAG scenario.