InternLM / MindSearch

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)
https://mindsearch.netlify.app/
Apache License 2.0

Following the README leads to an assertion failure in /lmdeploy/src/turbomind/kernels/attention/attention.cu #209

Open · bombert opened this issue 3 weeks ago

bombert commented 3 weeks ago

```
root@iZ0xiaotv8ztqk9kkzy72iZ:~/MindSearch# python3 -m mindsearch.app --lang en --model_format internlm_server --search_engine DuckDuckGoSearch
INFO:     Started server process [3266]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8002 (Press CTRL+C to quit)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
  warnings.warn(
Fetching 20 files: 100%|██████████| 20/20 [00:00<00:00, 150603.38it/s]
[TM][WARNING] [LlamaTritonModel] max_context_token_num is not set, default to 32768.
2024-09-20 16:15:55,310 - lmdeploy - WARNING - get 227 model params
[WARNING] gemm_config.in is not found; using default GEMM algo
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
INFO:     Started server process [3280]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:23333 (Press CTRL+C to quit)
INFO:     127.0.0.1:52416 - "GET /v1/models HTTP/1.1" 200 OK
Launched the api_server in process 3280, user can kill the server by:
import os,signal
os.kill(3280, signal.SIGKILL)
INFO:     127.0.0.1:52400 - "POST /solve HTTP/1.1" 200 OK
INFO:     127.0.0.1:52418 - "POST /v1/completions HTTP/1.1" 200 OK
terminate called after throwing an instance of 'std::runtime_error'
  what():  [TM][ERROR] Assertion fail: /lmdeploy/src/turbomind/kernels/attention/attention.cu:35
```

```
ERROR:root:Exception in sync_generator_wrapper: Response ended prematurely
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 820, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 1057, in stream
    yield from self.read_chunked(amt, decode_content=decode_content)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 1206, in read_chunked
    self._update_chunk_length()
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 1136, in _update_chunk_length
    raise ProtocolError("Response ended prematurely") from None
urllib3.exceptions.ProtocolError: Response ended prematurely

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/MindSearch/mindsearch/app.py", line 73, in sync_generator_wrapper
    for response in agent.stream_chat(inputs):
  File "/root/MindSearch/mindsearch/agent/mindsearch_agent.py", line 214, in stream_chat
    for model_state, response, _ in self.llm.stream_chat(
  File "/usr/local/lib/python3.10/dist-packages/lagent/llms/lmdeploy_wrapper.py", line 411, in stream_chat
    for text in self.client.completions_v1(
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/serve/openai/api_client.py", line 299, in completions_v1
    for chunk in response.iter_lines(chunk_size=8192,
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 869, in iter_lines
    for chunk in self.iter_content(
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 822, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: Response ended prematurely
```

Hi team, I followed the README, installed the dependencies from requirements.txt with pip, and started the backend from a terminal, then hit the crash above. The ChunkedEncodingError appears to be just the streaming client noticing that the lmdeploy server process died on the attention.cu assertion.
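For anyone trying to reproduce, these are roughly the steps I ran (a sketch of the README flow; the clone step is assumed, the launch command is copied verbatim from the log above):

```bash
# Install dependencies as described in the README (clone step assumed)
git clone https://github.com/InternLM/MindSearch.git
cd MindSearch
pip install -r requirements.txt

# Launch the backend; the crash happens shortly after the first
# POST /v1/completions request reaches the lmdeploy server
python3 -m mindsearch.app --lang en --model_format internlm_server --search_engine DuckDuckGoSearch
```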

PS: Since a lot of developers outside China are following this project, I'm asking in English, which might also help the project's visibility a bit. One more question I'd appreciate help with: when a search is performed, how does Bing_Browser feed the search results to the model, and where is that code? Thanks!

bombert commented 3 weeks ago

Hi team,

You can mark this issue as resolved; it was caused by lmdeploy failing to start the model with my PyTorch version. My environment was:

- Ubuntu 22.04
- V100, Driver Version: 550.54.14
- lmdeploy==0.6.0
- torch==2.3.1
- transformers==4.44.2

As described in this lmdeploy issue: https://github.com/InternLM/lmdeploy/issues/2269

I downgraded torch to 2.2.2 and that fixed the issue.
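For anyone else hitting this, the workaround boils down to the following (a sketch; the version pin matches my environment above, adjust for yours):

```bash
# Downgrade torch from 2.3.1 to 2.2.2 so lmdeploy 0.6.0 can start the model
# (workaround reported in https://github.com/InternLM/lmdeploy/issues/2269)
pip install torch==2.2.2

# Restart the backend afterwards
python3 -m mindsearch.app --lang en --model_format internlm_server --search_engine DuckDuckGoSearch
```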