InternLM / MindSearch

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)
https://mindsearch.netlify.app/
Apache License 2.0

Following the README leads to an assertion failure in /lmdeploy/src/turbomind/kernels/attention/attention.cu #209

Open · bombert opened this issue 3 weeks ago

bombert commented 3 weeks ago

```
root@iZ0xiaotv8ztqk9kkzy72iZ:~/MindSearch# python3 -m mindsearch.app --lang en --model_format internlm_server --search_engine DuckDuckGoSearch
INFO:     Started server process [3266]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8002 (Press CTRL+C to quit)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
  warnings.warn(
Fetching 20 files: 100%|██████████| 20/20 [00:00<00:00, 150603.38it/s]
[TM][WARNING] [LlamaTritonModel] max_context_token_num is not set, default to 32768.
2024-09-20 16:15:55,310 - lmdeploy - WARNING - get 227 model params
[WARNING] gemm_config.in is not found; using default GEMM algo
HINT:    Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
INFO:     Started server process [3280]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:23333 (Press CTRL+C to quit)
INFO:     127.0.0.1:52416 - "GET /v1/models HTTP/1.1" 200 OK
Launched the api_server in process 3280, user can kill the server by:
import os,signal
os.kill(3280, signal.SIGKILL)
INFO:     127.0.0.1:52400 - "POST /solve HTTP/1.1" 200 OK
INFO:     127.0.0.1:52418 - "POST /v1/completions HTTP/1.1" 200 OK
terminate called after throwing an instance of 'std::runtime_error'
  what():  [TM][ERROR] Assertion fail: /lmdeploy/src/turbomind/kernels/attention/attention.cu:35
```

```
ERROR:root:Exception in sync_generator_wrapper: Response ended prematurely
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 820, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 1057, in stream
    yield from self.read_chunked(amt, decode_content=decode_content)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 1206, in read_chunked
    self._update_chunk_length()
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 1136, in _update_chunk_length
    raise ProtocolError("Response ended prematurely") from None
urllib3.exceptions.ProtocolError: Response ended prematurely

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/MindSearch/mindsearch/app.py", line 73, in sync_generator_wrapper
    for response in agent.stream_chat(inputs):
  File "/root/MindSearch/mindsearch/agent/mindsearch_agent.py", line 214, in stream_chat
    for model_state, response, _ in self.llm.stream_chat(
  File "/usr/local/lib/python3.10/dist-packages/lagent/llms/lmdeploy_wrapper.py", line 411, in stream_chat
    for text in self.client.completions_v1(
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/serve/openai/api_client.py", line 299, in completions_v1
    for chunk in response.iter_lines(chunk_size=8192,
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 869, in iter_lines
    for chunk in self.iter_content(
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 822, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: Response ended prematurely
```

Hi team, I followed the README, installed the dependencies from requirements.txt with pip, and started the backend from a terminal, then hit the crash above. The ChunkedEncodingError appears to be just the streaming client noticing that the lmdeploy server process died on the attention.cu assertion.
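For anyone trying to reproduce, these are roughly the steps I ran (a sketch of the README flow; the clone step is assumed, the launch command is copied verbatim from the log above):

```bash
# Install dependencies as described in the README (clone step assumed)
git clone https://github.com/InternLM/MindSearch.git
cd MindSearch
pip install -r requirements.txt

# Launch the backend; the crash happens shortly after the first
# POST /v1/completions request reaches the lmdeploy server
python3 -m mindsearch.app --lang en --model_format internlm_server --search_engine DuckDuckGoSearch
```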

PS: Since a lot of developers outside China are following this project, I'm asking in English, which might also help the project's visibility a bit. One more question I'd appreciate help with: when a search is performed, how does Bing_Browser feed the search results to the model, and where is that code? Thanks!

bombert commented 3 weeks ago

Hi team,

You can mark this issue as resolved; it was caused by lmdeploy failing to start the model with my PyTorch version. My environment was:

- Ubuntu 22.04
- V100, Driver Version: 550.54.14
- lmdeploy==0.6.0
- torch==2.3.1
- transformers==4.44.2

As described in this lmdeploy issue: https://github.com/InternLM/lmdeploy/issues/2269

I downgraded torch to 2.2.2 and that fixed the issue.
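For anyone else hitting this, the workaround boils down to the following (a sketch; the version pin matches my environment above, adjust for yours):

```bash
# Downgrade torch from 2.3.1 to 2.2.2 so lmdeploy 0.6.0 can start the model
# (workaround reported in https://github.com/InternLM/lmdeploy/issues/2269)
pip install torch==2.2.2

# Restart the backend afterwards
python3 -m mindsearch.app --lang en --model_format internlm_server --search_engine DuckDuckGoSearch
```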