chatchat-space / Langchain-Chatchat

Langchain-Chatchat(原Langchain-ChatGLM)基于 Langchain 与 ChatGLM, Qwen 与 Llama 等语言模型的 RAG 与 Agent 应用 | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and Llama) RAG and Agent app with langchain
Apache License 2.0

Running `python startup.py -a`: loading the chatglm3-6b model gets stuck at 71% #3734

Closed: starage2020 closed this issue 5 months ago

starage2020 commented 5 months ago

(venv_lcglm) root@test-GPU:/dev/shm/Langchain-Chatchat# python3 startup.py -a

==============================Langchain-Chatchat Configuration==============================
操作系统:Linux-5.15.0-56-generic-x86_64-with-glibc2.35.
python版本:3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0]
项目版本:v0.2.10
langchain版本:0.0.354. fastchat版本:0.2.35

当前使用的分词器:ChineseRecursiveTextSplitter
当前启动的LLM模型:['chatglm3-6b', 'zhipu-api', 'openai-api'] @ cuda
{'device': 'cuda', 'host': '0.0.0.0', 'infer_turbo': False, 'model_path': '/dev/shm/chatglm3-6b', 'model_path_exists': True, 'port': 20002}
{'api_key': '', 'device': 'auto', 'host': '0.0.0.0', 'infer_turbo': False, 'online_api': True, 'port': 21001, 'provider': 'ChatGLMWorker', 'version': 'glm-4', 'worker_class': <class 'server.model_workers.zhipu.ChatGLMWorker'>}
{'api_base_url': 'https://api.openai.com/v1', 'api_key': '', 'device': 'auto', 'host': '0.0.0.0', 'infer_turbo': False, 'model_name': 'gpt-4', 'online_api': True, 'openai_proxy': '', 'port': 20002}
当前Embbedings模型: bge-large-zh-v1.5 @ cuda
==============================Langchain-Chatchat Configuration==============================

2024-04-12 11:54:29,189 - startup.py[line:655] - INFO: 正在启动服务:
2024-04-12 11:54:29,189 - startup.py[line:656] - INFO: 如需查看 llm_api 日志,请前往 /dev/shm/Langchain-Chatchat/logs
/usr/local/lib/python3.10/dist-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: 模型启动功能将于 Langchain-Chatchat 0.3.x重写,支持更多模式和加速启动,0.2.x中相关功能将废弃
  warn_deprecated(
2024-04-12 11:54:37 | INFO | model_worker | Register to controller
2024-04-12 11:54:38 | ERROR | stderr | INFO:     Started server process [3977087]
2024-04-12 11:54:38 | ERROR | stderr | INFO:     Waiting for application startup.
2024-04-12 11:54:38 | ERROR | stderr | INFO:     Application startup complete.
2024-04-12 11:54:38 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:20000 (Press CTRL+C to quit)
2024-04-12 11:54:41 | INFO | model_worker | Loading the model ['chatglm3-6b'] on worker 4a547471 ...
Loading checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]
Loading checkpoint shards:  14%|█▍        | 1/7 [00:00<00:05, 1.14it/s]
Loading checkpoint shards:  29%|██▊       | 2/7 [00:01<00:04, 1.12it/s]
Loading checkpoint shards:  43%|████▎     | 3/7 [00:02<00:03, 1.17it/s]
Loading checkpoint shards:  57%|█████▋    | 4/7 [00:03<00:02, 1.25it/s]
Loading checkpoint shards:  71%|███████▏  | 5/7 [00:04<00:01, 1.19it/s]

It stays stuck here indefinitely; GPU memory usage never increases and CPU usage is not high either.

dosubot[bot] commented 5 months ago

It looks like the model loading process for chatglm3-6b is getting stuck at 71%, which could be due to a variety of reasons. Here are some targeted steps to troubleshoot and potentially resolve the issue:

  1. Ensure Adequate Hardware Resources: Verify that your system has enough RAM and GPU memory to load the chatglm3-6b model; large models require significant resources to load (see the resource-check sketch after this list).

  2. Debugging and Logging: Implement logging around the model loading process, especially before and after awaitable calls or process spawns, to identify where the process gets stuck. This can help pinpoint deadlocks or synchronization issues between asyncio and multiprocessing (a standalone loading sketch follows this list).

  3. External Dependencies and Network Issues: If the model loading depends on external resources, ensure there are no network issues or downtimes affecting the process. Implementing timeout handling and retry mechanisms for external calls can be beneficial.

  4. Update Dependencies: Make sure all related dependencies, such as fastchat, langchain_core, and any model-specific libraries, are up to date. Sometimes, bugs in these libraries can cause model loading issues.

  5. Review Model Worker Configuration: Check the configuration for chatglm3-6b, including model_path, device, and num_gpus, to ensure it's correct and optimized for your hardware setup.

  6. Check Logs for Errors: Review the application logs, especially around the model loading phase, for any errors or warnings that could indicate the cause of the issue.

  7. Memory Management and Environment Variables: Review and optimize settings related to memory usage, parallelism, and environment variables such as NUMEXPR_MAX_THREADS to match your system's capabilities (see the environment-variable sketch after this list).
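For step 1, a minimal, hypothetical pre-flight sketch (not part of Langchain-Chatchat) that reports free memory on the GPU the worker is configured to use. It assumes PyTorch is already installed in this environment and that GPU 0 is the `'device': 'cuda'` target from the reporter's config; ChatGLM3-6B in fp16 needs roughly 13 GB of free GPU memory.

```python
# Hypothetical pre-flight check: report free GPU memory on the device that the
# chatglm3-6b worker is configured to use. Loading the fp16 weights needs
# roughly 13 GB of free GPU memory.
import torch


def gib(n_bytes: int) -> float:
    """Convert bytes to GiB for readable output."""
    return n_bytes / 1024 ** 3


if not torch.cuda.is_available():
    raise SystemExit("CUDA is not available, but the worker config requests 'cuda'.")

free, total = torch.cuda.mem_get_info(0)  # (free_bytes, total_bytes) for GPU 0
print(f"GPU 0: {gib(free):.1f} GiB free / {gib(total):.1f} GiB total")
if gib(free) < 13:
    print("Warning: likely not enough free GPU memory for chatglm3-6b in fp16.")
```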
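For step 2, a quick way to localize the hang is to load the same checkpoint directly with `transformers`, outside of `startup.py` and fastchat. The sketch below is hypothetical but uses the standard ChatGLM3 loading calls and the local path from the reporter's config; if this also stalls after shard 5/7, the problem lies with the model files or hardware rather than with Langchain-Chatchat's worker startup.

```python
# Hypothetical isolation test: load /dev/shm/chatglm3-6b directly with
# transformers and log timestamps around each phase, so a hang can be
# attributed either to the checkpoint itself or to the fastchat worker.
import logging
import time

from transformers import AutoModel, AutoTokenizer

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("chatglm3-load-test")

MODEL_PATH = "/dev/shm/chatglm3-6b"  # taken from the reporter's configuration

t0 = time.time()
log.info("loading tokenizer ...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

log.info("loading model weights (the step that shows the shard progress bar) ...")
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).half().cuda()
model.eval()
log.info("model ready on GPU after %.1f s", time.time() - t0)
```

If this standalone load completes normally, the next place to look is the asyncio/multiprocessing startup path of 0.2.x that the deprecation warning in the log refers to.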
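For step 7, thread-related environment variables have to be set before the heavy imports initialize their thread pools. A small, hypothetical illustration; the values are placeholders to be tuned to the machine's actual core count.

```python
# Hypothetical example: cap the thread pools used by numexpr and OpenMP/MKL
# before importing anything that initializes them. Values are placeholders.
import os

os.environ.setdefault("NUMEXPR_MAX_THREADS", str(os.cpu_count() or 8))
os.environ.setdefault("OMP_NUM_THREADS", "8")   # assumed cap for OpenMP threads
os.environ.setdefault("MKL_NUM_THREADS", "8")   # assumed cap for MKL threads

# ... only now import torch / transformers / the startup code ...
```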

If after trying these steps the issue persists, it might be helpful to provide more detailed logs or error messages for further diagnosis.


#### About Dosu

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.

zRzRzRzRzRzRzR commented 5 months ago

Please confirm that you have enough system memory: at least 16 GB should be free.
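A minimal sketch for verifying this on the reporter's Linux box, assuming nothing beyond the standard library: it reads `MemAvailable` from `/proc/meminfo` and compares it against the 16 GB recommendation above.

```python
# Hypothetical check of the "at least 16 GB free" recommendation,
# using /proc/meminfo (Linux only, standard library only).
REQUIRED_GIB = 16

with open("/proc/meminfo") as f:
    meminfo = dict(line.split(":", 1) for line in f)

available_gib = int(meminfo["MemAvailable"].split()[0]) / 1024 ** 2  # kB -> GiB
print(f"MemAvailable: {available_gib:.1f} GiB")
if available_gib < REQUIRED_GIB:
    print(f"Less than {REQUIRED_GIB} GiB of RAM is free; the model load may stall or swap.")
```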