Hello @xuzhenjun130! I'm Dosu, a bot here to lend a hand with solving bugs, answering questions, and guiding you on your journey to becoming a contributor. While you're waiting for a human maintainer, I'm here to assist. I'm diving into your issue now and will get back to you with a full answer shortly.
what's your transformers version
Hey there @xuzhenjun130! Good to see you diving into new challenges. Hope this one hasn't been too much of a headache for you yet.
The KeyError: 'qwen2' you're encountering is most likely raised when the Langchain-Chatchat application tries to access a key named 'qwen2' in a dictionary (or similar data structure) where it does not exist. The error is not visible in the provided code snippet, which suggests the issue lies in the changes made to the configuration file or in how those changes are accessed or applied within the application.
In Python, a KeyError is raised when code tries to access a dictionary key that does not exist. If you've modified the configuration file to add or change the handling of a model or parameter named 'qwen2', ensure that the key is actually defined, and spelled consistently, everywhere it is referenced.
Since the provided code does not directly reference 'qwen2', the error is likely occurring in a part of the code not shown or in the interaction with the configuration file. Double-check the changes made to the configuration file and how those changes are reflected in the code.
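For illustration only (the dictionary and key names below are hypothetical and not taken from the Langchain-Chatchat code), this is the general pattern that produces such an error, along with a defensive lookup that reports the problem more clearly:

```python
# Hypothetical registry of supported model types (illustration only).
model_registry = {
    "chatglm2": "ChatGLMWorker",
    "qwen": "QwenWorker",
}

model_type = "qwen2"

# Direct indexing would raise KeyError: 'qwen2' because the key is missing:
#     worker_cls = model_registry[model_type]

# A defensive lookup makes the failure easier to diagnose.
worker_cls = model_registry.get(model_type)
if worker_cls is None:
    print(f"Unknown model type {model_type!r}; known types: {sorted(model_registry)}")
```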
The Langchain-Chatchat application handles the loading of models, including the Qwen/Qwen1.5-7B-Chat model, through a dynamic, configurable approach that supports both online API-based models and offline models, including those supported directly by Langchain. The process is managed by the create_model_worker_app function, which takes various parameters to configure the model worker appropriately based on the model type (Langchain model, online API, or offline model).
For offline models, the application supports loading from a Hugging Face repository ID or a local path and can run them on specified devices. The configuration includes the model path, device, and other parameters such as the number of GPUs, memory utilization, and concurrency limits. The application uses the VLLMWorker for models that are compatible with the vLLM inference framework, allowing advanced settings such as tensor parallel size, block size, and GPU memory utilization.
For models not directly supported by Langchain or those requiring online APIs, the application allows specifying a worker class that handles the interaction with the model or API. This flexibility enables the Langchain-Chatchat application to support a wide range of models and APIs by abstracting the model loading and interaction logic into configurable worker classes.
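As a rough sketch only (the class names and configuration keys below are hypothetical, not the actual Langchain-Chatchat API), the worker-class dispatch described above usually boils down to mapping a per-model configuration onto a worker implementation:

```python
# Hypothetical sketch of choosing a worker implementation from a model config.
class OfflineWorker:
    """Loads a model from a local path or Hugging Face repo ID."""
    def __init__(self, model_path: str, device: str = "cuda"):
        self.model_path, self.device = model_path, device

class OnlineAPIWorker:
    """Proxies requests to a hosted model behind an HTTP API."""
    def __init__(self, api_base: str, api_key: str):
        self.api_base, self.api_key = api_base, api_key

def build_worker(model_cfg: dict):
    # Online models are identified here by the presence of an API endpoint.
    if "api_base" in model_cfg:
        return OnlineAPIWorker(model_cfg["api_base"], model_cfg["api_key"])
    return OfflineWorker(model_cfg["model_path"], model_cfg.get("device", "cuda"))

worker = build_worker({"model_path": "/Qwen/Qwen-7B-Chat", "device": "cuda"})
```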
The application also includes provisions for model quantization and optimization, such as support for 8-bit loading, GPTQ, and Activation-aware Weight Quantization (AWQ), through the GptqConfig and AWQConfig configurations. These features aim to improve performance and reduce memory consumption, which matters when deploying large models like Qwen/Qwen1.5-7B-Chat efficiently.
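As a point of reference (this is plain Hugging Face transformers usage, not the application's internal GptqConfig/AWQConfig wiring), 8-bit loading of a model like Qwen1.5-7B-Chat typically looks like the sketch below; it assumes bitsandbytes is installed, a CUDA GPU is available, and transformers is recent enough to know the qwen2 architecture:

```python
# Sketch: 8-bit quantized loading with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen1.5-7B-Chat"   # requires transformers >= 4.37.0
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
```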
In summary, the Langchain-Chatchat application's approach to model loading is highly flexible and configurable, supporting a wide range of model types and configurations, including the Qwen/Qwen1.5-7B-Chat model, through a combination of offline model support, online API integration, and advanced optimization techniques.
You can find more details in the startup.py file in the Langchain-Chatchat repository.
Regarding the 'qwen2' key, it does not appear directly in the provided code snippets from the Langchain-Chatchat codebase. Therefore, it's not possible to determine its specific role based on the given context. It might be helpful to search the entire codebase for references to 'qwen2' to understand its role and why it might be causing a KeyError.
Finally, it's not clear from the provided context how the Langchain-Chatchat application handles KeyError exceptions during the model loading process. It's possible that the application has a general error handling mechanism that catches and logs exceptions, but without more specific information or context, it's difficult to say for sure. It might be helpful to review the application's error handling and logging code to understand how it handles exceptions like KeyError during the model loading process.
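If you do want such failures to be easier to diagnose, a common pattern (a sketch, not code taken from the repository) is to wrap the loading call and re-raise missing-key errors with extra context:

```python
# Sketch: wrap a model-loading step so a missing key is reported with context.
def load_model_or_explain(load_fn, model_name: str):
    try:
        return load_fn(model_name)
    except KeyError as exc:
        # e.g. KeyError: 'qwen2' from a transformers build that predates Qwen1.5
        raise RuntimeError(
            f"Loading {model_name!r} failed: key {exc.args[0]!r} was not found. "
            "If the missing key is a model architecture name, check that the "
            "installed transformers version supports it."
        ) from exc
```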
I hope this helps! If you have any more questions or need further clarification, feel free to ask.
what's your transformers version
pip show transformers
Name: transformers
Version: 4.35.0
Update the transformers version; the docker image probably hasn't been updated with it.
what's your transformers version
pip show transformers
Name: transformers
Version: 4.35.0
try transformers==4.37.2
Requirements: The code for Qwen1.5 has been merged into the latest Hugging Face transformers, and we advise you to install transformers>=4.37.0, or you might encounter the following error:
KeyError: 'qwen2'
Details: https://huggingface.co/Qwen/Qwen1.5-7B-Chat/blob/main/README.md
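To confirm whether the installed transformers build already registers the qwen2 architecture, a quick check (assuming a standard transformers installation inside the container) is:

```python
# Check that the installed transformers version knows the 'qwen2' model type.
import transformers
from transformers import CONFIG_MAPPING

print(transformers.__version__)   # should be >= 4.37.0 for Qwen1.5
print("qwen2" in CONFIG_MAPPING)  # False on 4.35.0, which is what triggers the KeyError
# If it prints False, upgrade inside the container, e.g.:
#   pip install -U "transformers>=4.37.0"
```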
问题描述 / Problem Description
docker run -v /home/ubuntu/custom_models/qwen-7b-chat:/Qwen/Qwen-7B-Chat -d --gpus all -p 80:8501 registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.7
The model had already been downloaded in advance: https://huggingface.co/Qwen/Qwen1.5-7B-Chat
Configuration file:
Only two places were modified (see the sketch below these two lines for where they sit):
LLM_MODELS = ["chatglm2-6b", "zhipu-api", "openai-api", "Qwen-7B-Chat"]
"Qwen-7B-Chat": "/Qwen/Qwen-7B-Chat",
复现问题的步骤 / Steps to Reproduce
Save the configuration and restart the docker container.
The logs reported an error:
预期的结果 / Expected Result
The Qwen (Tongyi Qianwen) model loads normally.
实际结果 / Actual Result
The service fails to start.
环境信息 / Environment Information
Ubuntu, RTX 4090, 32 GB RAM