chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly Langchain-ChatGLM): a local-knowledge-based LLM RAG and Agent application built with Langchain, supporting models such as ChatGLM, Qwen, and Llama
Apache License 2.0
31.66k stars · 5.52k forks

[BUG] ValueError: Tokenizer class Qwen2Tokenizer does not exist or is not currently imported. #3746

Closed — zixiaotan21 closed this 5 months ago

zixiaotan21 commented 6 months ago

Problem Description: When Qwen-1.5-72B is selected as the LLM model, langchain-chatchat fails to start with the error below.

Steps to Reproduce

  1. Run '...'
  2. Click '...'
  3. Scroll to '...'
  4. Problem occurs

Expected Result: No error.

Actual Result:

```
2024-04-15 09:14:40,683 - startup.py[line:651] - INFO: Starting services:
2024-04-15 09:14:40,684 - startup.py[line:652] - INFO: To view llm_api logs, go to C:\ai\langchain\logs
C:\ai\langchain\nltk_data NLTK_DATA_PATH
C:\ai\langchain\nltk_data NLTK_DATA_PATH
C:\ai\langchain\nltk_data NLTK_DATA_PATH
2024-04-15 09:14:48 | ERROR | stderr | INFO: Started server process [13400]
2024-04-15 09:14:48 | ERROR | stderr | INFO: Waiting for application startup.
2024-04-15 09:14:48 | ERROR | stderr | INFO: Application startup complete.
2024-04-15 09:14:48 | ERROR | stderr | INFO: Uvicorn running on http://192.168.210.11:20000 (Press CTRL+C to quit)
2024-04-15 09:14:49 | INFO | model_worker | Loading the model ['Qwen-72B-Chat'] on worker c192a079 ...
2024-04-15 09:14:49 | ERROR | stderr | Process model_worker - Qwen-72B-Chat:
2024-04-15 09:14:49 | ERROR | stderr | Traceback (most recent call last):
2024-04-15 09:14:49 | ERROR | stderr |   File "C:\Users\fseport\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 314, in _bootstrap
2024-04-15 09:14:49 | ERROR | stderr |     self.run()
2024-04-15 09:14:49 | ERROR | stderr |   File "C:\Users\fseport\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 108, in run
2024-04-15 09:14:49 | ERROR | stderr |     self._target(*self._args, **self._kwargs)
2024-04-15 09:14:49 | ERROR | stderr |   File "C:\ai\langchain\startup.py", line 387, in run_model_worker
2024-04-15 09:14:49 | ERROR | stderr |     app = create_model_worker_app(log_level=log_level, **kwargs)
2024-04-15 09:14:49 | ERROR | stderr |   File "C:\ai\langchain\startup.py", line 215, in create_model_worker_app
2024-04-15 09:14:49 | ERROR | stderr |     worker = ModelWorker(
2024-04-15 09:14:49 | ERROR | stderr |   File "C:\Users\fseport\AppData\Local\Programs\Python\Python310\lib\site-packages\fastchat\serve\model_worker.py", line 77, in __init__
2024-04-15 09:14:49 | ERROR | stderr |     self.model, self.tokenizer = load_model(
2024-04-15 09:14:49 | ERROR | stderr |   File "C:\Users\fseport\AppData\Local\Programs\Python\Python310\lib\site-packages\fastchat\model\model_adapter.py", line 265, in load_model
2024-04-15 09:14:49 | ERROR | stderr |     model, tokenizer = adapter.load_compress_model(
2024-04-15 09:14:49 | ERROR | stderr |   File "C:\Users\fseport\AppData\Local\Programs\Python\Python310\lib\site-packages\fastchat\model\model_adapter.py", line 101, in load_compress_model
2024-04-15 09:14:49 | ERROR | stderr |     return load_compress_model(
2024-04-15 09:14:49 | ERROR | stderr |   File "C:\Users\fseport\AppData\Local\Programs\Python\Python310\lib\site-packages\fastchat\model\compression.py", line 113, in load_compress_model
2024-04-15 09:14:49 | ERROR | stderr |     tokenizer = AutoTokenizer.from_pretrained(
2024-04-15 09:14:49 | ERROR | stderr |   File "C:\Users\fseport\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 784, in from_pretrained
2024-04-15 09:14:49 | ERROR | stderr |     raise ValueError(
2024-04-15 09:14:49 | ERROR | stderr | ValueError: Tokenizer class Qwen2Tokenizer does not exist or is not currently imported.
```

Environment Information

  • langchain-ChatGLM version/commit: v0.2.9
  • Docker deployment: No
  • LLM model: Qwen-1.5-72B
  • Embedding model: bge-large-zh
  • Vector store type: faiss
  • Operating system and version: Windows-10-10.0.22631-SP0
  • Python version: 3.10.8

dosubot[bot] commented 6 months ago

To address the "ValueError: Tokenizer class Qwen2Tokenizer does not exist or is not currently imported" error when loading the Qwen-1.5-72B model with langchain-chatchat, follow these steps:

  1. Verify Tokenizer Existence: Ensure there is a tokenizer named Qwen2Tokenizer designed for the Qwen-1.5-72B model. This might require checking the model's documentation or the source from which the model was obtained.

  2. Implementation and Importation: If Qwen2Tokenizer exists:

    • Confirm it's correctly implemented in your project or available in a used library.
    • Import Qwen2Tokenizer in the script where AutoTokenizer.from_pretrained is called. If it's part of an external library, ensure the library is installed and imported correctly.
  3. Direct Instantiation: If Qwen2Tokenizer is not recognized by AutoTokenizer.from_pretrained (likely because it's a custom tokenizer not registered with the transformers library), you may need to instantiate Qwen2Tokenizer directly with the appropriate model path or identifier.

  4. Custom Tokenizer Registration: For a custom tokenizer, register it with the transformers library to use it with AutoTokenizer.from_pretrained. This involves adding your tokenizer class to the library's tokenizer class mapping.

  5. Check Model Identifier: Ensure the model identifier "Qwen-1.5-72B" passed to AutoTokenizer.from_pretrained is correct and matches the identifier expected by the transformers library or your custom implementation.
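In practice this error often means the installed transformers package predates Qwen2 support, which was added upstream around v4.37.0 (treat that exact threshold as an assumption and verify it against the transformers release notes). A minimal stdlib-only sketch of the version gate, with a hypothetical `supports_qwen2` helper:

```python
# Hypothetical helper: decide whether an installed transformers version is
# recent enough to provide Qwen2Tokenizer. The 4.37.0 threshold is an
# assumption -- confirm it against the transformers changelog.
def parse_version(version: str) -> tuple:
    """Turn a version string like '4.38.2' into (4, 38, 2) for comparison."""
    return tuple(int(part) for part in version.split(".")[:3] if part.isdigit())

def supports_qwen2(installed: str, required: str = "4.37.0") -> bool:
    """Return True if `installed` meets the assumed minimum for Qwen2."""
    return parse_version(installed) >= parse_version(required)
```

If the check fails (e.g. for 4.31.x), `pip install -U transformers` would be the natural next step before trying the tokenizer again.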

If these steps do not resolve the issue, further specific details about the Qwen2Tokenizer implementation or consulting the transformers library documentation may be necessary.
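Steps 2 and 3 above could be sketched as a loader that tries the generic AutoTokenizer route first and falls back to importing Qwen2Tokenizer directly. This is a hypothetical sketch, not the project's actual loading code; `model_path` stands in for your local Qwen-1.5 checkpoint directory:

```python
def load_tokenizer(model_path: str):
    """Try AutoTokenizer first; if the class lookup fails (as in the
    traceback above), fall back to importing Qwen2Tokenizer directly,
    raising a clearer error when transformers lacks Qwen2 support."""
    from transformers import AutoTokenizer

    try:
        return AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    except ValueError:
        # AutoTokenizer does not know the Qwen2Tokenizer class; try the
        # direct import, which only works on new-enough transformers.
        try:
            from transformers import Qwen2Tokenizer
        except ImportError as exc:
            raise RuntimeError(
                "Installed transformers has no Qwen2Tokenizer; "
                "try: pip install -U transformers"
            ) from exc
        return Qwen2Tokenizer.from_pretrained(model_path)
```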


#### About Dosu

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.