eosphoros-ai / DB-GPT

AI Native Data App Development framework with AWEL(Agentic Workflow Expression Language) and Agents
http://docs.dbgpt.cn
MIT License

[Bug] [Module Name] model response parase faild! #974

Closed 0sengseng0 closed 8 months ago

0sengseng0 commented 9 months ago

Search before asking

Operating system information

Windows

Python version information

=3.11

DB-GPT version

main

Related scenes

Installation Information

Device information

cpu

Models information

EMBEDDING_MODEL=m3e-large

What happened

User Question: How many tables are in the current database? (当前数据库有多少张表?)
Please think step by step and respond according to the following JSON format:

{
  "thoughts": "thoughts summary to say to user",
  "sql": "SQL Query to run",
  "display_type": "Data display method"
}

Ensure the response is correct JSON and can be parsed by Python json.loads.

Assistant:

stream output:

2023-12-25 16:07:16 DESKTOP-55MNTH6 dbgpt.model.cluster.worker.default_worker[21288] ERROR Model inference error, detail:
Traceback (most recent call last):
  File "d:\github\db-gpt\dbgpt\model\cluster\worker\default_worker.py", line 154, in generate_stream
    for output in generate_stream_func(
  File "D:\ProgramData\envs\dbgpt_env\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "D:\ProgramData\envs\dbgpt_env\lib\site-packages\fastchat\serve\inference.py", line 132, in generate_stream
    out = model(input_ids=start_ids, use_cache=True)
  File "D:\ProgramData\envs\dbgpt_env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\ProgramData\envs\dbgpt_env\lib\site-packages\transformers\models\bert\modeling_bert.py", line 1226, in forward
    outputs = self.bert(
  File "D:\ProgramData\envs\dbgpt_env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\ProgramData\envs\dbgpt_env\lib\site-packages\transformers\models\bert\modeling_bert.py", line 979, in forward
    buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
RuntimeError: The expanded size of the tensor (2133) must match the existing size (512) at non-singleton dimension 1. Target sizes: [1, 2133]. Tensor sizes: [1, 512]

Traceback (most recent call last):
  File "d:\github\db-gpt\dbgpt\app\scene\base_chat.py", line 257, in nostream_call
    self.prompt_template.output_parser.parse_model_nostream_resp(
  File "d:\github\db-gpt\dbgpt\core\interface\output_parser.py", line 91, in parse_model_nostream_resp
    raise ValueError(
ValueError: Model server error!code=1, errmsg is LLMServer Generate Error, Please CheckErrorInfo.: The expanded size of the tensor (2133) must match the existing size (512) at non-singleton dimension 1. Target sizes: [1, 2133]. Tensor sizes: [1, 512]

2023-12-25 16:07:16 DESKTOP-55MNTH6 dbgpt.app.scene.base_chat[21288] ERROR model response parase faild!Model server error!code=1, errmsg is LLMServer Generate Error, Please CheckErrorInfo.: The expanded size of the tensor (2133) must match the existing size (512) at non-singleton dimension 1. Target sizes: [1, 2133]. Tensor sizes: [1, 512]
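
The RuntimeError itself comes from the BERT stack: the traceback shows fastchat's generate_stream driving transformers' modeling_bert, i.e. the BERT-based m3e-large model (set as LLM_MODEL in the .env below) is being asked to generate text, and its position embeddings are fixed at 512 tokens while this prompt is 2133 tokens. A minimal sketch of that limit, assuming the Hugging Face id moka-ai/m3e-large and the standard transformers API (the truncation call illustrates the constraint; it is not a confirmed DB-GPT fix):

from transformers import AutoConfig, AutoTokenizer

# m3e-large is BERT-based; its position-embedding table has 512 slots.
config = AutoConfig.from_pretrained("moka-ai/m3e-large")
print(config.max_position_embeddings)  # -> 512, the limit named in the traceback

# Inputs longer than 512 tokens must be truncated before the forward pass;
# otherwise expand() fails exactly as in the log above.
tokenizer = AutoTokenizer.from_pretrained("moka-ai/m3e-large")
ids = tokenizer("word " * 3000, truncation=True, max_length=512)["input_ids"]
print(len(ids))  # -> 512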

What you expected to happen

I am using a proxy model. From the page I can see that the MySQL database is connected and its metadata is visible. But when I ask a question, the returned result is in the wrong format. (screenshots attached)
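
For reference, a minimal sketch of a reply that would satisfy the JSON contract in the prompt above and survive json.loads; the concrete SQL and the "response_table" display type are illustrative assumptions, not values taken from DB-GPT's source:

import json

# Hypothetical well-formed model reply (SQL and display_type are assumptions
# for illustration, not DB-GPT internals).
raw_response = """{
    "thoughts": "Count the tables visible in the current database.",
    "sql": "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = DATABASE();",
    "display_type": "response_table"
}"""

parsed = json.loads(raw_response)  # raises json.JSONDecodeError if malformed
print(parsed["sql"])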

How to reproduce

Here is my .env (a sketch of the model-path resolution order it describes follows the listing):
#*******************************************************************#
#**             DB-GPT  - GENERAL SETTINGS                        **#  
#*******************************************************************#
## DISABLED_COMMAND_CATEGORIES - The list of categories of commands that are disabled. Each of the below are an option:
## pilot.commands.query_execute

## For example, to disable coding related features, uncomment the next line
# DISABLED_COMMAND_CATEGORIES=   

#*******************************************************************#
#**                        Webserver Port                         **#
#*******************************************************************#
WEB_SERVER_PORT=7860

#*******************************************************************#
#***                       LLM PROVIDER                          ***#
#*******************************************************************#

# TEMPERATURE=0

#*******************************************************************#
#**                         LLM MODELS                            **#
#*******************************************************************#
# LLM_MODEL, see dbgpt/configs/model_config.LLM_MODEL_CONFIG
LLM_MODEL=m3e-large
# vicuna-13b-v1.5
## LLM model path, by default, DB-GPT will read the model path from LLM_MODEL_CONFIG based on the LLM_MODEL.
## Of course you can specify your model path according to LLM_MODEL_PATH
## In DB-GPT, the priority from high to low to read model path:
##    1. environment variable with key: {LLM_MODEL}_MODEL_PATH (Avoid multi-model conflicts)
##    2. environment variable with key: MODEL_PATH
##    3. environment variable with key: LLM_MODEL_PATH
##    4. the config in dbgpt/configs/model_config.LLM_MODEL_CONFIG
# LLM_MODEL_PATH=/app/models/vicuna-13b-v1.5
# LLM_PROMPT_TEMPLATE=vicuna_v1.1
MODEL_SERVER=http://127.0.0.1:8000
MODEL_PATH=D:\GitHub\DB-GPT\models\m3e-large
LIMIT_MODEL_CONCURRENCY=5
MAX_POSITION_EMBEDDINGS=4096
QUANTIZE_QLORA=True
QUANTIZE_8bit=True
# QUANTIZE_4bit=False
## SMART_LLM_MODEL - Smart language model (Default: vicuna-13b)
## FAST_LLM_MODEL - Fast language model (Default: chatglm-6b)
# SMART_LLM_MODEL=vicuna-13b
# FAST_LLM_MODEL=chatglm-6b
## Proxy LLM backend. This configuration is only valid when "LLM_MODEL=proxyllm". When we use the REST API provided by deployment frameworks like fastchat as a proxyllm,
## "PROXYLLM_BACKEND" is the model they actually deploy. We can use "PROXYLLM_BACKEND" to load the prompt of the corresponding scene.
# PROXYLLM_BACKEND=

### You can configure parameters for a specific model with {model name}_{config key}=xxx
### See dbgpt/model/parameter.py
## prompt template for current model
# llama_cpp_prompt_template=vicuna_v1.1
## llama-2-70b must be 8
# llama_cpp_n_gqa=8
## Model path
# llama_cpp_model_path=/data/models/TheBloke/vicuna-13B-v1.5-GGUF/vicuna-13b-v1.5.Q4_K_M.gguf

### LLM cache
## Enable Model cache
# MODEL_CACHE_ENABLE=True
## The storage type of model cache, now supports: memory, disk
# MODEL_CACHE_STORAGE_TYPE=disk
## The max cache data in memory; we always store cache data in memory first for high speed.
# MODEL_CACHE_MAX_MEMORY_MB=256
## The dir to save cache data, this configuration is only valid when MODEL_CACHE_STORAGE_TYPE=disk
## The default dir is pilot/data/model_cache
# MODEL_CACHE_STORAGE_DISK_DIR=

#*******************************************************************#
#**                         EMBEDDING SETTINGS                    **#
#*******************************************************************#
#EMBEDDING_MODEL=text2vec
EMBEDDING_MODEL=m3e-large
#EMBEDDING_MODEL=bge-large-en
#EMBEDDING_MODEL=bge-large-zh
KNOWLEDGE_CHUNK_SIZE=500
KNOWLEDGE_SEARCH_TOP_SIZE=5
#KNOWLEDGE_CHUNK_OVERLAP=50
# Control whether to display the source document of knowledge on the front end.
KNOWLEDGE_CHAT_SHOW_RELATIONS=False
# Whether to enable Chat Knowledge Search Rewrite Mode
KNOWLEDGE_SEARCH_REWRITE=False
## EMBEDDING_TOKENIZER   - Tokenizer to use for chunking large inputs
## EMBEDDING_TOKEN_LIMIT - Chunk size limit for large inputs
# EMBEDDING_MODEL=all-MiniLM-L6-v2
# EMBEDDING_TOKENIZER=all-MiniLM-L6-v2
# EMBEDDING_TOKEN_LIMIT=8191

## Openai embedding model, See dbgpt/model/parameter.py
# EMBEDDING_MODEL=proxy_openai
# proxy_openai_proxy_server_url=https://api.openai.com/v1
# proxy_openai_proxy_api_key={your-openai-sk}
# proxy_openai_proxy_backend=text-embedding-ada-002

#*******************************************************************#
#**                  DB-GPT METADATA DATABASE SETTINGS            **#
#*******************************************************************#
### SQLite database (Current default database)
LOCAL_DB_TYPE=sqlite

### MYSQL database
# LOCAL_DB_TYPE=mysql
# LOCAL_DB_USER=root
# LOCAL_DB_PASSWORD={your_password}
# LOCAL_DB_HOST=127.0.0.1
# LOCAL_DB_PORT=3306
# LOCAL_DB_NAME=dbgpt
### This option determines where conversation records are stored. If not configured, the old DuckDB storage is used by default. It can be set to db or file (if the value is db, the database configured by LOCAL_DB will be used).
#CHAT_HISTORY_STORE_TYPE=db

#*******************************************************************#
#**                         COMMANDS                              **#
#*******************************************************************#
EXECUTE_LOCAL_COMMANDS=False

#*******************************************************************#
#**                  ALLOWLISTED PLUGINS                          **#
#*******************************************************************#

#ALLOWLISTED_PLUGINS - Sets the listed plugins that are allowed (Example: plugin1,plugin2,plugin3)
#DENYLISTED_PLUGINS - Sets the listed plugins that are not allowed (Example: plugin1,plugin2,plugin3)
ALLOWLISTED_PLUGINS=
DENYLISTED_PLUGINS=

#*******************************************************************#
#**                 CHAT PLUGIN SETTINGS                          **#
#*******************************************************************#
# CHAT_MESSAGES_ENABLED - Enable chat messages (Default: False)
# CHAT_MESSAGES_ENABLED=False

#*******************************************************************#
#**                  VECTOR STORE SETTINGS                       **#
#*******************************************************************#
### Chroma vector db config
VECTOR_STORE_TYPE=Chroma
#CHROMA_PERSIST_PATH=/root/DB-GPT/pilot/data

### Milvus vector db config
#VECTOR_STORE_TYPE=Milvus
#MILVUS_URL=127.0.0.1
#MILVUS_PORT=19530
#MILVUS_USERNAME
#MILVUS_PASSWORD
#MILVUS_SECURE=

### Weaviate vector db config
#VECTOR_STORE_TYPE=Weaviate
#WEAVIATE_URL=https://kt-region-m8hcy0wc.weaviate.network

#*******************************************************************#
#**                  WebServer Language Support                   **#
#*******************************************************************#
#LANGUAGE=en
LANGUAGE=zh

#*******************************************************************#
# **  PROXY_SERVER (OpenAI interface | ChatGPT proxy service), use ChatGPT as your LLM.
# **  If your server can reach OpenAI, set PROXY_SERVER_URL=https://api.openai.com/v1/chat/completions
# **  Otherwise, if you have a ChatGPT proxy server, set PROXY_SERVER_URL={your-proxy-serverip:port/xxx}
#*******************************************************************#
PROXY_API_KEY=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
PROXY_SERVER_URL=https://api.openai.com/v1/chat/completions

# from https://bard.google.com/     f12-> application-> __Secure-1PSID
BARD_PROXY_API_KEY={your-bard-token}

#*******************************************************************#
# **  PROXY_SERVER +                                              **#
#*******************************************************************#

# Aliyun tongyi
TONGYI_PROXY_API_KEY={your-tongyi-sk}

## Baidu wenxin
#WEN_XIN_MODEL_VERSION={version}
#WEN_XIN_API_KEY={your-wenxin-sk}
#WEN_XIN_API_SECRET={your-wenxin-sct}

## Zhipu
#ZHIPU_MODEL_VERSION={version}
#ZHIPU_PROXY_API_KEY={your-zhipu-sk}

## Baichuan
#BAICHUN_MODEL_NAME={version}
#BAICHUAN_PROXY_API_KEY={your-baichuan-sk}
#BAICHUAN_PROXY_API_SECRET={your-baichuan-sct}

# Xunfei Spark
#XUNFEI_SPARK_API_VERSION={version}
#XUNFEI_SPARK_APPID={your_app_id}
#XUNFEI_SPARK_API_KEY={your_api_key}
#XUNFEI_SPARK_API_SECRET={your_api_secret}

#*******************************************************************#
#**    SUMMARY_CONFIG                                             **#
#*******************************************************************#
SUMMARY_CONFIG=FAST

#*******************************************************************#
#**    Multi-GPU                                                  **#
#*******************************************************************#
## See https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/
## If CUDA_VISIBLE_DEVICES is not configured, all available gpus will be used
# CUDA_VISIBLE_DEVICES=0
## You can configure the maximum memory used by each GPU.
# MAX_GPU_MEMORY=16Gib

#*******************************************************************#
#**                         LOG                                   **#
#*******************************************************************#
# FATAL, ERROR, WARNING, INFO, DEBUG, NOTSET
DBGPT_LOG_LEVEL=INFO
# LOG dir, default: ./logs
#DBGPT_LOG_DIR=
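
The LLM MODELS section above documents the model-path lookup order. A minimal sketch of that resolution, assuming it is a plain environment-variable fallback chain (resolve_model_path is a hypothetical helper, not DB-GPT's actual loader):

import os

# Priority from high to low, per the comments in the .env above:
# 1. {LLM_MODEL}_MODEL_PATH  2. MODEL_PATH  3. LLM_MODEL_PATH  4. built-in config
def resolve_model_path(llm_model: str, default_config: dict) -> str | None:
    for key in (f"{llm_model}_MODEL_PATH", "MODEL_PATH", "LLM_MODEL_PATH"):
        path = os.getenv(key)
        if path:
            return path
    return default_config.get(llm_model)

# With the .env above, MODEL_PATH is set, so m3e-large resolves to it:
os.environ["MODEL_PATH"] = r"D:\GitHub\DB-GPT\models\m3e-large"
print(resolve_model_path("m3e-large", {}))  # D:\GitHub\DB-GPT\models\m3e-large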

Additional context

No response

Are you willing to submit a PR?

github-actions[bot] commented 8 months ago

This issue has been marked as stale because there has been no activity for over 30 days.

github-actions[bot] commented 8 months ago

This issue has been closed because it was marked as stale and there was no further activity for over 7 days.