chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and Llama) RAG and Agent app with langchain
Apache License 2.0

Knowledge-base Q&A: looking for LLM model recommendations #3881

Closed zixiaotan21 closed 4 months ago

zixiaotan21 commented 4 months ago

Problem Description: When running knowledge-base Q&A with locally deployed qwen-1.5-7B-chat and qwen-1.5-14B-chat, the answer quality is poor, while qwen-1.5-72B-chat runs too slowly.

Could anyone here recommend an LLM model that works well for this?

dosubot[bot] commented 4 months ago

For knowledge-base Q&A, besides the qwen series you have already tried, a number of other models are available that may give you a better balance of answer quality and running speed.

To speed up model inference, consider optimizing the configuration. For example, setting LLM_DEVICE to "auto" or "cuda" (if you have an NVIDIA GPU) lets the model run on the most suitable hardware. You can also tune the MAX_TOKENS parameter to your needs and hardware; reducing the token budget shortens response time.
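As a rough illustration only, these switches live in configs/model_config.py in the 0.2.x layout; exact key names vary between releases, so treat this as a sketch rather than a drop-in config:

```python
# configs/model_config.py -- sketch only; check the model_config.py.example
# shipped with your release, since key names differ between versions.

# Run the LLM on the most suitable device; "cuda" pins it to an NVIDIA GPU.
LLM_DEVICE = "auto"   # or "cuda", "mps", "cpu"

# Upper bound on tokens generated per reply; a smaller budget answers faster
# at the cost of shorter, less detailed responses.
MAX_TOKENS = 2048
```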

I hope these suggestions help you find an LLM model that fits your needs.

#### Details

**Sources**

- [[BUG] How to correctly configure OpenAI LLM models in chatchat](https://github.com/chatchat-space/Langchain-Chatchat/issues/2418)
- [How to improve the quality and accuracy of knowledge-base Q&A answers; please share your experience](https://github.com/chatchat-space/Langchain-Chatchat/issues/844)
- [Replies currently take about 20 seconds; how can the speed be improved? V100 32G server.](https://github.com/chatchat-space/Langchain-Chatchat/issues/493)
- [[FEATURE] Suggest changing the default model to qwen-chat-7b to support a larger context token count](https://github.com/chatchat-space/Langchain-Chatchat/issues/1184)
- [configs/model_config.py.example](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/configs/model_config.py.example)

To continue the conversation, mention @dosu.

zixiaotan21 commented 4 months ago

@dosu

How should the MAX_TOKENS parameter be set?

dosubot[bot] commented 4 months ago

To set the MAX_TOKENS parameter effectively, balancing response time against answer quality, consider the following:

1. Model and Hardware Limits: Choose a MAX_TOKENS value that your hardware can handle efficiently, especially if using larger models like qwen-1.5-14B-chat. Smaller models or better hardware allow for a higher MAX_TOKENS.

2. Response Time vs. Quality:
   - For quicker responses, reduce MAX_TOKENS. This decreases computation but may limit response depth.
   - For more detailed answers, increase MAX_TOKENS. This requires more processing time.

3. Optimization Techniques: Use model optimization strategies like quantization and pruning to improve processing times without greatly affecting quality.

4. Dynamic Adjustment: Implement logic to adjust MAX_TOKENS based on query complexity or system load, optimizing for performance or detail as needed (a sketch follows below).

5. Monitoring and Adjustments: Continuously monitor performance and adjust MAX_TOKENS based on feedback to find the optimal balance for your application.

Start with a MAX_TOKENS value that aligns with your hardware capabilities and adjust based on performance observations and application needs.
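For point 4, a purely hypothetical sketch (this helper and its thresholds are not part of Langchain-Chatchat) of picking a per-request token budget from the query length before sending it to the chat endpoint:

```python
def choose_max_tokens(query: str, base: int = 512, cap: int = 2048) -> int:
    """Hypothetical helper: scale the generation budget with query complexity."""
    words = len(query.split())
    if words < 20:
        return base                # short, factual lookups answer quickly
    if words < 80:
        return min(base * 2, cap)  # typical questions
    return cap                     # long or multi-part questions


# Example: pass the result as the max_tokens field of a chat request.
print(choose_max_tokens("What context length does qwen-1.5-14B-chat support?"))  # -> 512
```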

#### Details

**Sources**

- [configs/model_config.py.example](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/configs/model_config.py.example)
- [server/chat/agent_chat.py](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/server/chat/agent_chat.py)
- [server/chat/completion.py](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/server/chat/completion.py)
- [server/chat/chat.py](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/server/chat/chat.py)

To continue the conversation, mention @dosu.

Zephyr69 commented 4 months ago

This is most likely not a problem with the model, but with how your knowledge base is structured and configured.

zixiaotan21 commented 4 months ago

> This is most likely not a problem with the model, but with how your knowledge base is structured and configured.

1. My knowledge base looks like this:
```markdown
# Question 1

## Question
XXX

## Answer
XXX

# Question 2

## Question
XXX

## Answer
XXX
```

The text splitter is MarkdownHeaderTextSplitter, so the different Q&A pairs (QA1 and QA2) are split into separate vectors (a minimal usage sketch follows after this list).

When I actually ask Question 1, Answer 1 is indeed retrieved, but the model does not answer strictly according to Answer 1 and adds some fabricated content. Is this because my model's reasoning ability is insufficient?

2. As for the setup, I am running on an RTX 4090, and the model I am currently using is qwen-1.5-14B-chat.
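As referenced above, a minimal sketch of that splitter setup using LangChain's MarkdownHeaderTextSplitter (the import path depends on your LangChain version); splitting only on the top-level `#` header keeps each Q&A pair in a single chunk:

```python
from langchain.text_splitter import MarkdownHeaderTextSplitter  # langchain_text_splitters in newer releases

kb_markdown = """# Question 1

## Question
XXX

## Answer
XXX

# Question 2

## Question
XXX

## Answer
XXX
"""

# Split on the top-level header only, so each "# Question N" section
# (question plus answer) becomes one chunk / one vector.
splitter = MarkdownHeaderTextSplitter(headers_to_split_on=[("#", "qa_pair")])
docs = splitter.split_text(kb_markdown)
for doc in docs:
    print(doc.metadata, "->", doc.page_content[:40])
```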
Zephyr69 commented 4 months ago

> When I actually ask Question 1, Answer 1 is indeed retrieved, but the model does not answer strictly according to Answer 1 and adds some fabricated content. Is this because my model's reasoning ability is insufficient?

If the model keeps adding content of its own, lower the temperature for this mode to something like 0.01 or 0.1.
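For example, assuming the 0.2.x HTTP API and its /chat/knowledge_base_chat endpoint (the field names and knowledge-base name below are illustrative and may differ in your deployment), a low temperature could be passed per request like this:

```python
import requests

# Hypothetical request against a locally running Langchain-Chatchat API server.
resp = requests.post(
    "http://127.0.0.1:7861/chat/knowledge_base_chat",
    json={
        "query": "Question 1",
        "knowledge_base_name": "my_kb",    # illustrative knowledge-base name
        "model_name": "qwen1.5-14b-chat",  # must match the name configured in model_config.py
        "temperature": 0.1,                # low temperature keeps answers close to the retrieved text
        "stream": False,
    },
)
print(resp.text)
```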

zixiaotan21 commented 4 months ago

> If the model keeps adding content of its own, lower the temperature for this mode to something like 0.01 or 0.1.

Thanks for the advice! Lowering the temperature does indeed make the answers a bit better.