1. System environment
1.1 OS: CentOS 9
1.2 Python version: 3.10
1.3 Langchain-Chatchat version: 0.3.1.3
1.4 Embedding model: bge-large-zh-v1.5
1.5 LLM: glm-4-9b-chat
2. Xinference
2.1 Created the Xinference virtual environment (python3 -m venv venv_xinference).
2.2 Installed packages (pip list):
Package Version
accelerate 1.1.0 aiofiles 23.2.1 aioprometheus 23.12.0 annotated-types 0.7.0 anyio 4.6.2.post1 async-timeout 5.0.0 bcrypt 4.2.0 certifi 2024.8.30 cffi 1.17.1 charset-normalizer 3.4.0 click 8.1.7 cloudpickle 3.1.0 cryptography 43.0.3 distro 1.9.0 ecdsa 0.19.0 exceptiongroup 1.2.2 fastapi 0.115.4 ffmpy 0.4.0 filelock 3.16.1 fsspec 2024.10.0 gradio 5.4.0 gradio_client 1.4.2 h11 0.14.0 httpcore 1.0.6 httpx 0.27.2 huggingface-hub 0.26.2 idna 3.10 Jinja2 3.1.4 jiter 0.7.0 joblib 1.4.2 markdown-it-py 3.0.0 MarkupSafe 2.1.5 mdurl 0.1.2 modelscope 1.19.2 mpmath 1.3.0 networkx 3.4.2 numpy 2.1.3 nvidia-cublas-cu12 12.4.5.8 nvidia-cuda-cupti-cu12 12.4.127 nvidia-cuda-nvrtc-cu12 12.4.127 nvidia-cuda-runtime-cu12 12.4.127 nvidia-cudnn-cu12 9.1.0.70 nvidia-cufft-cu12 11.2.1.3 nvidia-curand-cu12 10.3.5.147 nvidia-cusolver-cu12 11.6.1.9 nvidia-cusparse-cu12 12.3.1.170 nvidia-ml-py 12.560.30 nvidia-nccl-cu12 2.21.5 nvidia-nvjitlink-cu12 12.4.127 nvidia-nvtx-cu12 12.4.127 openai 1.53.0 orjson 3.10.11 packaging 24.1 pandas 2.2.3 passlib 1.7.4 peft 0.13.2 pillow 11.0.0 pip 24.3.1 psutil 6.1.0 pyasn1 0.6.1 pycparser 2.22 pydantic 2.9.2 pydantic_core 2.23.4 pydub 0.25.1 Pygments 2.18.0 python-dateutil 2.9.0.post0 python-jose 3.3.0 python-multipart 0.0.12 pytz 2024.2 PyYAML 6.0.2 quantile-python 1.1 regex 2024.9.11 requests 2.32.3 rich 13.9.4 rsa 4.9 ruff 0.7.2 safehttpx 0.1.1 safetensors 0.4.5 scikit-learn 1.5.2 scipy 1.14.1 semantic-version 2.10.0 sentence-transformers 3.2.1 setuptools 65.5.0 shellingham 1.5.4 six 1.16.0 sniffio 1.3.1 sse-starlette 2.1.3 starlette 0.41.2 sympy 1.13.1 tabulate 0.9.0 tblib 3.0.0 threadpoolctl 3.5.0 tiktoken 0.8.0 timm 1.0.11 tokenizers 0.20.2 tomlkit 0.12.0 torch 2.5.1 torchvision 0.20.1 tqdm 4.66.6 transformers 4.46.1 triton 3.1.0 typer 0.12.5 typing_extensions 4.12.2 tzdata 2024.2 urllib3 2.2.3 uvicorn 0.32.0 uvloop 0.21.0 websockets 12.0 xinference 0.16.2 xoscar 0.4.0
2.3 Started Xinference: Xinference runs normally, and conversations through Xinference's built-in chat feature work as expected.
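The built-in chat only exercises Xinference's own web UI; Langchain-Chatchat will instead call Xinference through its OpenAI-compatible API at http://127.0.0.1:9997/v1 (the api_base_url configured in 3.3 below). The following is a minimal sketch to confirm the chat and embedding endpoints answer there as well, using the openai client; the model UIDs "glm-4-9b-chat" and "bge-large-zh-v1.5" are assumptions and should match whatever UIDs the models were launched under in Xinference:

```python
from openai import OpenAI

# Xinference's OpenAI-compatible endpoint, as configured in model_settings.yaml.
client = OpenAI(base_url="http://127.0.0.1:9997/v1", api_key="EMPTY")

# List the models Xinference actually serves on this endpoint.
print([m.id for m in client.models.list().data])

# Chat completion against the LLM (model UID assumed to be "glm-4-9b-chat").
resp = client.chat.completions.create(
    model="glm-4-9b-chat",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)

# Embedding request against the embedding model (UID assumed to be "bge-large-zh-v1.5").
emb = client.embeddings.create(model="bge-large-zh-v1.5", input="ping")
print(len(emb.data[0].embedding))
```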
3. Langchain-Chatchat
3.1 Created the Langchain-Chatchat virtual environment (python3 -m venv venv_Langchain).
3.2 Installed packages (pip list):
Package Version
aiohappyeyeballs 2.4.3 aiohttp 3.10.10 aiosignal 1.3.1 altair 4.2.2 annotated-types 0.7.0 antlr4-python3-runtime 4.9.3 anyio 4.6.2.post1 async-timeout 4.0.3 attrs 24.2.0 backoff 2.2.1 beautifulsoup4 4.12.3 blinker 1.8.2 Brotli 1.1.0 cachetools 5.5.0 certifi 2024.8.30 cffi 1.17.1 chardet 5.2.0 charset-normalizer 3.4.0 click 8.1.7 coloredlogs 15.0.1 contourpy 1.3.0 cryptography 43.0.3 cycler 0.12.1 dataclasses-json 0.6.7 deepdiff 8.0.1 Deprecated 1.2.14 deprecation 2.1.0 distro 1.9.0 effdet 0.4.1 emoji 2.14.0 entrypoints 0.4 et_xmlfile 2.0.0 exceptiongroup 1.2.2 faiss-cpu 1.7.4 faiss-gpu 1.7.2 Faker 30.8.2 fastapi 0.109.2 favicon 0.7.0 filelock 3.16.1 filetype 1.2.0 flatbuffers 24.3.25 fonttools 4.54.1 frozenlist 1.5.0 fsspec 2024.10.0 gitdb 4.0.11 GitPython 3.1.43 greenlet 3.1.1 h11 0.14.0 h2 4.1.0 hpack 4.0.0 htbuilder 0.6.2 httpcore 1.0.6 httpx 0.27.2 huggingface-hub 0.26.2 humanfriendly 10.0 hyperframe 6.0.1 idna 3.10 iopath 0.1.10 jieba 0.42.1 Jinja2 3.1.4 jiter 0.7.0 joblib 1.4.2 jsonpatch 1.33 jsonpath-python 1.0.6 jsonpointer 3.0.0 jsonschema 4.23.0 jsonschema-specifications 2024.10.1 kiwisolver 1.4.7 langchain 0.1.17 langchain-chatchat 0.3.1.3 langchain-community 0.0.36 langchain-core 0.1.53 langchain-experimental 0.0.58 langchain-openai 0.0.6 langchain-text-splitters 0.0.2 langchainhub 0.1.14 langdetect 1.0.9 langsmith 0.1.139 layoutparser 0.3.4 loguru 0.7.2 lxml 5.3.0 Markdown 3.7 markdown-it-py 3.0.0 markdownify 0.13.1 markdownlit 0.0.7 MarkupSafe 3.0.2 marshmallow 3.23.1 matplotlib 3.9.2 mdurl 0.1.2 memoization 0.4.0 more-itertools 10.5.0 mpmath 1.3.0 multidict 6.1.0 mypy-extensions 1.0.0 nest-asyncio 1.6.0 networkx 3.1 nltk 3.8.1 numexpr 2.10.1 numpy 1.24.4 nvidia-cublas-cu12 12.4.5.8 nvidia-cuda-cupti-cu12 12.4.127 nvidia-cuda-nvrtc-cu12 12.4.127 nvidia-cuda-runtime-cu12 12.4.127 nvidia-cudnn-cu12 9.1.0.70 nvidia-cufft-cu12 11.2.1.3 nvidia-curand-cu12 10.3.5.147 nvidia-cusolver-cu12 11.6.1.9 nvidia-cusparse-cu12 12.3.1.170 nvidia-nccl-cu12 2.21.5 nvidia-nvjitlink-cu12 12.4.127 nvidia-nvtx-cu12 12.4.127 omegaconf 2.3.0 onnx 1.17.0 onnxruntime 1.15.1 openai 1.53.0 opencv-python 4.10.0.84 openpyxl 3.1.4 orderly-set 5.2.2 orjson 3.10.11 packaging 23.2 pandas 1.5.3 pathlib 1.0.1 pdf2image 1.17.0 pdfminer.six 20231228 pdfplumber 0.11.4 pikepdf 9.4.0 pillow 10.4.0 pip 24.3.1 portalocker 2.10.1 prometheus_client 0.21.0 propcache 0.2.0 protobuf 4.25.5 pyarrow 18.0.0 pyclipper 1.3.0.post6 pycocotools 2.0.8 pycparser 2.22 pydantic 2.7.4 pydantic_core 2.18.4 pydantic-settings 2.3.4 pydeck 0.9.1 Pygments 2.18.0 PyJWT 2.8.0 pymdown-extensions 10.12 PyMuPDF 1.23.26 PyMuPDFb 1.23.22 PyMySQL 1.1.1 pyparsing 3.2.0 pypdf 5.1.0 pypdfium2 4.30.0 pytesseract 0.3.13 python-dateutil 2.9.0.post0 python-decouple 3.8 python-docx 1.1.2 python-dotenv 1.0.1 python-iso639 2024.10.22 python-magic 0.4.27 python-multipart 0.0.9 pytz 2024.2 PyYAML 6.0.2 rank-bm25 0.2.2 RapidFuzz 3.10.1 rapidocr-onnxruntime 1.3.25 referencing 0.35.1 regex 2024.9.11 requests 2.31.0 requests-toolbelt 1.0.0 rich 13.9.4 rpds-py 0.20.1 ruamel.yaml 0.18.6 ruamel.yaml.clib 0.2.12 safetensors 0.4.5 scipy 1.14.1 setuptools 65.5.0 shapely 2.0.6 simplejson 3.19.3 six 1.16.0 smmap 5.0.1 sniffio 1.3.1 socksio 1.0.0 soupsieve 2.6 SQLAlchemy 2.0.36 sse-starlette 1.8.2 st-annotated-text 4.0.1 starlette 0.36.3 streamlit 1.34.0 streamlit-aggrid 1.0.5 streamlit-antd-components 0.3.1 streamlit-camera-input-live 0.2.0 streamlit-card 1.0.2 streamlit-chatbox 1.1.12.post4 streamlit-embedcode 0.1.2 streamlit-extras 0.4.2 streamlit-faker 0.0.3 
streamlit-feedback 0.1.3 streamlit-image-coordinates 0.1.9 streamlit-keyup 0.2.4 streamlit_modal 0.1.0 streamlit-option-menu 0.3.12 streamlit-paste-button 0.1.2 streamlit-toggle-switch 1.0.2 streamlit-vertical-slider 2.5.5 strsimpy 0.2.1 sympy 1.13.1 tabulate 0.9.0 tenacity 8.5.0 tiktoken 0.8.0 timm 1.0.11 tokenizers 0.20.2 toml 0.10.2 toolz 1.0.0 torch 2.5.1 torchvision 0.20.1 tornado 6.4.1 tqdm 4.66.6 transformers 4.46.1 triton 3.1.0 types-requests 2.32.0.20241016 typing_extensions 4.12.2 typing-inspect 0.9.0 unstructured 0.11.8 unstructured-client 0.25.9 unstructured-inference 0.7.18 unstructured.pytesseract 0.3.13 urllib3 2.2.3 uvicorn 0.32.0 validators 0.34.0 watchdog 6.0.0 websockets 13.1 wrapt 1.16.0 xinference-client 0.13.3 yarl 1.17.1
3.3 model_settings.yaml:

    # Model configuration

    # Default LLM model name
    DEFAULT_LLM_MODEL: autodl-tmp-glm-4-9b-chat-id

    # Default embedding model name
    DEFAULT_EMBEDDING_MODEL: bge-large-zh-v1.5

    # Name of the AgentLM model (optional; once set it is locked in as the model for chains entered from the Agent; if unset, DEFAULT_LLM_MODEL is used)
    Agent_MODEL: ''

    # Default number of history turns
    HISTORY_LEN: 3

    # Maximum length supported by the LLM; if left empty the model's own default maximum is used, otherwise the user-specified value
    MAX_TOKENS:

    # General LLM chat parameter
    TEMPERATURE: 0.7

    # Supported Agent models
    SUPPORT_AGENT_MODELS:

    # LLM model configuration, including initialization parameters for the different modalities.
    # `model`: if left empty, DEFAULT_LLM_MODEL is used automatically
    LLM_MODEL_CONFIG:
      preprocess_model:
        model: ''
        temperature: 0.05
        max_tokens: 4096
        history_len: 10
        prompt_name: default
        callbacks: false
      llm_model:
        model: ''
        temperature: 0.9
        max_tokens: 4096
        history_len: 10
        prompt_name: default
        callbacks: true
      action_model:
        model: ''
        temperature: 0.01
        max_tokens: 4096
        history_len: 10
        prompt_name: ChatGLM3
        callbacks: true
      postprocess_model:
        model: ''
        temperature: 0.01
        max_tokens: 4096
        history_len: 10
        prompt_name: default
        callbacks: true
      image_model:
        model: sd-turbo
        size: 256*256

    # Model-serving platform configuration

    # Platform name
    platform_name: xinference

    # Platform type
    # One of: ['xinference', 'ollama', 'oneapi', 'fastchat', 'openai', 'custom openai']
    platform_type: xinference

    # openai api url
    api_base_url: http://127.0.0.1:9997/v1

    # api key if available
    api_key: EMPTY

    # API proxy
    api_proxy: ''

    # Maximum concurrent requests per model on this platform
    api_concurrencies: 5

    # Whether to fetch the platform's available model list automatically. When set to True, the model lists below are detected automatically
    auto_detect_model: false

    # LLM models available on this platform; detected automatically when auto_detect_model is True
    llm_models: []

    # Embedding models available on this platform; detected automatically when auto_detect_model is True
    embed_models: []

    # Text-to-image models available on this platform; detected automatically when auto_detect_model is True
    text2image_models: []

    # Multimodal (image-to-text) models available on this platform; detected automatically when auto_detect_model is True
    image2text_models: []

    # Rerank models available on this platform; detected automatically when auto_detect_model is True
    rerank_models: []

    # Speech-to-text (STT) models available on this platform; detected automatically when auto_detect_model is True
    speech2text_models: []

    # Text-to-speech (TTS) models available on this platform; detected automatically when auto_detect_model is True
    text2speech_models: []

    MODEL_PLATFORMS:
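Since the pasted configuration is hard to read back and ends at MODEL_PLATFORMS: with no entries shown beneath it, it may help to print what the loaded file actually contains for the keys involved in the errors of 3.5 below. This is a minimal sketch with PyYAML (already in the pip list); the file path is an assumption and should point at the model_settings.yaml that the running instance actually reads:

```python
import yaml

# Path to the model_settings.yaml that the running instance loads
# (assumption: adjust to the actual chatchat data/config directory).
CONFIG_PATH = "model_settings.yaml"

with open(CONFIG_PATH, encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

# Values relevant to the two errors reported in 3.5.
print("DEFAULT_LLM_MODEL      :", cfg.get("DEFAULT_LLM_MODEL"))
print("DEFAULT_EMBEDDING_MODEL:", cfg.get("DEFAULT_EMBEDDING_MODEL"))

# Each platform entry should carry its own api_base_url and model lists;
# an empty or missing MODEL_PLATFORMS list means no platform is registered.
for platform in cfg.get("MODEL_PLATFORMS") or []:
    print(platform.get("platform_name"), platform.get("platform_type"),
          platform.get("api_base_url"),
          "embed_models:", platform.get("embed_models"))
```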
3.4 Startup: normal.
3.5 Usage: errors.
Multi-function chat error:
An error occurred during streaming
2024-11-05 14:22:50.570 | ERROR | chatchat.server.api_server.openai_routes:generator:105 - openai request error: An error occurred during streaming
RAG chat error:
failed to access embed model 'quentinz/bge-large-zh-v1.5': Error raised by inference endpoint: HTTPConnectionPool(host='127.0.0.1', port=11434): Max retries exceeded with url: /api/embeddings (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f31a39f7460>: Failed to establish a new connection: [Errno 111] Connection refused'))
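Two details in the RAG error can be cross-checked against 3.3: the embedding request went to 127.0.0.1:11434 (the default Ollama port) rather than the configured api_base_url on port 9997, and the model id in the error ('quentinz/bge-large-zh-v1.5') differs from the configured DEFAULT_EMBEDDING_MODEL (bge-large-zh-v1.5). Below is a minimal sketch to see which of the two endpoints is actually reachable from this host, using requests (already installed); /v1/models is the OpenAI-compatible model list served by Xinference and /api/tags is Ollama's model list:

```python
import requests

# Endpoint from model_settings.yaml (Xinference) vs. endpoint seen in the traceback (Ollama default).
ENDPOINTS = {
    "xinference (configured)": "http://127.0.0.1:9997/v1/models",
    "ollama (from traceback)": "http://127.0.0.1:11434/api/tags",
}

for name, url in ENDPOINTS.items():
    try:
        r = requests.get(url, timeout=3)
        print(f"{name}: HTTP {r.status_code} at {url}")
    except requests.exceptions.ConnectionError:
        # [Errno 111] Connection refused lands here when nothing is listening on the port.
        print(f"{name}: connection refused at {url}")
```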