chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly Langchain-ChatGLM): a local-knowledge RAG and Agent application built with Langchain on language models such as ChatGLM, Qwen, and Llama.
Apache License 2.0
31.7k stars, 5.53k forks

[BUG] UnicodeEncodeError: 'utf-8' codec can't encode characters in position 4-5: surrogates not allowed #2889

Closed: ipv6next closed this issue 8 months ago

ipv6next commented 8 months ago

Problem Description: Describe the problem in a clear and concise manner.

Steps to Reproduce

  1. Run '...'
  2. Click '...'
  3. Scroll to '...'
  4. Problem occurs

Expected Result: Describe the expected result.

Actual Result: Describe the actual result.

Environment Information

Additional Information: UnicodeEncodeError: 'utf-8' codec can't encode characters in position 4-5: surrogates not allowed. Add any other information related to the issue.

dosubot[bot] commented 8 months ago

🤖

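Hello, @ipv6next! 👋 I'm Dosu, a friendly bot that helps solve bugs, answer questions, and help you become a contributor. I'll help out here while you wait for a human maintainer. I'm starting to look at your issue now and will give you a full answer shortly. 🔍😊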

ipv6next commented 8 months ago

Traceback (most recent call last):
  File "/opt/mydata/miniconda3/envs/int/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
    exec(code, module.__dict__)
  File "/opt/mydata/chatchat/Chatchat-0.2.9/webui.py", line 64, in <module>
    pages[selected_page]["func"](api=api, is_lite=is_lite)
  File "/opt/mydata/chatchat/Chatchat-0.2.9/webui_pages/knowledge_base/knowledge_base.py", line 222, in knowledge_base_page
    doc_grid = AgGrid(
               ^^^^^^^
  File "/opt/mydata/miniconda3/envs/int/lib/python3.11/site-packages/st_aggrid/__init__.py", line 356, in AgGrid
    row_data = __parse_row_data(data)
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/mydata/miniconda3/envs/int/lib/python3.11/site-packages/st_aggrid/__init__.py", line 54, in __parse_row_data
    row_data = data_parameter.to_json(orient='records', date_format='iso')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/mydata/miniconda3/envs/int/lib/python3.11/site-packages/pandas/core/generic.py", line 2532, in to_json
    return json.to_json(
           ^^^^^^^^^^^^^
  File "/opt/mydata/miniconda3/envs/int/lib/python3.11/site-packages/pandas/io/json/_json.py", line 191, in to_json
    ).write()
      ^^^^^^^
  File "/opt/mydata/miniconda3/envs/int/lib/python3.11/site-packages/pandas/io/json/_json.py", line 244, in write
    return dumps(
           ^^^^^^
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 4-5: surrogates not allowed

dosubot[bot] commented 8 months ago

🤖

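Based on the stack trace you provided, the error occurs while the data is being converted to JSON. Specifically, pandas' to_json method fails when it tries to encode certain characters. The message "surrogates not allowed" means that some strings contain lone surrogate code points (U+D800 to U+DFFF), which have no valid UTF-8 encoding.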
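The sketch below uses only the standard library to show how this kind of error can arise. It rests on a hypothetical assumption: a file name was stored as GBK bytes and later decoded as UTF-8 with Python's surrogateescape error handler, which is how Python represents undecodable bytes in file names on POSIX systems. The report does not say where the surrogates actually came from.

```python
# Hypothetical origin of the bad string: GBK bytes decoded as UTF-8
# with the "surrogateescape" handler (as Python does for undecodable
# bytes in POSIX file names).
raw_name = "测试.txt".encode("gbk")
name = raw_name.decode("utf-8", "surrogateescape")

# Each undecodable byte became a lone surrogate in U+DC80..U+DCFF.
print(ascii(name))

# Strict UTF-8 encoding, which a JSON writer needs, now fails.
try:
    name.encode("utf-8")
    reason = None
except UnicodeEncodeError as exc:
    reason = exc.reason

print(reason)  # surrogates not allowed
```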

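One possible solution is to clean the data before passing it to AgGrid, so that every character is UTF-8 encodable. You can do this with pandas' applymap method (deprecated since pandas 2.1 in favor of DataFrame.map), for example: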

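import pandas as pd

def clean_data(data: pd.DataFrame) -> pd.DataFrame:
    # Drop characters that cannot be encoded as UTF-8 (e.g. lone surrogates).
    # Only str cells are converted; numbers, dates and None pass through unchanged.
    return data.applymap(
        lambda x: x.encode('utf-8', 'ignore').decode('utf-8') if isinstance(x, str) else x
    )

# your_data is a placeholder for the DataFrame that is passed to AgGrid
cleaned_data = clean_data(your_data)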

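In this example, clean_data goes through every cell of the data. It encodes the cell's text as UTF-8, silently dropping any character that cannot be encoded (such as a lone surrogate), and then decodes the bytes back into a string. As a result, every character in the data passed to AgGrid is UTF-8 encodable, at the cost of losing the dropped characters.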
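To see what the ignore strategy does to a single value, the sketch below compares it with two other standard codec error handlers. The GBK origin of the bytes is a hypothetical assumption made only for illustration. Note that ignore deletes the affected characters outright. If the true encoding is known, surrogateescape can recover the original name.

```python
# Hypothetical bad value: GBK bytes that were decoded as UTF-8 with surrogateescape.
name = "测试.txt".encode("gbk").decode("utf-8", "surrogateescape")

# 1. "ignore": drop every unencodable character (the clean_data approach).
ignored = name.encode("utf-8", "ignore").decode("utf-8")

# 2. "replace": substitute "?" so the loss stays visible in the grid.
replaced = name.encode("utf-8", "replace").decode("utf-8")

# 3. "surrogateescape": turn the surrogates back into the original bytes,
#    then decode them with the codec they were actually written in.
recovered = name.encode("utf-8", "surrogateescape").decode("gbk")

print(ignored)    # .txt
print(replaced)
print(recovered)  # 测试.txt
```

When the real encoding is unknown, replace is a reasonable middle ground: the text is still valid UTF-8, and the damaged characters remain visible.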

You can then pass the cleaned data to AgGrid:

doc_grid = AgGrid(
    cleaned_data,
    gb.build(),
    columns_auto_size_mode="FIT_CONTENTS",
    theme="alpine",
    custom_css={
        "#gridToolBar": {"display": "none"},
    },
    allow_unsafe_jscode=True,
    enable_enterprise_modules=False
)
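The following is a sketch of a column-wise variant of clean_data. It touches only string columns, so numeric and datetime columns still serialize. It then calls to_json(orient='records', date_format='iso'), the same call that fails in the traceback above. It assumes pandas 2.x, where text columns use object dtype. The docs table and its column names are hypothetical stand-ins for the knowledge-base file list.

```python
import pandas as pd

def strip_surrogates(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose str cells contain no lone surrogates."""
    out = df.copy()
    # Only object-dtype (text) columns are cleaned; other columns are left as-is.
    for col in out.select_dtypes(include="object").columns:
        out[col] = out[col].map(
            lambda x: x.encode("utf-8", "ignore").decode("utf-8")
            if isinstance(x, str) else x
        )
    return out

# Hypothetical file table; the second name carries lone surrogates.
bad_name = "测试.txt".encode("gbk").decode("utf-8", "surrogateescape")
docs = pd.DataFrame({
    "file_name": pd.Series(["ok.txt", bad_name], dtype=object),
    "file_size": [10, 20],
    "create_time": pd.to_datetime(["2024-01-01", "2024-01-02"]),
})

# Same serialization call that st_aggrid makes in the traceback.
row_json = strip_surrogates(docs).to_json(orient="records", date_format="iso")
print(row_json)
```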

I hope this solution helps. If the problem persists, or if you have other questions, feel free to ask.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.


zRzRzRzRzRzRzR commented 8 months ago

This is an encoding problem; please check previous issues.