Error when uploading a file to the vector store: TypeError: string indices must be integers, not 'str'

b2383355038 commented 6 months ago

Problem Description: An error occurs when uploading a CSV file to the knowledge base.

Steps to Reproduce

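Run 'python startup -a'
Click 'Knowledge Base Management'
Scroll to 'Upload file'
The problem occurs and an error is raised

Expected Result: The file is successfully loaded into the FAISS vector store.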

Actual Result

Error message:

2024-05-07 09:45:16,936 - utils.py[line:95] - ERROR: ConnectError: error when post /knowledge_base/search_docs: [Errno 111] Connection refused
2024-05-07 09:45:16,936 - utils.py[line:95] - ERROR: ConnectError: error when post /knowledge_base/search_docs: [Errno 111] Connection refused
2024-05-07 09:45:16,937 - utils.py[line:95] - ERROR: ConnectError: error when post /knowledge_base/search_docs: [Errno 111] Connection refused
2024-05-07 09:45:16.937 Uncaught app exception
Traceback (most recent call last):
  File "/root/miniconda3/envs/lc/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 535, in _run_script
    exec(code, module.__dict__)
  File "/mnt/Langchain-Chatchat/webui.py", line 64, in <module>
    pages[selected_page]["func"](api=api, is_lite=is_lite)
  File "/mnt/Langchain-Chatchat/webui_pages/knowledge_base/knowledge_base.py", line 330, in knowledge_base_page
    data = [
           ^
  File "/mnt/Langchain-Chatchat/webui_pages/knowledge_base/knowledge_base.py", line 331, in <listcomp>
    {"seq": i + 1, "id": x["id"], "page_content": x["page_content"], "source": x["metadata"].get("source"),
                         ~^^^^^^
TypeError: string indices must be integers, not 'str'
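
One plausible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: the `Connection refused` errors mean the `/knowledge_base/search_docs` call never returned a list of documents, and the page then iterated over whatever came back instead. The sketch below reproduces the same `TypeError` with a hypothetical error payload (`failed_response` and its keys are illustrative, not taken from the project's code). Iterating over a dict yields its string keys, and indexing a string with `"id"` fails.

```python
# Hypothetical stand-in for what the web UI might receive when the API call fails.
# The payload shape is an assumption; only the traceback comes from the report.
failed_response = {"code": 500, "msg": "error when post /knowledge_base/search_docs"}


def build_rows(docs):
    """Mirror the shape of the list comprehension shown in the traceback."""
    return [
        {
            "seq": i + 1,
            "id": x["id"],
            "page_content": x["page_content"],
            "source": x["metadata"].get("source"),
        }
        for i, x in enumerate(docs)
    ]


try:
    # Iterating a dict yields its keys ("code", "msg"), so x is a str here.
    build_rows(failed_response)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

With a well-formed list of document dicts, `build_rows` returns normally, which suggests the CSV content itself may not be what triggers the exception.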

Environment Information

langchain-ChatGLM version/commit: v2.10
Deployed with Docker (yes/no): no
Model used (ChatGLM2-6B / Qwen-7B, etc.): 文心一言 4.0 (ERNIE Bot 4.0)
Embedding model used (moka-ai/m3e-base, etc.): bge-large-zh
Vector store type (faiss / milvus / pg_vector, etc.): faiss
Operating system and version: Alibaba Cloud server
Python version: 3.10
Other relevant environment information:

Additional Information: Add any other information related to the issue.

dosubot[bot] commented 6 months ago

The error you're encountering, TypeError: string indices must be integers, not 'str', means the application is indexing a string with a string key, which is only valid for dictionaries (mappings), not for strings or lists. This typically happens when the code expects a list of dictionaries (where each dictionary represents a row from the CSV file and the keys are column headers) but receives a string or otherwise malformed data instead.

To resolve this, ensure your CSV file is correctly parsed into a list of dictionaries. You can use the pandas library to easily convert a CSV file into this format:

import pandas as pd

# Assuming 'csv_file' is the path to your CSV file
df = pd.read_csv(csv_file)
records = df.to_dict(orient='records')

# 'records' is now a list of dictionaries, each representing a row from the CSV.

After parsing the CSV file into the correct format, you can iterate over records and access each column value by its header name, which should align with the expected data structure in your application and prevent the TypeError.

If the issue persists after ensuring the CSV is correctly parsed and the data structure is as expected, it might be necessary to review the specific steps or code handling the upload process for further discrepancies.
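
If the underlying cause is a response that is not a list of dictionaries, a small guard before the table is built would turn the crash into an empty table or a readable message. This is a minimal sketch, not the project's code: `safe_rows` is a hypothetical helper, and the `id`, `page_content`, and `metadata` keys are taken from the traceback in this issue.

```python
from typing import Any


def safe_rows(docs: Any) -> list[dict]:
    """Build table rows only from well-formed document dicts; skip anything else."""
    if not isinstance(docs, list):
        # e.g. an error payload or a plain string returned by a failed request
        return []
    rows = []
    for x in docs:
        if not isinstance(x, dict):
            continue  # ignore stray strings or other unexpected items
        metadata = x.get("metadata")
        rows.append(
            {
                "seq": len(rows) + 1,
                "id": x.get("id"),
                "page_content": x.get("page_content"),
                "source": metadata.get("source") if isinstance(metadata, dict) else None,
            }
        )
    return rows
```

Returning an empty list keeps the page rendering; a caller could instead surface the error payload to the user when `docs` is not a list.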

#### Details

**Sources**
- [[BUG] 知识库上传csv没有进入向量库](https://github.com/chatchat-space/Langchain-Chatchat/issues/2948)
- [TypeError: string indices must be integers](https://github.com/chatchat-space/Langchain-Chatchat/issues/3186)
- [webui_pages/knowledge_base/knowledge_base.py](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/webui_pages/knowledge_base/knowledge_base.py)
- [document_loaders/FilteredCSVloader.py](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/document_loaders/FilteredCSVloader.py)
- [server/chat/file_chat.py](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/server/chat/file_chat.py)
- [server/api.py](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/server/api.py)
- [requirements.txt](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/requirements.txt)

zRzRzRzRzRzRzR commented 6 months ago

Does it work with the sample database?

b2383355038 commented 6 months ago

> Does it work with the sample database?

I haven't tried. It looks like the API connection just drops outright.

zRzRzRzRzRzRzR commented 6 months ago

Try running it from the local command line.

b2383355038 commented 6 months ago

> Try running it from the local command line.

I tried that. It doesn't work and keeps raising the same error.

zRzRzRzRzRzRzR commented 6 months ago

What format is your content in?

b2383355038 commented 6 months ago

CSV format.

zRzRzRzRzRzRzR commented 6 months ago

Does the CSV have two columns, question and answer? It is usually two columns.

b2383355038 commented 6 months ago

No, it is just one column with 50,000 rows, all company names.

zRzRzRzRzRzRzR commented 6 months ago

Then this problem is to be expected. A single column can't be embedded; the CSV is meant to hold QA pairs.
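
To check whether a file matches the two-column question/answer layout described here before uploading it, the column count of each row can be inspected. A minimal sketch using only the standard library; `count_columns` is a hypothetical helper and the sample data is made up for illustration.

```python
import csv
import io


def count_columns(csv_text: str) -> set[int]:
    """Return the set of distinct column counts found across non-empty rows."""
    reader = csv.reader(io.StringIO(csv_text))
    return {len(row) for row in reader if row}


# Illustrative inputs: a single-column list of names and a two-column QA file.
single_column = "Company A\nCompany B\n"
qa_pairs = "question,answer\nWhat is FAISS?,A vector search library\n"

print(count_columns(single_column))
print(count_columns(qa_pairs))
```

A result other than `{2}` would indicate that the file does not follow the QA-pair layout.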

b2383355038 commented 6 months ago

But I was able to upload it before, and this time a few dozen entries went in as well. Now the upload keeps failing.

aben1900 commented 5 months ago

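An open-source project that can't even handle basic operation, sigh. These errors could all be spotted immediately; I really don't get it. I ran into this problem too. One 128k model endlessly asks and answers its own questions, so I had to abandon it, and LangChain raises this error when loading a personal knowledge base. It feels completely unusable.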

2024-05-27 16:32:22,181 - utils.py[line:95] - ERROR: ReadTimeout: error when post /knowledge_base/create_knowledge_base: timed out
2024-05-27 16:36:43,277 - utils.py[line:95] - ERROR: ReadTimeout: error when post /knowledge_base/search_docs: timed out
2024-05-27 16:36:43.277 Uncaught app exception
Traceback (most recent call last):
  File "/root/langchain_pip/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 535, in _run_script
    exec(code, module.__dict__)
  File "/sdb/chatgpt/Langchain-Chatchat/webui.py", line 64, in <module>
    pages[selected_page]["func"](api=api, is_lite=is_lite)
  File "/sdb/chatgpt/Langchain-Chatchat/webui_pages/knowledge_base/knowledge_base.py", line 330, in knowledge_base_page
    data = [
           ^
  File "/sdb/chatgpt/Langchain-Chatchat/webui_pages/knowledge_base/knowledge_base.py", line 331, in <listcomp>
    {"seq": i + 1, "id": x["id"], "page_content": x["page_content"], "source": x["metadata"].get("source"),
                         ~^^^^^^
TypeError: string indices must be integers, not 'str'

liwei413519 commented 5 months ago

Solved. The knowledge-matching score threshold needs to be set to 1.0. That is the default value, and it must not be changed!

liwei413519 commented 5 months ago

SCORE_THRESHOLD = 1.0 must not be changed, otherwise this error is raised!

aben1900 commented 5 months ago

Which file do I change it in?

rookie0w0 commented 4 months ago

> Which file do I change it in?

In kb_config.py under configs.
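
Putting the replies in this thread together, the setting would look like the fragment below. Only the name `SCORE_THRESHOLD`, the value 1.0, and the file location come from this thread; the comment text is illustrative, and the rest of the file is not shown.

```python
# configs/kb_config.py (fragment)
# Per the replies in this thread, leave this at its default value of 1.0.
SCORE_THRESHOLD = 1.0
```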

HappyJimmyBoy commented 3 months ago

2024-07-18: pulling the main branch still has this problem; after pulling the dev branch, everything works fine.

chatchat-space / Langchain-Chatchat

Error when uploading a file to the vector store: TypeError: string indices must be integers, not 'str' #3952