1Panel-dev / MaxKB

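🚀 MaxKB is an open-source knowledge base question-answering system built on large language models and RAG, widely used for intelligent customer service, internal enterprise knowledge bases, academic research, and education.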
https://maxkb.cn/
GNU General Public License v3.0
11.6k stars 1.52k forks

[BUG] Database connection is dropped while vectorizing a large Excel file and is not re-established automatically #1368

Open zwjzxh520 opened 1 month ago

zwjzxh520 commented 1 month ago

Contact information

No response

MaxKB version

v1.6.1 (build at 2024-09-29T19:14, commit: 81ffe59c)

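Problem description

Vectorizing a large Excel file (more than 200,000 rows) has never succeeded. I checked the log file 8/0/80a4d237-f66c-482b-915b-8ebef8bcb28c.log and found an error, which is included below.

I also found that the amount of vectorized data for the document differs greatly from the amount of content actually in the document, so the task has never completed successfully.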

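Steps to reproduce

Vectorizing small files works fine; files with tens or hundreds of rows are no problem. With an Excel file of more than 200,000 rows, however, it has never succeeded. I tried two application servers with the same result; both use the same PostgreSQL server.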

Expected result

The vectorization task completes normally.

Relevant log output

2024-10-12 15:34:56 Error while vectorizing document 12021462-8865-11ef-8714-0242ac130002: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
Traceback (most recent call last):
  File "/opt/py3/lib/python3.11/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
psycopg2.OperationalError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/maxkb/app/apps/common/event/listener_manage.py", line 174, in embedding_by_document
    VectorStore.get_embedding_vector().batch_save(data_list, embedding_model, is_save_function)
  File "/opt/maxkb/app/apps/embedding/vector/base_vector.py", line 102, in batch_save
    self._batch_save(child_array, embedding, is_save_function)
  File "/opt/maxkb/app/apps/embedding/vector/pg_vector.py", line 73, in _batch_save
    if is_save_function():
       ^^^^^^^^^^^^^^^^^^
  File "/opt/maxkb/app/apps/common/event/listener_manage.py", line 171, in is_save_function
    return QuerySet(Document).filter(id=document_id).exists()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/py3/lib/python3.11/site-packages/django/db/models/query.py", line 1241, in exists
    return self.query.has_results(using=self.db)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/py3/lib/python3.11/site-packages/django/db/models/sql/query.py", line 598, in has_results
    return compiler.has_results()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/py3/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1530, in has_results
    return bool(self.execute_sql(SINGLE))
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/py3/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1562, in execute_sql
    cursor.execute(sql, params)
  File "/opt/py3/lib/python3.11/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/py3/lib/python3.11/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/py3/lib/python3.11/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
  File "/opt/py3/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/opt/py3/lib/python3.11/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django.db.utils.OperationalError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
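
The traceback shows the failure surfacing in `is_save_function`, where a plain `exists()` query runs on a connection the server has already closed. One possible mitigation for the missing reconnect is sketched below. It is a minimal sketch, not MaxKB's actual code: `call_with_reconnect` and its parameters are hypothetical names introduced here, and the Django wiring in the trailing comment assumes that `django.db.close_old_connections` discards the unusable connection so the next query opens a new one. It does not address why the server closed the connection in the first place.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_reconnect(
    operation: Callable[[], T],
    reset_connection: Callable[[], None],
    retriable: tuple[type[BaseException], ...],
    attempts: int = 3,
    delay_seconds: float = 1.0,
) -> T:
    """Run `operation`; on a retriable error, reset the connection and retry.

    All names here are hypothetical helpers for illustration only.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retriable:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            reset_connection()  # drop the dead connection before retrying
            time.sleep(delay_seconds * attempt)  # simple linear backoff
    raise AssertionError("unreachable")


# Hypothetical wiring inside a Django app (assumption, not MaxKB's code):
#
#   from django.db import OperationalError, close_old_connections
#
#   exists = call_with_reconnect(
#       lambda: QuerySet(Document).filter(id=document_id).exists(),
#       close_old_connections,
#       (OperationalError,),
#   )
```

Passing the reset step in as a callable keeps the helper independent of Django, so the retry logic can be exercised without a database.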

Additional information

[Screenshot: Clip_2024-10-14_15-20-41]

zyyfit commented 1 month ago

Thanks for the feedback. We will investigate the issue.