dataelement / bisheng

BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI workflow, RAG, Agent, Unified model management, Evaluation, SFT, Dataset Management, Enterprise-level System Management, Observability and more.
https://bisheng.dataelem.com/
Apache License 2.0

v0.3.7 Knowledge base: document parsing fails #948

Open swing99527 opened 3 weeks ago

swing99527 commented 3 weeks ago

bisheng-backend | File "/usr/local/lib/python3.10/site-packages/bisheng_langchain/vectorstores/milvus.py", line 485, in add_texts
bisheng-backend | embeddings = self.embedding_func.embed_documents(texts)
bisheng-backend | | | | -> ['环氧胶黏剂 第2版_14214335_1-100.docx\n环氧胶黏剂:实用配方与制备实例(第二版)\n----------\nEPOXY ADHESIVE\n\n环氧胶黏剂\n\nEPOXY ADHESIVE\n\n环氧胶黏剂 \n\n...
bisheng-backend | | | -> <function BishengEmbedding.embed_documents at 0x7f2faca5cdc0>
bisheng-backend | | -> BishengEmbedding(model_id='8', model='text-embedding-v3', embedding_ctx_length=8192, max_retries=6, request_timeout=200, mode...
bisheng-backend | -> <bisheng_langchain.vectorstores.milvus.Milvus object at 0x7f2f65ec4b20>
bisheng-backend |
bisheng-backend | File "/app/bisheng/interface/utils.py", line 127, in wrapper
bisheng-backend | return func(*args, **kwargs)
bisheng-backend | | | -> {}
bisheng-backend | | -> (BishengEmbedding(model_id='8', model='text-embedding-v3', embedding_ctx_length=8192, max_retries=6, request_timeout=200, mod...
bisheng-backend | -> <function BishengEmbedding.embed_documents at 0x7f2faca5cd30>
bisheng-backend |
bisheng-backend | File "/app/bisheng/interface/embeddings/custom.py", line 163, in embed_documents
bisheng-backend | raise Exception(f'embedding error: {e}')
bisheng-backend |
bisheng-backend | Exception: embedding error: status_code: 400
bisheng-backend | code: InvalidParameter
bisheng-backend | message: batch size is invalid, it should not be larger than 6.: payload.input.contents

GangLiCN commented 1 week ago

Try a different file type, such as txt or pdf.

Judging from the error message, the request went through the custom embedding path, and that call is what failed.
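The last line of the log says the embedding endpoint rejects any request carrying more than 6 inputs, so one possible workaround is to split the texts into batches of at most 6 before calling the embedding model. Below is a minimal sketch of that idea, not the actual bisheng code: `embed_in_batches` and `embed_fn` are hypothetical names, and `embed_fn` stands in for whatever callable sends one list of texts to the provider (for example an `embed_documents` method).

```python
from typing import Callable, List, Sequence

# Limit taken from the error message in the log ("should not be larger than 6").
MAX_BATCH_SIZE = 6


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = MAX_BATCH_SIZE,
) -> List[List[float]]:
    """Call embed_fn on slices of at most batch_size texts and join the results.

    embed_fn is a hypothetical stand-in for the real embedding call; it must
    return one vector per input text, in the same order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        vectors.extend(embed_fn(batch))
    return vectors
```

Usage would look like `embed_in_batches(texts, embeddings.embed_documents)`, assuming `embeddings` is the configured embedding object. Whether bisheng exposes a batch-size setting for this model is not shown in the log, so check the model configuration first.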