[BUG] bge-m3 incompatible / abnormal GPU memory allocation

liuchuan01 commented 3 months ago

Problem Description

bge-m3 incompatible / abnormal GPU memory allocation.

Similar to issue #4101.

Steps to Reproduce

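Configure model_config.py to set the embedding model to bge-m3.
Run the service, then call the API that rebuilds the knowledge base from content: /knowledge_base/recreate_vector_store
Document splitting completes, but embedding fails with an error about halfway through.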

Expected Result

The rebuild completes normally.

Actual Result

2024-08-06 13:58:26,462 - embeddings_api.py[line:40] - ERROR: CUDA out of memory. Tried to allocate 18.88 GiB. GPU 0 has a total capacty of 23.48 GiB of which 1.53 GiB is free. Including non-PyTorch memory, this process has 21.92 GiB memory in use. Of the allocated memory 21.61 GiB is allocated by PyTorch, and 14.47 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/sse_starlette/sse.py", line 269, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/sse_starlette/sse.py", line 258, in wrap
    await func()
  File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/sse_starlette/sse.py", line 215, in listen_for_disconnect
    message = await receive()
              ^^^^^^^^^^^^^^^
  File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
    await self.message_event.wait()
  File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/asyncio/locks.py", line 213, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 713285512450

During handling of the above exception, another exception occurred:

  + Exception Group Traceback (most recent call last):
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
  |     result = await app(  # type: ignore[func-returns-value]
  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
  |     return await self.app(scope, receive, send)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
  |     await super().__call__(scope, receive, send)
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/starlette/applications.py", line 116, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
  |     raise exc
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
  |     await self.app(scope, receive, _send)
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
  |     raise exc
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/starlette/routing.py", line 746, in __call__
  |     await route.handle(scope, receive, send)
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
  |     await self.app(scope, receive, send)
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
  |     raise exc
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
  |     await response(scope, receive, send)
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/sse_starlette/sse.py", line 255, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
  |     raise BaseExceptionGroup(
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/sse_starlette/sse.py", line 258, in wrap
    |     await func()
    |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/sse_starlette/sse.py", line 245, in stream_response
    |     async for data in self.body_iterator:
    |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/starlette/concurrency.py", line 57, in iterate_in_threadpool
    |     yield await anyio.to_thread.run_sync(_next, iterator)
    |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    |     return await get_async_backend().run_sync_in_worker_thread(
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    |     return await future
    |            ^^^^^^^^^^^^
    |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    |     result = context.run(func, *args)
    |              ^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/star/miniconda3/envs/langchain-test/lib/python3.11/site-packages/starlette/concurrency.py", line 47, in _next
    |     return next(iterator)
    |            ^^^^^^^^^^^^^^
    |   File "/home/star/liuchuan/chatbot-langchain/server/knowledge_base/kb_doc_api.py", line 392, in output
    |     kb.add_doc(kb_file, not_refresh_vs_cache=True)
    |   File "/home/star/liuchuan/chatbot-langchain/server/knowledge_base/kb_service/base.py", line 130, in add_doc
    |     doc_infos = self.do_add_doc(docs, **kwargs)
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/star/liuchuan/chatbot-langchain/server/knowledge_base/kb_service/faiss_kb_service.py", line 78, in do_add_doc
    |     ids = vs.add_embeddings(text_embeddings=zip(data["texts"], data["embeddings"]),
    |                                                 ~~~~^^^^^^^^^
    | TypeError: 'NoneType' object is not subscriptable
    +------------------------------------
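Reading the traceback, the CUDA OOM appears to be caught and logged in embeddings_api.py, after which do_add_doc in faiss_kb_service.py subscripts a data value that is None, so the real cause only surfaces as a TypeError. Below is a minimal sketch of a guard that fails loudly with the original message instead. The response shape ({"code", "msg", "data"}) and the names unwrap_embeddings and EmbeddingError are hypothetical, chosen for illustration; they are not taken from Langchain-Chatchat's code.

```python
from typing import Any, Dict, List, Optional


class EmbeddingError(RuntimeError):
    """Raised when the embedding step reports a failure instead of returning vectors."""


def unwrap_embeddings(response: Dict[str, Any]) -> Dict[str, List[Any]]:
    # Hypothetical response shape: {"code": int, "msg": str, "data": dict or None}.
    data: Optional[Dict[str, List[Any]]] = response.get("data")
    if response.get("code", 200) != 200 or data is None:
        # Re-raise the original error text (e.g. the CUDA OOM message)
        # instead of letting a later data["texts"] fail with a TypeError.
        raise EmbeddingError(response.get("msg") or "embedding failed with no data")
    if len(data["texts"]) != len(data["embeddings"]):
        raise EmbeddingError("texts and embeddings have different lengths")
    return data


failed = {"code": 500, "msg": "CUDA out of memory", "data": None}
try:
    unwrap_embeddings(failed)
except EmbeddingError as exc:
    print(f"embedding failed: {exc}")  # embedding failed: CUDA out of memory
```

Raising early keeps the root cause in the error the API caller sees, rather than a NoneType error several frames away from the GPU failure.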

Environment Information

Langchain-Chatchat version / commit: 0.2.10
Deployment method (pypi install / source deployment / docker deployment): source deployment
Model inference framework (Xinference / Ollama / OpenAI API, etc.): loaded via huggingface
LLM model (GLM-4-9B / Qwen2-7B-Instruct, etc.): not relevant
Embedding model (bge-large-zh-v1.5 / m3e-base, etc.): bge-m3
Vector store type (faiss / milvus / pg_vector, etc.): faiss
Operating system and version: Ubuntu 22
Python version: 3.11
Inference hardware (GPU / CPU / MPS / NPU, etc.): GPU (3090)
Other relevant environment information: note that this is version 0.2.10, not the new 0.3.x.

liuchuan01 commented 3 months ago

This problem is rather odd. When I run the following outside Langchain-Chatchat, there is no problem and GPU memory is more than sufficient:

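from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.vectorstores.faiss import FAISS
emb_model = HuggingFaceEmbeddings(model_name=hf_model_path, model_kwargs=model_kwargs)
vectorstore = FAISS.from_documents(documents, emb_model)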

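While watching for the point where the error above occurs, nvidia-smi showed GPU memory usage as normal the whole time, around 4 GB; it would suddenly spike to 10 GB and then drop back to 3 GB. Then the error above suddenly appeared.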
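A back-of-the-envelope sketch of why usage can jump briefly: if the encoder runs eager (non-fused) self-attention, each layer materializes a score tensor of shape batch × heads × seq_len × seq_len, so its size grows with the square of the padded sequence length. The function name is hypothetical, the 16 heads assume an XLM-RoBERTa-large-sized encoder (which bge-m3 is reported to be), and fp32 scores (4 bytes per element) are assumed. Fused attention kernels may never materialize this tensor, and the real footprint also includes weights and other activations, so treat this as an order-of-magnitude illustration only.

```python
def attention_scores_bytes(batch: int, heads: int, seq_len: int, bytes_per_elem: int = 4) -> int:
    """Bytes for one layer's attention score tensor of shape (batch, heads, seq_len, seq_len)."""
    return batch * heads * seq_len * seq_len * bytes_per_elem


GIB = 1024 ** 3

# Assumed: 16 attention heads, fp32 scores, a single sequence in the batch.
for seq_len in (512, 2048, 8192):
    size = attention_scores_bytes(batch=1, heads=16, seq_len=seq_len)
    print(f"seq_len={seq_len:>5}: {size / GIB:.3f} GiB per layer")
```

This prints 0.016, 0.250 and 4.000 GiB per layer: a 16× longer sequence costs 256× more for this tensor. Because batches are typically padded to their longest member, a single very long chunk can make every sequence in its batch pay that cost, which would show up as a short spike rather than a steady rise.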

liuchuan01 commented 3 months ago

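Confirmed that this is not caused by the framework. With BGE-m3, the longer the input text, the more GPU memory it uses, and one abnormal chunk was over 13,000 characters long, so....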
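Given that finding, one possible mitigation is to check chunk lengths before embedding and re-split anything abnormally long. Below is a minimal plain-Python sketch; the function names and the 1,000-character limit in the usage lines are hypothetical, and the fixed character windows are a crude fallback, not what Langchain-Chatchat's own text splitters do.

```python
from typing import List


def find_oversized(chunks: List[str], max_chars: int) -> List[int]:
    """Return the indices of chunks longer than max_chars characters."""
    return [i for i, text in enumerate(chunks) if len(text) > max_chars]


def split_oversized(chunks: List[str], max_chars: int, overlap: int = 0) -> List[str]:
    """Re-split any chunk longer than max_chars into fixed-size character windows.

    Chunks within the limit are passed through unchanged. Consecutive windows
    share `overlap` characters.
    """
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    step = max_chars - overlap
    result: List[str] = []
    for text in chunks:
        if len(text) <= max_chars:
            result.append(text)
            continue
        for start in range(0, len(text), step):
            result.append(text[start:start + max_chars])
            if start + max_chars >= len(text):
                break  # the last window already reaches the end of the text
    return result


chunks = ["a normal chunk", "x" * 13000]
print(find_oversized(chunks, max_chars=1000))  # [1]
print(len(split_oversized(chunks, max_chars=1000)))  # 14
```

Logging the indices from find_oversized before embedding would also make it easier to trace an abnormal chunk back to the splitter or source file that produced it.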

chatchat-space / Langchain-Chatchat

[BUG] bge-m3 incompatible / abnormal GPU memory allocation #4685