bmrlab / gendam

A privacy-first generative DAM

Writing to the qdrant index sometimes fails with Too many open files #11

Closed web3nomad closed 1 month ago

web3nomad commented 2 months ago
Task failed: FrameCaptionEmbedding, /, status: Internal,
 message: "Service internal error: RocksDB open error: IO error: While open a file for appending: /Users/xddotcom/Library/Application Support/cc.musedam.local/libraries/bb486d877bb41ed213d350bd49b58b6ce57ab56725717ccc1815c99b791ff7ee/qdrant/storage/collections/frame-caption-embedding/0/segments/ba7cbbdf-a5e3-4088-a572-6f27fb94a6fe/payload_index/MANIFEST-000005: Too many open files", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Thu, 14 Mar 2024 11:31:04 GMT", "content-length": "0"} }      

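This happens more readily when running the packaged app; I have not hit it in the dev environment yet. It may be related to the launch configuration. The official docs have a reference page that should be relevant to this error: https://qdrant.tech/documentation/guides/common-errors/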

web3nomad commented 2 months ago

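In fact, the error is raised before the index write is even reached: it happens when get_frame_caption_embedding calls make_sure_collection_created.

Running get_frame_caption_embedding on its own does not fail. But if a series of tasks has already run beforehand, including get_frame_caption and so on, this call fails.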

2024-03-14T16:03:12.479418Z  INFO file_handler::video::utils::caption: @@@ get_frame_caption_embedding start: 80
2024-03-14T16:03:12.479478Z DEBUG tower::buffer::worker: service.ready=true processing request
2024-03-14T16:03:12.479598Z DEBUG Connection{peer=Client}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(5), flags: (0x4: END_HEADERS) }
2024-03-14T16:03:12.479650Z DEBUG Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(5) }
2024-03-14T16:03:12.479661Z DEBUG Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(5), flags: (0x1: END_STREAM) }
2024-03-14T16:03:12.480054Z DEBUG Connection{peer=Client}: h2::codec::framed_read: received frame=Headers { stream_id: StreamId(5), flags: (0x5: END_HEADERS | END_STREAM) }
2024-03-14T16:03:12.480129Z DEBUG tower::buffer::worker: service.ready=true processing request
2024-03-14T16:03:12.480162Z DEBUG Connection{peer=Client}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(7), flags: (0x4: END_HEADERS) }
2024-03-14T16:03:12.480172Z DEBUG Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(7) }
2024-03-14T16:03:12.480179Z DEBUG Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(7), flags: (0x1: END_STREAM) }
2024-03-14T16:03:12.571977Z DEBUG Connection{peer=Client}: h2::codec::framed_read: received frame=Headers { stream_id: StreamId(7), flags: (0x5: END_HEADERS | END_STREAM) }
2024-03-14T16:03:12.572096Z  WARN drop: ort::session: dropping SharedSessionInner
2024-03-14T16:03:12.572104Z  WARN drop: ort::session: dropping session ptr
2024-03-14T16:03:12.582431Z  WARN drop: ort::session: dropping SharedSessionInner
2024-03-14T16:03:12.582469Z  WARN drop: ort::session: dropping session ptr
2024-03-14T16:03:12.597810Z ERROR api_server::task_queue::pool: Task failed: FrameCaptionEmbedding, /, status: Internal, message: "Service internal error: RocksDB open error: IO error: While open a file for appending: /Users/xddotcom/Library/Application Support/cc.musedam.local/libraries/bb486d877bb41ed213d350bd49b58b6ce57ab56725717ccc1815c99b791ff7ee/qdrant/storage/collections/frame-caption-embedding/0/segments/8fc0d446-39ab-4662-bbb0-45cac79027c0/payload_index/MANIFEST-000005: Too many open files", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Thu, 14 Mar 2024 16:03:12 GMT", "content-length": "0"} }

web3nomad commented 2 months ago

launchctl limit maxfiles outputs:

maxfiles 256 unlimited

That is, 256 is the current Mac OS X soft limit and unlimited is the current Mac OS X hard limit.
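To compare the packaged app with the dev environment, it may help to print the limits that a given shell, and anything launched from it, actually runs with. This is a minimal sketch: the function names are made up for illustration, and the lsof-based count also includes non-descriptor entries such as cwd and txt, so treat it as a rough upper bound.

```shell
#!/bin/bash

# Print the soft and hard open-file limits of the current shell.
# Child processes started from this shell inherit these values.
print_fd_limits() {
    echo "soft: $(ulimit -Sn)"
    echo "hard: $(ulimit -Hn)"
}

# Roughly count the files a process currently has open.
# $1 is the pid to inspect; the first lsof line is a header, so skip it.
count_open_files() {
    lsof -p "$1" 2>/dev/null | tail -n +2 | wc -l
}
```

Calling print_fd_limits in the shell that launches the app, and count_open_files with the pid of the qdrant process while the tasks are running, would show how close the process gets to the soft limit.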

web3nomad commented 2 months ago

The error already occurs when creating the collection, not when writing to the index ...

2024-03-14T17:47:13.870799Z  INFO quaint::pooled: Starting a sqlite pool with 1 connections.
2024-03-14T17:47:13.877087Z  INFO file_downloader::download: check file path: "/Users/xddotcom/workspace/muse/muse-v2-client/target/release/bundle/macos/muse-v2-client.app/Contents/Resources/resources/qdrant"
2024-03-14T17:47:15.899254Z  INFO vector_db::qdrant: qdrant started
2024-03-14T17:47:15.904174Z  INFO content_library: collection info not found: frame-embedding, status: NotFound, message: "Not found: Collection `frame-embedding` doesn't exist!", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Thu, 14 Mar 2024 17:47:15 GMT", "content-length": "0"} }
2024-03-14T17:47:16.308452Z  INFO content_library: collection info not found: frame-caption-embedding, status: NotFound, message: "Not found: Collection `frame-caption-embedding` doesn't exist!", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Thu, 14 Mar 2024 17:47:15 GMT", "content-length": "0"} }
2024-03-14T17:47:16.370854Z ERROR content_library: failed to create collection: frame-caption-embedding, status: Internal, message: "Service internal error: RocksDB open error: IO error: While open directory: /Users/xddotcom/Library/Application Support/cc.musedam.local/libraries/9fdc0479525e8f21dc9a5519a0786c1be67a580db2e726e6d1a44c59c55af1f9/qdrant/storage/collections/frame-caption-embedding/0/segments/f65e04fe-6903-4342-843d-634456ed21cd: Too many open files", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Thu, 14 Mar 2024 17:47:16 GMT", "content-length": "0"} }
2024-03-14T17:47:16.370892Z ERROR content_library: failed to make sure collection created: frame-caption-embedding, status: Internal, message: "Service internal error: RocksDB open error: IO error: While open directory: /Users/xddotcom/Library/Application Support/cc.musedam.local/libraries/9fdc0479525e8f21dc9a5519a0786c1be67a580db2e726e6d1a44c59c55af1f9/qdrant/storage/collections/frame-caption-embedding/0/segments/f65e04fe-6903-4342-843d-634456ed21cd: Too many open files", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Thu, 14 Mar 2024 17:47:16 GMT", "content-length": "0"} }
2024-03-14T17:47:16.370902Z  WARN vector_db::qdrant: qdrant server dropped
2024-03-14T17:47:16.370913Z  INFO vector_db::qdrant: qdrant successfully killed
2024-03-14T17:47:16.371411Z ERROR muse_desktop: Failed to load library: ()

web3nomad commented 2 months ago

Setting default_segment_number to 1 works around it for now: https://github.com/bmrlab/tauri-dam-test-playground/commit/41e8c8cf5709447201e3dbf56056c413840449a8

Leaving this open for now; it still needs optimization.
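For reference, here is the same setting expressed as a qdrant config file fragment. This is only a sketch: the linked commit may apply the value in code rather than through a file, and the helper name and output path are hypothetical. In qdrant's config layout the key lives under storage.optimizers, and a value of 0 lets qdrant pick the segment count from the number of CPUs, which is a plausible reason why more files get opened with the default.

```shell
#!/bin/bash

# Hypothetical helper: write a config fragment that targets one segment.
# $1 is the path of the config file to create.
write_qdrant_override() {
    local config_path="$1"
    cat > "$config_path" <<'EOF'
storage:
  optimizers:
    # 0 lets qdrant choose the segment count from the number of CPUs;
    # 1 asks the optimizer to target a single segment
    default_segment_number: 1
EOF
}
```

The resulting file could then be passed to qdrant with --config-path, the same flag used by the launch commands later in this thread.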

web3nomad commented 2 months ago

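Also:

qdrant has another problem: storage cannot be shared between the dev and prod environments, possibly because their default configurations differ. We should prepare a complete set of default values for our own project, to make debugging and porting easier.

Whether the default configuration is really the cause has not been verified; it is only a guess. But storage definitely cannot be shared.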

web3nomad commented 2 months ago

kino's configuration, located at /Users/xddotcom/Library/Application Support/com.kino.Desktop/Kino AI Cache/vector_db/qdrant_config.yaml

log_level: INFO
storage:
  storage_path: /Users/xddotcom/Library/Application Support/com.kino.Desktop/Kino AI Cache/vector_db/storage
  snapshots_path: /Users/xddotcom/Library/Application Support/com.kino.Desktop/Kino AI Cache/vector_db/snapshots
  temp_path: null
  on_disk_payload: false
  update_concurrency: null
  wal:
    wal_capacity_mb: 32
    wal_segments_ahead: 0
  node_type: Normal
  performance:
    max_search_threads: 0
    max_optimization_threads: 1
  optimizers:
    deleted_threshold: 0.2
    vacuum_min_vector_number: 1000
    default_segment_number: 0
    max_segment_size_kb: null
    memmap_threshold_kb: null
    indexing_threshold_kb: 20000
    flush_interval_sec: 5
    max_optimization_threads: 1
  hnsw_index:
    m: 16
    ef_construct: 100
    full_scan_threshold_kb: 10000
    max_indexing_threads: 0
    on_disk: false
    payload_m: null
service:
  max_request_size_mb: 32
  max_workers: 0
  host: 0.0.0.0
  grpc_port: 6334
  enable_cors: true
  enable_tls: false
  verify_https_client_certificate: false
cluster:
  enabled: false
  p2p:
    port: 6335
    enable_tls: false
  consensus:
    tick_period_ms: 100
telemetry_disabled: true
tls:
  cert: /Users/xddotcom/Library/Application Support/com.kino.Desktop/Kino AI Cache/vector_db/tls/cert.pem
  key: /Users/xddotcom/Library/Application Support/com.kino.Desktop/Kino AI Cache/vector_db/tls/key.pem
  ca_cert: /Users/xddotcom/Library/Application Support/com.kino.Desktop/Kino AI Cache/vector_db/tls/cacert.pem
  cert_ttl: 3600

zhuojg commented 2 months ago

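> Also:
>
> qdrant has another problem: storage cannot be shared between the dev and prod environments, possibly because their default configurations differ. We should prepare a complete set of default values for our own project, to make debugging and porting easier.
>
> Whether the default configuration is really the cause has not been verified; it is only a guess. But storage definitely cannot be shared.

Does "storage cannot be shared" show up as the vector database having no data, or as an error at startup? 👀 Right now the config is rewritten every time qdrant starts, so if storage and snapshots were not copied over, no data can be read after startup.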

zhuojg commented 2 months ago

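Going forward, we should switch to a single collection and use the payload to distinguish different types of data 👇 Fewer collections should reduce the number of open files.

See the qdrant docs: https://qdrant.tech/documentation/guides/multiple-partitions/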
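A rough sketch of what the single-collection layout could look like against qdrant's REST API. Everything specific here is an assumption for illustration, not taken from the project: the REST endpoint on localhost:6333 (the project itself talks gRPC), the collection name frames, the payload field kind, and 4-dimensional vectors. It also assumes both kinds of embedding share one vector size.

```shell
#!/bin/bash

# Assumed values, for illustration only.
QDRANT_URL="${QDRANT_URL:-http://localhost:6333}"
COLLECTION="frames"

# Build the JSON body that upserts one point tagged with its data type.
# Arguments: point id, kind (e.g. frame or frame-caption), vector as JSON array.
upsert_body() {
    local id="$1" kind="$2" vector="$3"
    printf '{"points":[{"id":%s,"vector":%s,"payload":{"kind":"%s"}}]}\n' \
        "$id" "$vector" "$kind"
}

# Build the JSON body that searches only among points of one kind.
# Arguments: kind, query vector as JSON array.
search_body() {
    local kind="$1" vector="$2"
    printf '{"vector":%s,"filter":{"must":[{"key":"kind","match":{"value":"%s"}}]},"limit":3}\n' \
        "$vector" "$kind"
}

# The functions below talk to a running qdrant instance; call them only then.
create_collection() {
    curl -sS -X PUT "$QDRANT_URL/collections/$COLLECTION" \
        -H 'Content-Type: application/json' \
        -d '{"vectors":{"size":4,"distance":"Cosine"}}'
}

upsert_point() {
    curl -sS -X PUT "$QDRANT_URL/collections/$COLLECTION/points" \
        -H 'Content-Type: application/json' -d "$(upsert_body "$@")"
}

search_kind() {
    curl -sS -X POST "$QDRANT_URL/collections/$COLLECTION/points/search" \
        -H 'Content-Type: application/json' -d "$(search_body "$@")"
}
```

With a server running, create_collection followed by upsert_point 1 frame-caption '[0.1,0.2,0.3,0.4]' and search_kind frame-caption '[0.1,0.2,0.3,0.4]' would store and query caption embeddings without a second collection.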

zhuojg commented 2 months ago

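> Going forward, we should switch to a single collection and use the payload to distinguish different types of data 👇 Fewer collections should reduce the number of open files.
>
> See the qdrant docs: https://qdrant.tech/documentation/guides/multiple-partitions/

One point is open for discussion: should we create one collection per library, or give each library its own separate qdrant path?

Option 2, which is what we do now, still looks better.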

web3nomad commented 1 month ago

The number of collections has been reduced and the problem no longer occurs for now, so closing.

web3nomad commented 1 month ago

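Looking at kino, their solution is to set the maximum number of open files with ulimit -n. ulimit can be run when the shell starts; it applies to that shell and all of its child processes and does not take effect system-wide, which makes it suitable for setting the limit for a single process.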

#!/bin/bash

# Check if a binary path is provided
if [ "$#" -lt 1 ]; then
    echo "Usage: $0 /path/to/binary [additional arguments]"
    exit 1
fi

# The first argument is the path to the binary
BINARY_PATH="$1"

# Set the ulimit
echo "Setting ulimit to 10240"
ulimit -n 10240

# Execute the binary; the first additional argument becomes the value of --config-path
echo "Executing $BINARY_PATH with arguments: ${*:2}"
"$BINARY_PATH" --config-path "${@:2}"

Noting this down for the record.
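The scoping behaviour described above can be checked with a small experiment. This is a sketch rather than part of either project; it lowers the soft limit to 128 instead of raising it to 10240, because raising can fail when the hard limit is lower, and the variable names are arbitrary.

```shell
#!/bin/bash

# Remember the soft limit of this shell before doing anything.
parent_before=$(ulimit -Sn)

# Change the soft limit inside a child bash, then ask a grandchild what it sees.
# 128 is an arbitrary demo value chosen to sit below typical hard limits.
grandchild_limit=$(bash -c 'ulimit -Sn 128; bash -c "ulimit -Sn"')

# Read the limit of this shell again; the child's change does not reach it.
parent_after=$(ulimit -Sn)

echo "parent before: $parent_before"
echo "grandchild:    $grandchild_limit"
echo "parent after:  $parent_after"
```

The grandchild reports the value set by its parent, while the outer shell keeps its original limit, which is why a wrapper script is enough to raise the limit for qdrant alone.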

zhuojg commented 1 month ago

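The process is now started directly with /bin/bash -c "ulimit -n 10240; %QDRANT_PATH% --config-path %CONFIG_PATH% & PID=$!; setpgid $PID $$; wait".

The PID=$!; setpgid $PID $$ part is meant to set the child process's process group id to the parent process's pid, so that both can be managed together through operating system interfaces. In practice this is not implemented that way yet.

On shutdown, the pid we obtained cannot be used directly. We need to check all processes, find those whose parent pid is pid, kill them, and finally wait for the process that pid refers to to exit.

Possible optimization: the process group id is in fact already set here (parent and child processes all share the same process group id), so the whole group could be killed directly through an API provided by the operating system. But that would make it less convenient to detect whether the processes have exited, so we are leaving it as is for now.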
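The shutdown procedure described above (find the processes whose parent is the wrapper, kill them, then wait for the wrapper itself) can be sketched in shell as follows. The project implements this in its own code, so the function name, the polling interval and the five-second cap are all hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: stop a bash wrapper by killing its direct children,
# then wait until the wrapper itself has exited.
# $1 is the pid of the wrapper (the /bin/bash -c process).
stop_wrapper() {
    local wrapper_pid="$1"
    local child

    # pgrep -P lists the processes whose parent pid is the given pid.
    for child in $(pgrep -P "$wrapper_pid"); do
        kill "$child" 2>/dev/null
    done

    # kill -0 sends no signal; it only checks that the pid still exists.
    # Poll for at most about 5 seconds (50 tries, 0.1 s apart).
    local tries=0
    while kill -0 "$wrapper_pid" 2>/dev/null && [ "$tries" -lt 50 ]; do
        sleep 0.1
        tries=$((tries + 1))
    done

    # Succeed only if the wrapper is really gone.
    if kill -0 "$wrapper_pid" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

Because the wrapper ends with wait, it exits on its own once its child is gone, so the return status of stop_wrapper tells the caller whether shutdown completed. Signalling the whole group instead would be a single kill with a negative process group id, but as noted above that gives less insight into which process has exited.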