bmrlab / gendam

A privacy-first generative DAM

Have the app record qdrant's pid, and support dynamic ports #48

Closed web3nomad closed 1 month ago

web3nomad commented 1 month ago
          We won't address this issue. It's actually fine if qdrant isn't stopped when the app exits, as long as everything keeps working after a restart.

Have the app record qdrant's pid, and support dynamic ports

Originally posted by @web3nomad in https://github.com/bmrlab/tauri-dam-test-playground/issues/21#issuecomment-2049968744

zhuojg commented 1 month ago

fixed in 2aa92a57b968c46a8b4ea6df92f4b0f82980875b

zhuojg commented 1 month ago

fixed in 2aa92a5

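  • The pid, HTTP port, and gRPC port are recorded in settings.json for now

    • On service startup, if settings.json contains these records, send SIGTERM to the recorded process
    • Because that process may not be a qdrant we started ourselves, first try to reach it over the HTTP port, and send SIGTERM only if it responds
    • When the process shuts down normally, remove this information from settings.json
  • Dynamic ports are supported: the search starts at 6333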
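
The "search from 6333" step can be sketched with the standard library alone. This is a minimal illustration, not the code from the commit: the function names `find_free_port` and `pick_qdrant_ports` are hypothetical, and it assumes that successfully binding a localhost listener is an acceptable test for "free".

```rust
use std::net::{Ipv4Addr, SocketAddrV4, TcpListener};

/// Returns the first port in `start..=end` that can be bound on localhost.
/// The listener is dropped right away, so this only proves the port was free
/// at that instant; another process could still take it before qdrant does.
fn find_free_port(start: u16, end: u16) -> Option<u16> {
    (start..=end).find(|&port| {
        TcpListener::bind(SocketAddrV4::new(Ipv4Addr::LOCALHOST, port)).is_ok()
    })
}

/// Picks two distinct ports, one for HTTP and one for gRPC (hypothetical helper).
fn pick_qdrant_ports(start: u16) -> Option<(u16, u16)> {
    let http = find_free_port(start, u16::MAX)?;
    // Continue the scan just past the HTTP port so the two never collide.
    let grpc = find_free_port(http.checked_add(1)?, u16::MAX)?;
    Some((http, grpc))
}
```

Calling `pick_qdrant_ports(6333)` on a machine where nothing else is listening would typically yield the usual 6333/6334 pair; if an earlier qdrant still holds those ports, the scan simply moves on to the next free ones.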

The implementation still has some problems; I plan to change it as follows:

web3nomad commented 1 month ago

I ran into two problems while testing. The first is that switching libraries hangs indefinitely.

pub fn kill(pid: usize, addr: SocketAddr) -> anyhow::Result<()> {
    let probe = format!("http://{}:{}", addr.ip(), addr.port());

    let (tx, rx) = channel();

    tokio::task::spawn(async move {

        // <--- debugging shows it hangs here: this step never runs, so the set_current_library endpoint never responds

        let resp = reqwest::get(probe.clone()).await;
        if let Ok(resp) = resp {
            if resp.status() == reqwest::StatusCode::OK {
                if let Err(e) = kill_by_sig_term(pid as u32) {
                    error!("failed to kill qdrant: {}", e);
                }
            }
        }

        // everything done
        if tx.send(()).is_err() {
            error!("failed to send result");
        }
    });

    rx.recv()
        .map_err(|e| anyhow::anyhow!("failed to receive result: {}", e))
}

It hangs while killing the old qdrant pid.
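
One plausible reading of the hang is that `kill` is a synchronous function that blocks on `rx.recv()`; if it runs on a Tokio worker thread, the spawned task may never get a chance to execute. The sketch below avoids the spawn and the channel by probing synchronously. It makes several assumptions: it swaps the HTTP 200 check for a bare TCP connect (a weaker test), it sends SIGTERM by shelling out to `kill` (Unix only), the 500 ms timeout is arbitrary, and the names `port_is_open`, `send_sigterm`, and `kill_if_listening` are hypothetical.

```rust
use std::net::{SocketAddr, TcpStream};
use std::process::Command;
use std::time::Duration;

/// Returns true if something accepts a TCP connection on `addr` within `timeout`.
/// This is weaker than the original check, which required an HTTP 200 response.
fn port_is_open(addr: &SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(addr, timeout).is_ok()
}

/// Sends SIGTERM through the shell's `kill` (Unix only). Hypothetical stand-in
/// for the thread's `kill_by_sig_term`, whose body is not shown in the issue.
fn send_sigterm(pid: u32) -> std::io::Result<bool> {
    let status = Command::new("sh")
        .arg("-c")
        .arg(format!("kill -TERM {pid}"))
        .status()?;
    Ok(status.success())
}

/// Probes the recorded address and signals the recorded pid only if it answers.
/// Everything runs on the caller's thread, so there is no spawned task to wait on.
/// Returns Ok(true) only when a signal was actually delivered.
fn kill_if_listening(pid: u32, addr: SocketAddr) -> std::io::Result<bool> {
    if !port_is_open(&addr, Duration::from_millis(500)) {
        return Ok(false);
    }
    send_sigterm(pid)
}
```

This still blocks the calling thread for up to the timeout, so in an async handler it would probably belong inside `tokio::task::spawn_blocking`, or the whole function could be made `async` and awaited instead.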

web3nomad commented 1 month ago

The other problem is as follows; here is the log:

2024-04-12T15:42:46.504257Z  WARN qdrant::settings: Config file not found: config/config    
2024-04-12T15:42:46.504282Z  WARN qdrant::settings: Config file not found: config/development              
2024-04-12T15:42:46.504308Z  INFO storage::content_manager::consensus::persistent: Loading raft state from /Users/xddotcom/Library/Application Support/cc.musedam.local/libraries/9d8f4468-c92b-486e-a393-bf0734840433/qdrant/storage/raft_state.json                                                                                                                                 
2024-04-12T15:42:46.505201Z  INFO storage::content_manager::toc: Loading collection: muse-v2-vision-512
2024-04-12T15:42:46.510496Z ERROR qdrant::startup: Panic backtrace:
   0: std::backtrace::Backtrace::create                                                                                                                                                    
   1: qdrant::startup::setup_panic_hook::{{closure}}                                                                                                                                       
   2: std::panicking::rust_panic_with_hook                                                   
   3: std::panicking::begin_panic_handler::{{closure}}                            
   4: std::sys_common::backtrace::__rust_end_short_backtrace               
   5: _rust_begin_unwind                                                                                                                                                                   
   6: core::panicking::panic_fmt                                                             
   7: collection::shards::shard_holder::ShardHolder::load_shards::{{closure}}.89594
   8: storage::content_manager::toc::TableOfContent::new           
   9: qdrant::main                                                                           
  10: std::sys_common::backtrace::__rust_begin_short_backtrace                 
  11: _main                                                                                                                                                                                

2024-04-12T15:42:46.510506Z ERROR qdrant::startup: Panic occurred in file /Users/runner/work/qdrant/qdrant/lib/collection/src/shards/replica_set/mod.rs at line 261: Failed to load local shard "/Users/xddotcom/Library/Application Support/cc.musedam.local/libraries/9d8f4468-c92b-486e-a393-bf0734840433/qdrant/storage/collections/muse-v2-vision-512/0": Service internal error: Wal error: Can't init WAL: Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" }
2024-04-12T15:43:16.497349Z ERROR content_library::qdrant: failed to start qdrant server: qdrant start timeout
thread 'tokio-runtime-worker' panicked at /Users/xddotcom/workspace/muse/muse-v2-client/apps/api-server/src/ctx/default.rs:200:22:
called `Result::unwrap()` on an `Err` value: ()                                        
2024-04-12T15:43:16.500035Z  INFO api_server: Client requested operation '/libraries.set_current_library'
2024-04-12T15:43:16.500123Z  INFO api_server::task_queue::pool: Task pool thread created: ThreadId(92)
2024-04-12T15:43:16.500175Z  WARN api_server::ctx::default: invalid qdrant config, skipping killing qdrant server
2024-04-12T15:43:16.502147Z  INFO quaint::pooled: Starting a sqlite pool with 1 connections.                                                                                               
2024-04-12T15:43:16.505849Z DEBUG vector_db::qdrant: qdrant params: QdrantParams { dir: "/Users/xddotcom/Library/Application Support/cc.musedam.local/libraries/9d8f4468-c92b-486e-a393-bf0734840433/qdrant", http_port: Some(6333), grpc_port: Some(6334) }
2024-04-12T15:43:16.505869Z DEBUG vector_db::qdrant: qdrant config: /Users/xddotcom/Library/Application Support/cc.musedam.local/libraries/9d8f4468-c92b-486e-a393-bf0734840433/qdrant/config/config.yaml
2024-04-12T15:43:16.505876Z DEBUG vector_db::qdrant: qdrant reading config from /Users/xddotcom/Library/Application Support/cc.musedam.local/libraries/9d8f4468-c92b-486e-a393-bf0734840433/qdrant/config/config.yaml

The sequence is roughly:

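  1. When the old qdrant is killed and a new qdrant is started, the first attempt hits an error, startup then times out, and the set_current_library request fails
  2. The frontend retries set_current_library; by then the old qdrant is gone, so startup succeeds and things return to normal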

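That said, this problem can probably be ignored. I only hit it because I changed the kill method: instead of calling kill_by_sig_term(pid) inside the spawn, I ran it synchronously outside it. Once the first problem is fixed, this one may not occur at all.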

zhuojg commented 1 month ago

This happens because two qdrant processes were reading the same local files at once, so the kill step is still not handled properly.
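
If the WAL error really comes from the old process still holding the storage files, one possible mitigation is to wait for the old pid to disappear before launching the new qdrant. The following is a rough sketch under that assumption; `is_alive` and `wait_for_exit` are hypothetical names, the 100 ms poll interval is arbitrary, and the liveness check relies on `kill -0`, so it is Unix only.

```rust
use std::process::{Command, Stdio};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Returns true if `kill -0 <pid>` succeeds, meaning the process exists and
/// we are allowed to signal it. An unreaped zombie child also counts as alive.
fn is_alive(pid: u32) -> bool {
    Command::new("sh")
        .arg("-c")
        .arg(format!("kill -0 {pid}"))
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

/// Polls until the process is gone or `timeout` elapses.
/// Returns true if the process exited in time, false on timeout.
fn wait_for_exit(pid: u32, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while is_alive(pid) {
        if Instant::now() >= deadline {
            return false;
        }
        sleep(Duration::from_millis(100));
    }
    true
}
```

With something like this in place, the start-up path could send SIGTERM, call `wait_for_exit(old_pid, ...)`, and start the new qdrant only after it returns true, rather than relying on the frontend's retry.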

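I'll rework the kill part: look up the process info for the recorded pid, in a way similar to running ps aux on the command line, check whether it belongs to this project, and only then decide whether to kill it. The current approach of sending an HTTP request to the port isn't great.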

zhuojg commented 1 month ago

I reworked the kill part; it now decides based on the pid alone.

The logic above only works on macOS and Linux; Windows will need a different approach.
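
For reference, a pid-based ownership check along the lines discussed in this thread might look like the sketch below. It is not the code from the repository. It assumes `ps -p <pid> -o args=` is available (macOS and Linux), and the ownership rule in `looks_like_our_qdrant` (the command line mentions both "qdrant" and the library's data directory) is a hypothetical heuristic; whether qdrant's command line actually contains that directory depends on how it is launched.

```rust
use std::process::Command;

/// Returns the full command line of `pid` as reported by `ps`,
/// or None if `ps` is unavailable or no such process exists (Unix only).
fn command_line_of(pid: u32) -> Option<String> {
    let pid_arg = pid.to_string();
    let out = Command::new("ps")
        .args(["-p", pid_arg.as_str(), "-o", "args="])
        .output()
        .ok()?;
    if !out.status.success() {
        return None;
    }
    let line = String::from_utf8_lossy(&out.stdout).trim().to_string();
    if line.is_empty() {
        None
    } else {
        Some(line)
    }
}

/// Hypothetical ownership rule: the command line must mention both "qdrant"
/// and the data directory of the library we manage.
fn looks_like_our_qdrant(cmdline: &str, library_dir: &str) -> bool {
    cmdline.contains("qdrant") && cmdline.contains(library_dir)
}
```

A caller would send SIGTERM only when `command_line_of(pid)` returns a line for which `looks_like_our_qdrant` is true, which guards against the recorded pid having been reused by an unrelated process.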