Closed web3nomad closed 1 month ago
fixed in 2aa92a57b968c46a8b4ea6df92f4b0f82980875b
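For now, the pid, HTTP port, and gRPC port are recorded in settings.json.
- On startup, if settings.json contains these records, send SIGTERM to the recorded process.
- The recorded process may not be the qdrant we started ourselves, so first try to reach it on the HTTP port and send SIGTERM only if the request succeeds.
- When the process shuts down normally, remove these entries from settings.json.
- Dynamic ports are supported: the search for a free port starts at 6333.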
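The port search in the last bullet could look roughly like the sketch below. It is only an illustration: `find_free_port` and `pick_qdrant_ports` are hypothetical names, not functions from this repository, and the upper bound of the scan is an assumption. It uses nothing beyond the Rust standard library.

```rust
use std::net::{Ipv4Addr, SocketAddrV4, TcpListener};

/// Return the first port in `start..=end` that can currently be bound on localhost.
fn find_free_port(start: u16, end: u16) -> Option<u16> {
    (start..=end).find(|&port| {
        // Binding succeeds only if nothing else is listening on the port.
        // The listener is dropped immediately, which releases the port again.
        TcpListener::bind(SocketAddrV4::new(Ipv4Addr::LOCALHOST, port)).is_ok()
    })
}

/// Pick an HTTP port and a distinct, higher gRPC port, scanning upward from `start`.
fn pick_qdrant_ports(start: u16, end: u16) -> Option<(u16, u16)> {
    let http = find_free_port(start, end)?;
    let grpc = find_free_port(http.checked_add(1)?, end)?;
    Some((http, grpc))
}
```

Calling `pick_qdrant_ports(6333, 6433)` would return the usual 6333/6334 pair when both are free. A port can still be taken by another process between this check and the moment qdrant binds it, so the caller should be prepared to retry.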
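The implementation still has some problems; I plan to change it as follows:
I ran into two problems while testing. The first is that switching libraries hangs indefinitely.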
pub fn kill(pid: usize, addr: SocketAddr) -> anyhow::Result<()> {
    let probe = format!("http://{}:{}", addr.ip(), addr.port());
    let (tx, rx) = channel();
    tokio::task::spawn(async move {
        // <--- debugging shows it hangs here: this step never runs,
        // so the set_current_library endpoint never responds
        let resp = reqwest::get(probe.clone()).await;
        if let Ok(resp) = resp {
            if resp.status() == reqwest::StatusCode::OK {
                if let Err(e) = kill_by_sig_term(pid as u32) {
                    error!("failed to kill qdrant: {}", e);
                }
            }
        }
        // everything done
        if tx.send(()).is_err() {
            error!("failed to send result");
        }
    });
    rx.recv()
        .map_err(|e| anyhow::anyhow!("failed to receive result: {}", e))
}
It hangs while killing the old qdrant pid.
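One possible explanation, which this thread does not confirm, is that the blocking `rx.recv()` occupies the very thread the runtime would need to run the spawned task. A way to sidestep that dependency is to do the probe with blocking standard-library sockets and explicit timeouts, so nothing has to be scheduled on the async runtime. The sketch below assumes that approach; `probe_http_ok` and `kill_if_reachable` are hypothetical names, and `send_sigterm` stands in for the existing `kill_by_sig_term`.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Blocking HTTP probe: true only if `addr` answers `GET /` with status 200.
/// Any connect, write, or read error (including a timeout) counts as "not reachable".
/// `timeout` must be non-zero.
fn probe_http_ok(addr: SocketAddr, timeout: Duration) -> bool {
    let Ok(mut stream) = TcpStream::connect_timeout(&addr, timeout) else {
        return false;
    };
    // Bound every later read and write so the caller can never wait forever.
    if stream.set_read_timeout(Some(timeout)).is_err()
        || stream.set_write_timeout(Some(timeout)).is_err()
    {
        return false;
    }
    let request = format!("GET / HTTP/1.1\r\nHost: {addr}\r\nConnection: close\r\n\r\n");
    if stream.write_all(request.as_bytes()).is_err() {
        return false;
    }
    // The status line looks like "HTTP/1.1 200 OK"; the second token is the code.
    let mut status_line = String::new();
    if BufReader::new(stream).read_line(&mut status_line).is_err() {
        return false;
    }
    status_line.split_whitespace().nth(1) == Some("200")
}

/// Probe first and call `send_sigterm` only if the probe succeeded.
/// Returns Ok(true) if the signal was sent, Ok(false) if the probe failed.
fn kill_if_reachable<E>(
    pid: u32,
    addr: SocketAddr,
    timeout: Duration,
    send_sigterm: impl FnOnce(u32) -> Result<(), E>,
) -> Result<bool, E> {
    if !probe_http_ok(addr, timeout) {
        return Ok(false);
    }
    send_sigterm(pid)?;
    Ok(true)
}
```

When the caller is itself async, running this inside `tokio::task::spawn_blocking` and awaiting the handle keeps the blocking I/O off the async worker threads.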
The other problem is as follows; here is the log:
2024-04-12T15:42:46.504257Z WARN qdrant::settings: Config file not found: config/config
2024-04-12T15:42:46.504282Z WARN qdrant::settings: Config file not found: config/development
2024-04-12T15:42:46.504308Z INFO storage::content_manager::consensus::persistent: Loading raft state from /Users/xddotcom/Library/Application Support/cc.musedam.local/libraries/9d8f4468-c92b-486e-a393-bf0734840433/qdrant/storage/raft_state.json
2024-04-12T15:42:46.505201Z INFO storage::content_manager::toc: Loading collection: muse-v2-vision-512
2024-04-12T15:42:46.510496Z ERROR qdrant::startup: Panic backtrace:
0: std::backtrace::Backtrace::create
1: qdrant::startup::setup_panic_hook::{{closure}}
2: std::panicking::rust_panic_with_hook
3: std::panicking::begin_panic_handler::{{closure}}
4: std::sys_common::backtrace::__rust_end_short_backtrace
5: _rust_begin_unwind
6: core::panicking::panic_fmt
7: collection::shards::shard_holder::ShardHolder::load_shards::{{closure}}.89594
8: storage::content_manager::toc::TableOfContent::new
9: qdrant::main
10: std::sys_common::backtrace::__rust_begin_short_backtrace
11: _main
2024-04-12T15:42:46.510506Z ERROR qdrant::startup: Panic occurred in file /Users/runner/work/qdrant/qdrant/lib/collection/src/shards/replica_set/mod.rs at line 261: Failed to load local shard "/Users/xddotcom/Library/Application Support/cc.musedam.local/libraries/9d8f4468-c92b-486e-a393-bf0734840433/qdrant/storage/collections/muse-v2-vision-512/0": Service internal error: Wal error: Can't init WAL: Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" }
2024-04-12T15:43:16.497349Z ERROR content_library::qdrant: failed to start qdrant server: qdrant start timeout
thread 'tokio-runtime-worker' panicked at /Users/xddotcom/workspace/muse/muse-v2-client/apps/api-server/src/ctx/default.rs:200:22:
called `Result::unwrap()` on an `Err` value: ()
2024-04-12T15:43:16.500035Z INFO api_server: Client requested operation '/libraries.set_current_library'
2024-04-12T15:43:16.500123Z INFO api_server::task_queue::pool: Task pool thread created: ThreadId(92)
2024-04-12T15:43:16.500175Z WARN api_server::ctx::default: invalid qdrant config, skipping killing qdrant server
2024-04-12T15:43:16.502147Z INFO quaint::pooled: Starting a sqlite pool with 1 connections.
2024-04-12T15:43:16.505849Z DEBUG vector_db::qdrant: qdrant params: QdrantParams { dir: "/Users/xddotcom/Library/Application Support/cc.musedam.local/libraries/9d8f4468-c92b-486e-a393-bf0734840433/qdrant", http_port: Some(6333), grpc_port: Some(6334) }
2024-04-12T15:43:16.505869Z DEBUG vector_db::qdrant: qdrant config: /Users/xddotcom/Library/Application Support/cc.musedam.local/libraries/9d8f4468-c92b-486e-a393-bf0734840433/qdrant/config/config.yaml
2024-04-12T15:43:16.505876Z DEBUG vector_db::qdrant: qdrant reading config from /Users/xddotcom/Library/Application Support/cc.musedam.local/libraries/9d8f4468-c92b-486e-a393-bf0734840433/qdrant/config/config.yaml
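The sequence is roughly this:
- After killing the old qdrant, the first attempt to start the new qdrant hits the error above, then the start times out and the set_current_library request fails.
- The frontend retries set_current_library. By then the old qdrant is gone, so the start succeeds and everything returns to normal.
However, this problem can probably be ignored. I only hit it because I changed the kill method so that kill_by_sig_term(pid) runs synchronously outside the spawn instead of inside it. Once the first problem is fixed, this one may not occur at all.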
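The `WouldBlock` error in the log would be consistent with the old qdrant still holding its WAL files when the new one starts, although that is not confirmed here. If so, waiting for the old process to exit after sending SIGTERM, and only then starting the new one, should avoid the failed first attempt. A minimal sketch under that assumption follows; `wait_until`, `process_alive`, and `wait_for_exit` are hypothetical helpers, and the `kill -0` check assumes a Unix-like system.

```rust
use std::process::{Command, Stdio};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `is_done` every `interval` until it returns true or `timeout` elapses.
/// Returns whether the condition was met in time.
fn wait_until(mut is_done: impl FnMut() -> bool, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if is_done() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}

/// Unix-only assumption: `kill -0 <pid>` exits with 0 when the process exists
/// and we are allowed to signal it. No signal is actually delivered.
fn process_alive(pid: u32) -> bool {
    Command::new("kill")
        .arg("-0")
        .arg(pid.to_string())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

/// After SIGTERM has been sent, block until the old process is gone
/// or `timeout` elapses. Returns true if the process exited in time.
fn wait_for_exit(pid: u32, timeout: Duration) -> bool {
    wait_until(|| !process_alive(pid), timeout, Duration::from_millis(100))
}
```

The caller would start the new qdrant only when `wait_for_exit` returns true, and could escalate or report an error otherwise.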
Is this problem caused by two qdrant processes reading the local files at the same time, or by the kill step not being done properly?
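I'll rework the kill part: use the pid to look up the process information, similar to running ps aux on the command line, check whether the process belongs to this project, and only then decide whether to kill it. The current approach of probing the port with an HTTP request is not a good one.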
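A rough sketch of that pid-based check, assuming a Unix-like system where `ps -p <pid> -o command=` prints the full command line. The function names and the matching rule (the command line mentions `qdrant` plus a project-specific marker such as the app identifier) are illustrative assumptions, not the project's actual logic.

```rust
use std::process::Command;

/// Full command line of `pid` as reported by `ps`, or None if the process
/// does not exist or `ps` is unavailable.
fn command_line_of(pid: u32) -> Option<String> {
    let output = Command::new("ps")
        .arg("-p")
        .arg(pid.to_string())
        .arg("-o")
        .arg("command=") // the trailing '=' suppresses the header line
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    let line = String::from_utf8_lossy(&output.stdout).trim().to_string();
    if line.is_empty() {
        None
    } else {
        Some(line)
    }
}

/// Hypothetical ownership rule: the command line mentions qdrant and also
/// contains a marker that only this project's qdrant would carry.
fn is_our_qdrant(command_line: &str, marker: &str) -> bool {
    command_line.contains("qdrant") && command_line.contains(marker)
}

/// Decide whether the recorded pid still refers to a qdrant started by this project.
fn should_kill(pid: u32, marker: &str) -> bool {
    command_line_of(pid).map_or(false, |cmd| is_our_qdrant(&cmd, marker))
}
```

Matching on the command line also guards against pid reuse: if the recorded pid now belongs to an unrelated process, `should_kill` returns false and nothing is signalled.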
The program should record qdrant's pid itself and support dynamic ports.
Originally posted by @web3nomad in https://github.com/bmrlab/tauri-dam-test-playground/issues/21#issuecomment-2049968744