cnosdb / cnosdb

A cloud-native open source distributed time series database with high performance, high compression ratio and high availability. http://www.cnosdb.cloud
https://www.cnosdb.com
GNU Affero General Public License v3.0

[BUG] Returns 502 when querying #2213

Open Benxiaohai001 opened 4 months ago

Benxiaohai001 commented 4 months ago

Describe the bug

Version: cnosdb 2.4.1, revision 9b25565a6c8ed5a12726475c0be6cb099ef980b2
Deployment: 3meta+3querytskv

Queries return 502. The cluster is deployed as 3meta+3querytskv, and one of the querytskv nodes has panicked. The 502 is displayed as soon as the client connects to the service and runs a command. The client output is similar to #1671, which can be used as a reference.

To Reproduce

public ❯ \d
502 Bad Gateway, details: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.18.0</center>
</body>
</html>

public ❯ show databases;
502 Bad Gateway, details: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.18.0</center>
</body>
</html>
Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
   1: anyhow::__private::format_err
   2: client::ctx::SessionContext::sql::{{closure}}
   3: client::exec::exec_and_print::{{closure}}
   4: client::exec::exec_from_repl::{{closure}}
   5: cnosdb_cli::main::{{closure}}
   6: cnosdb_cli::main
   7: std::sys_common::backtrace::__rust_begin_short_backtrace
   8: std::rt::lang_start::{{closure}}
   9: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/39f2657d1101b50f9b71ae460b762d330cc8426b/library/core/src/ops/function.rs:287:13
  10: std::panicking::try::do_call
             at /rustc/39f2657d1101b50f9b71ae460b762d330cc8426b/library/std/src/panicking.rs:487:40
  11: std::panicking::try
             at /rustc/39f2657d1101b50f9b71ae460b762d330cc8426b/library/std/src/panicking.rs:451:19
  12: std::panic::catch_unwind
             at /rustc/39f2657d1101b50f9b71ae460b762d330cc8426b/library/std/src/panic.rs:140:14
  13: std::rt::lang_start_internal::{{closure}}
             at /rustc/39f2657d1101b50f9b71ae460b762d330cc8426b/library/std/src/rt.rs:148:48
  14: std::panicking::try::do_call
             at /rustc/39f2657d1101b50f9b71ae460b762d330cc8426b/library/std/src/panicking.rs:487:40
  15: std::panicking::try
             at /rustc/39f2657d1101b50f9b71ae460b762d330cc8426b/library/std/src/panicking.rs:451:19
  16: std::panic::catch_unwind
             at /rustc/39f2657d1101b50f9b71ae460b762d330cc8426b/library/std/src/panic.rs:140:14
  17: std::rt::lang_start_internal
             at /rustc/39f2657d1101b50f9b71ae460b762d330cc8426b/library/std/src/rt.rs:148:20
  18: main
  19: __libc_start_main
             at /usr/src/debug/glibc-2.17-c758a686/csu/../csu/libc-start.c:266
  20: <unknown>

server

2024-07-01T03:06:35.369140465Z ERROR cnosdb::http::http_service: Failed to handle http sql request, err: Query { source: Datafusion { source: External(External(PreExecution { error: "Connect to Server(dfossds-main.query_tskv3.cnosdb.com:8903) error reason: transport error" })), location: Location { file: "query_server/spi/src/lib.rs", line: 629, column: 34 }, backtrace: Backtrace(   0: <spi::QueryError as core::convert::From<datafusion_common::error::DataFusionError>>::from
   1: spi::query::execution::Output::chunk_result::{{closure}}
   2: cnosdb::http::response::HttpResponse::wrap_batches_to_response::{{closure}}
   3: cnosdb::http::http_service::HttpService::query::{{closure}}::{{closure}}
   4: <warp::filter::or::EitherFuture<T,U> as core::future::future::Future>::poll
   5: <warp::filter::or::EitherFuture<T,U> as core::future::future::Future>::poll
   6: <warp::filter::or::EitherFuture<T,U> as core::future::future::Future>::poll
   7: <warp::filter::or::EitherFuture<T,U> as core::future::future::Future>::poll
   8: <warp::filter::service::FilteredFuture<F> as core::future::future::Future>::poll
   9: hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_catch
  10: <hyper::server::server::new_svc::NewSvcTask<I,N,S,E,W> as core::future::future::Future>::poll
  11: tokio::runtime::task::raw::poll
  12: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
  13: tokio::runtime::scheduler::multi_thread::worker::run
  14: tokio::runtime::task::raw::poll
  15: std::sys_common::backtrace::__rust_begin_short_backtrace
  16: core::ops::function::FnOnce::call_once{{vtable.shim}}
  17: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9
      std::sys::unix::thread::Thread::new::thread_start
             at rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys/unix/thread.rs:108:17
  18: <unknown>
  19: __clone
) } }
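
The server error above names the endpoint that could not be reached. As a rough aid for triage, the following sketch pulls that endpoint out of a log line. It assumes the message keeps the `Connect to Server(host:port) error reason: ...` shape shown above; the function name `failed_endpoints` and the regular expression are illustrative and not part of cnosdb.

```python
import re

# Matches the fragment seen in the server log above, e.g.
#   Connect to Server(host:8903) error reason: transport error
# Assumption: the log format stays as shown in this report.
CONNECT_ERROR = re.compile(
    r'Connect to Server\((?P<host>[^():\s]+):(?P<port>\d+)\) '
    r'error reason: (?P<reason>[^"]+)'
)

# Shortened copy of the log line from this report.
SAMPLE = (
    'PreExecution { error: "Connect to Server('
    'dfossds-main.query_tskv3.cnosdb.com:8903) '
    'error reason: transport error" }'
)


def failed_endpoints(log_text):
    """Return a (host, port, reason) tuple for each connect failure found."""
    return [
        (m["host"], int(m["port"]), m["reason"].strip())
        for m in CONNECT_ERROR.finditer(log_text)
    ]


if __name__ == "__main__":
    for host, port, reason in failed_endpoints(SAMPLE):
        print(f"{host}:{port} -> {reason}")
```

In this report the only endpoint extracted is `dfossds-main.query_tskv3.cnosdb.com:8903`, which is consistent with one querytskv node being down.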

Expected behavior

No response

Additional context

No response

bartliu827 commented 2 months ago

Is a proxy enabled, and is the CLI accessing the cluster through it? Is the CLI connected to the failed node?
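
One way to help answer the question above is to check, from a host inside the cluster network, which nodes still accept TCP connections when nginx is bypassed. The sketch below uses only the Python standard library. The first entry in `NODES` is the endpoint from the server log; the other two hostnames are hypothetical placeholders that follow the same naming pattern and would need to be replaced with the real node addresses. A successful connect only shows that the port is open, not that the node is healthy.

```python
import socket

# First entry is taken from the server log in this issue.
# The other two are HYPOTHETICAL placeholders; replace with real node addresses.
NODES = [
    ("dfossds-main.query_tskv3.cnosdb.com", 8903),
    ("dfossds-main.query_tskv1.cnosdb.com", 8903),  # hypothetical
    ("dfossds-main.query_tskv2.cnosdb.com", 8903),  # hypothetical
]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def probe_all(nodes, timeout=3.0):
    """Map each "host:port" string to whether a TCP connect succeeded."""
    return {f"{host}:{port}": can_connect(host, port, timeout) for host, port in nodes}
```

Calling `probe_all(NODES)` and printing the result shows which nodes are reachable directly. If only the panicked node is unreachable while the CLI still receives 502 for every query, the proxy is probably routing to the failed node.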