It4innovations / hyperqueue

Scheduler for sub-node tasks for HPC systems with batch scheduling
https://it4innovations.github.io/hyperqueue
MIT License

HQ crashes #743

Open chkabir opened 3 weeks ago

chkabir commented 3 weeks ago

Hi,

I was running an HQ server on the Oven node at metacentrum.cz. The Oven node is supposed to let processes run for a long time, even beyond their walltime. However, in my most recent runs the HQ server keeps crashing. Below are the relevant lines from the log file:

97: 0x557345bba500 - main 98: 0x154fcfff624a - libc_start_call_main at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16 99: 0x154fcfff6305 - libc_start_main_impl at ./csu/../csu/libc-start.c:360:3 100: 0x557345ad4049 - 101: 0x0 -

Oops, HyperQueue has crashed. This is a bug, sorry for that.
If you would be so kind, please report this issue at the HQ issue tracker: https://github.com/It4innovations/hyperqueue/issues/new?title=HQ%20crashes
Please include the above error (starting from "thread ... panicked ...") and the stack backtrace in the issue contents, along with the following information:

HyperQueue version: v0.19.0

You can also re-run HyperQueue server (and its workers) with the RUST_LOG=hq=debug,tako=debug environment variable, and attach the logs to the issue, to provide us more information.

Could you kindly look into this error?

Kobzol commented 3 weeks ago

Hi, thanks for the report. It looks like you have cut out the most important part of the stack trace though (you only sent the lines starting at stack frame #97) :) Could you please include the whole stack trace? Thanks!

chkabir commented 3 weeks ago

Sorry about that. Here is the whole stack trace:

thread 'main' panicked at crates/tako/src/internal/server/worker.rs:126:9:
assertion failed: self.sn_tasks.remove(&task.id)
stack backtrace:
0: 0x557345f0abf9 - std::backtrace_rs::backtrace::libunwind::trace::hbee8a7973eeb6c93 at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5 1: 0x557345f0abf9 - std::backtrace_rs::backtrace::trace_unsynchronized::hc8ac75eea3aa6899 at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5 2: 0x557345f0abf9 - std::sys_common::backtrace::_print_fmt::hc7f3e3b5298b1083 at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:68:5 3: 0x557345f0abf9 - ::fmt::hbb235daedd7c6190 at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:44:22 4: 0x557345c55b60 - core::fmt::rt::Argument::fmt::h76c38a80d925a410 at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/fmt/rt.rs:142:9 5: 0x557345c55b60 - core::fmt::write::h3ed6aeaa977c8e45 at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/fmt/mod.rs:1120:17 6: 0x557345ed387e - std::io::Write::write_fmt::h78b18af5775fedb5 at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/io/mod.rs:1810:15 7: 0x557345f0cc2e - std::sys_common::backtrace::_print::h5d645a07e0fcfdbb at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:47:5 8: 0x557345f0cc2e - std::sys_common::backtrace::print::h85035a511aafe7a8 at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:34:9 9: 0x557345f0c4d7 - std::panicking::default_hook::{{closure}}::hcce8cea212785a25 10: 0x557345f0c0bf - std::panicking::default_hook::hf5fcb0f213fe709a at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:292:9 11: 0x557345bb9eeb - call<(&core::panic::panic_info::PanicInfo), (dyn 
core::ops::function::Fn<(&core::panic::panic_info::PanicInfo), Output=()> + core::marker::Send + core::marker::Sync), alloc::alloc::Global> at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2029:9 12: 0x557345bb9eeb - {closure#0} at /w/hyperqueue/hyperqueue/crates/hyperqueue/src/bin/hq.rs:360:9 13: 0x557345f0d21a - <alloc::boxed::Box<F,A> as core::ops::function::Fn>::call::hbc5ccf4eb663e1e5 at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2029:9 14: 0x557345f0d21a - std::panicking::rust_panic_with_hook::h095fccf1dc9379ee at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:783:13 15: 0x557345f0cf68 - std::panicking::begin_panic_handler::{{closure}}::h032ba12139b353db at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:649:13 16: 0x557345f0cef6 - std::sys_common::backtrace::rust_end_short_backtrace::h9259bc2ff8fd0f76 at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:171:18 17: 0x557345f0ceef - rust_begin_unwind at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5 18: 0x557345a94074 - core::panicking::panic_fmt::h784f20a50eaab275 at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:72:14 19: 0x557345a94242 - core::panicking::panic::hb837a5ebbbe5b188 at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:144:5 20: 0x557345f1e426 - remove_sn_task at /w/hyperqueue/hyperqueue/crates/tako/src/internal/server/worker.rs:126:9 21: 0x557345b60477 - on_task_finished 22: 0x557345b60477 - {async_fn#0}<futures_util::stream::stream::split::SplitStream<tokio_util::codec::framed::Framed<tokio::net::tcp::stream::TcpStream, tokio_util::codec::length_delimited::LengthDelimitedCodec>>> at /__w/hyperqueue/hyperqueue/crates/tako/src/internal/server/rpc.rs:270:17 23: 0x557345b60477 - {closure#2} at 
/github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/macros/select.rs:524:49 24: 0x557345b60477 - poll<tako::internal::server::rpc::worker_rpc_loop::{async_fn#0}::tokio_select_util::Out<core::result::Result<core::option::Option, tako::internal::common::error::DsError>, core::result::Result<(), std::io::error::Error>, tako::gateway::LostWorkerReason>, tako::internal::server::rpc::worker_rpc_loop::{async_fn#0}::{closure_env#2}> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/future/poll_fn.rs:58:9 25: 0x557345b60477 - {async_fn#0} at /w/hyperqueue/hyperqueue/crates/tako/src/internal/server/rpc.rs:212:18 26: 0x557345b87b4b - {async_block#0} at /__w/hyperqueue/hyperqueue/crates/tako/src/internal/server/rpc.rs:64:83 27: 0x557345b87b4b - {closure#0}<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared, alloc::alloc::Global>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/core.rs:328:17 28: 0x557345b87b4b - with_mut<tokio::runtime::task::core::Stage<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}>, core::task::poll::Poll<()>, tokio::runtime::task::core::{impl#6}::poll::{closure_env#0}<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared, alloc::alloc::Global>>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/loom/std/unsafe_cell.rs:16:9 29: 0x557345b87b4b - poll<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared, alloc::alloc::Global>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/core.rs:317:30 30: 0x557345b87b4b - 
{closure#0}<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared, alloc::alloc::Global>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/harness.rs:485:19 31: 0x557345b87b4b - call_once<core::task::poll::Poll<()>, tokio::runtime::task::harness::poll_future::{closure_env#0}<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared, alloc::alloc::Global>>> at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panic/unwind_safe.rs:272:9 32: 0x557345b87b4b - do_call<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared, alloc::alloc::Global>>>, core::task::poll::Poll<()>> at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:552:40 33: 0x557345b87b4b - try<core::task::poll::Poll<()>, core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared, alloc::alloc::Global>>>> at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:516:19 34: 0x557345b87b4b - catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared, alloc::alloc::Global>>>, core::task::poll::Poll<()>> at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panic.rs:142:14 35: 0x557345b87b4b - poll_future<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, 
alloc::sync::Arc<tokio::task::local::Shared, alloc::alloc::Global>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/harness.rs:473:18 36: 0x557345b87b4b - poll_inner<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared, alloc::alloc::Global>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/harness.rs:208:27 37: 0x557345b87b4b - poll<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared, alloc::alloc::Global>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/harness.rs:153:15 38: 0x557345b87b4b - poll<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared, alloc::alloc::Global>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/raw.rs:271:5 39: 0x557345f6f399 - poll at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/raw.rs:201:18 40: 0x557345f6f399 - run<alloc::sync::Arc<tokio::task::local::Shared, alloc::alloc::Global>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/task/mod.rs:416:9 41: 0x557345f6f399 - {closure#0} at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/task/local.rs:676:68 42: 0x557345f6f399 - with_budget<(), tokio::task::local::{impl#4}::tick::{closure_env#0}> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/coop.rs:107:5 43: 0x557345f6f399 - budget<(), tokio::task::local::{impl#4}::tick::{closure_env#0}> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/coop.rs:73:5 44: 0x557345f6f399 - tick at 
/github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/task/local.rs:676:31 45: 0x557345b59b7a - {closure#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/task/local.rs:982:16 46: 0x557345b59b7a - {closure#0}<core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>, tokio::task::local::{impl#10}::poll::{closure_env#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/task/local.rs:730:13 47: 0x557345b59b7a - try_with<tokio::task::local::LocalData, tokio::task::local::{impl#4}::with::{closure_env#0}<core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>, tokio::task::local::{impl#10}::poll::{closure_env#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}>>, core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>> at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/local.rs:270:16 48: 0x557345b59b7a - with<tokio::task::local::LocalData, tokio::task::local::{impl#4}::with::{closure_env#0}<core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>, tokio::task::local::{impl#10}::poll::{closure_env#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}>>, core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>> at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/local.rs:246:9 49: 0x557345b59b7a - 
with<core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>, tokio::task::local::{impl#10}::poll::{closure_env#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/task/local.rs:728:17 50: 0x557345b59b7a - poll<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/task/local.rs:968:9 51: 0x557345b59b7a - {async_fn#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/task/local.rs:635:19 52: 0x557345b59b7a - {async_fn#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}, core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>> at /w/hyperqueue/hyperqueue/crates/tako/src/internal/common/taskgroup.rs:15:36 53: 0x557345af3388 - {async_fn#0} at /w/hyperqueue/hyperqueue/crates/tako/src/internal/server/rpc.rs:48:68 54: 0x557345af3388 - {closure#0} at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/macros/select.rs:524:49 55: 0x557345af3388 - poll<tako::internal::server::start::server_start::{async_fn#0}::{async_block#0}::tokio_select_util::Out<(), core::result::Result<(), tako::internal::common::error::DsError>>, tako::internal::server::start::server_start::{async_fn#0}::{async_block#0}::{closure_env#0}> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/future/poll_fn.rs:58:9 56: 0x557345af3388 - {async_block#0} at /w/hyperqueue/hyperqueue/crates/tako/src/internal/server/start.rs:92:9 57: 0x557345af3388 - {closure#2} at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/macros/select.rs:524:49 58: 0x557345b0fd83 - 
poll<hyperqueue::server::backend::{impl#0}::start::{async_fn#0}::{async_block#1}::__tokio_select_util::Out<core::result::Result<(), hyperqueue::common::error::HqError>, core::result::Result<(), hyperqueue::common::error::HqError>, core::result::Result<(), tako::internal::common::error::DsError>>, hyperqueue::server::backend::{impl#0}::start::{async_fn#0}::{async_block#1}::{closure_env#2}> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/future/poll_fn.rs:58:9 59: 0x557345b0fd83 - {async_block#1} at /w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/backend.rs:165:13 60: 0x557345b0fd83 - {closure#0} at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/macros/select.rs:524:49 61: 0x557345b0fd83 - poll<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block#4}::tokio_select_util::Out<(), (), (), core::result::Result<(), hyperqueue::common::error::HqError>>, hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block#4}::{closure_env#0}> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/future/poll_fn.rs:58:9 62: 0x557345b0fd83 - {async_block#4} at /w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/bootstrap.rs:253:22 63: 0x557345b0fd83 - {closure#0}<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/task/local.rs:978:42 64: 0x557345b0fd83 - {closure#0}<core::task::poll::Poll<core::result::Result<(), anyhow::Error>>, tokio::task::local::{impl#10}::poll::{closure_env#0}<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/task/local.rs:730:13 65: 0x557345b0fd83 - try_with<tokio::task::local::LocalData, 
tokio::task::local::{impl#4}::with::{closure_env#0}<core::task::poll::Poll<core::result::Result<(), anyhow::Error>>, tokio::task::local::{impl#10}::poll::{closure_env#0}<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}>>, core::task::poll::Poll<core::result::Result<(), anyhow::Error>>> at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/local.rs:270:16 66: 0x557345b0fd83 - with<tokio::task::local::LocalData, tokio::task::local::{impl#4}::with::{closure_env#0}<core::task::poll::Poll<core::result::Result<(), anyhow::Error>>, tokio::task::local::{impl#10}::poll::{closure_env#0}<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}>>, core::task::poll::Poll<core::result::Result<(), anyhow::Error>>> at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/local.rs:246:9 67: 0x557345b0fd83 - with<core::task::poll::Poll<core::result::Result<(), anyhow::Error>>, tokio::task::local::{impl#10}::poll::{closure_env#0}<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/task/local.rs:728:17 68: 0x557345b0fd83 - poll<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/task/local.rs:968:9 69: 0x557345b0fd83 - {async_fn#0}<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/task/local.rs:635:19 70: 0x557345b0fd83 - {async_fn#0} at /w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/bootstrap.rs:368:30 71: 0x557345bb1405 - {async_fn#0} at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/bootstrap.rs:71:49 72: 0x557345bb1405 - {async_fn#0} at /w/hyperqueue/hyperqueue/crates/hyperqueue/src/client/commands/server.rs:159:43 
73: 0x557345bb1405 - {async_fn#0} at /w/hyperqueue/hyperqueue/crates/hyperqueue/src/client/commands/server.rs:115:69 74: 0x557345bb1405 - {async_block#0} at /w/hyperqueue/hyperqueue/crates/hyperqueue/src/bin/hq.rs:386:70 75: 0x557345b9f83d - poll<&mut hq::main::{async_block_env#0}> at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/future/future.rs:124:9 76: 0x557345b9f83d - {closure#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/scheduler/current_thread/mod.rs:659:57 77: 0x557345b9f83d - with_budget<core::task::poll::Poll<core::result::Result<(), hyperqueue::common::error::HqError>>, tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure#0}::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/coop.rs:107:5 78: 0x557345b9f83d - budget<core::task::poll::Poll<core::result::Result<(), hyperqueue::common::error::HqError>>, tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure#0}::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/coop.rs:73:5 79: 0x557345b9f83d - {closure#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/scheduler/current_thread/mod.rs:659:25 80: 0x557345b9f83d - enter<core::task::poll::Poll<core::result::Result<(), hyperqueue::common::error::HqError>>, tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/scheduler/current_thread/mod.rs:404:19 81: 0x557345b9f83d - 
{closure#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/scheduler/current_thread/mod.rs:658:36 82: 0x557345b9f83d - {closure#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/scheduler/current_thread/mod.rs:737:68 83: 0x557345b9f83d - set<tokio::runtime::scheduler::Context, tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>, (alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>)> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/context/scoped.rs:40:9 84: 0x557345b9f83d - {closure#0}<(alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>), tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/context.rs:176:26 85: 0x557345b9f83d - try_with<tokio::runtime::context::Context, 
tokio::runtime::context::set_scheduler::{closure_env#0}<(alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>), tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>>, (alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>)> at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/local.rs:270:16 86: 0x557345b9f83d - with<tokio::runtime::context::Context, tokio::runtime::context::set_scheduler::{closure_env#0}<(alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>), tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>>, (alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>)> at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/local.rs:246:9 87: 0x557345b9f83d - set_scheduler<(alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>), 
tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/context.rs:176:17 88: 0x557345b9f83d - enter<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/scheduler/current_thread/mod.rs:737:27 89: 0x557345b9f83d - block_on<core::pin::Pin<&mut hq::main::{async_block_env#0}>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/scheduler/current_thread/mod.rs:646:19 90: 0x557345b9f83d - {closure#0}<hq::main::{async_block_env#0}> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/scheduler/current_thread/mod.rs:175:28 91: 0x557345b9f83d - enter_runtime<tokio::runtime::scheduler::current_thread::{impl#0}::block_on::{closure_env#0}<hq::main::{async_block_env#0}>, core::result::Result<(), hyperqueue::common::error::HqError>> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/context/runtime.rs:65:16 92: 0x557345b9f83d - block_on<hq::main::{async_block_env#0}> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/scheduler/current_thread/mod.rs:167:9 93: 0x557345b9f83d - block_on<hq::main::{async_block_env#0}> at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.36.0/src/runtime/runtime.rs:348:47 94: 0x557345b9f83d - main at /w/hyperqueue/hyperqueue/crates/hyperqueue/src/bin/hq.rs:456:5 95: 0x557345b2b203 - 
call_once<fn() -> core::result::Result<(), hyperqueue::common::error::HqError>, ()> at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/ops/function.rs:250:5 96: 0x557345b2b203 - rust_begin_short_backtrace<fn() -> core::result::Result<(), hyperqueue::common::error::HqError>, core::result::Result<(), hyperqueue::common::error::HqError>> at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:155:18 97: 0x557345bba500 - main 98: 0x154fcfff624a - libc_start_call_main at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16 99: 0x154fcfff6305 - libc_start_main_impl at ./csu/../csu/libc-start.c:360:3 100: 0x557345ad4049 - 101: 0x0 -

Oops, HyperQueue has crashed. This is a bug, sorry for that.
If you would be so kind, please report this issue at the HQ issue tracker: https://github.com/It4innovations/hyperqueue/issues/new?title=HQ%20crashes
Please include the above error (starting from "thread ... panicked ...") and the stack backtrace in the issue contents, along with the following information:

HyperQueue version: v0.19.0

You can also re-run HyperQueue server (and its workers) with the RUST_LOG=hq=debug,tako=debug environment variable, and attach the logs to the issue, to provide us more information.

Kobzol commented 3 weeks ago

Oops, that looks like some race condition, we will take a look.
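For readers following the trace: the panic comes from an assertion that removing a task ID from a set-like collection on the worker record returned true, meaning the ID was expected to be present but was not. The sketch below reproduces that failure mode in isolation. The `Worker` struct, the `u64` task IDs, and the `try_remove_sn_task` helper are hypothetical stand-ins rather than tako's actual code, and the sketch makes no claim about what triggered the crash in this report.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the server-side worker record.
/// The real tako type has more fields and may use different types.
struct Worker {
    sn_tasks: HashSet<u64>,
}

impl Worker {
    fn new() -> Self {
        Worker { sn_tasks: HashSet::new() }
    }

    fn insert_sn_task(&mut self, task_id: u64) {
        self.sn_tasks.insert(task_id);
    }

    /// Strict removal, same shape as the assertion in the panic message:
    /// panics if the task is not currently tracked for this worker.
    fn remove_sn_task(&mut self, task_id: u64) {
        assert!(self.sn_tasks.remove(&task_id));
    }

    /// Lenient removal (illustration only): reports whether the task was present.
    fn try_remove_sn_task(&mut self, task_id: u64) -> bool {
        self.sn_tasks.remove(&task_id)
    }
}

fn main() {
    let mut worker = Worker::new();
    worker.insert_sn_task(42);

    // First "task finished" event: the task is tracked, so removal succeeds.
    worker.remove_sn_task(42);

    // A second event for the same task finds nothing left to remove.
    let removed_again = worker.try_remove_sn_task(42);
    println!("second removal found the task: {}", removed_again);

    // The strict variant panics in the same situation.
    let result = std::panic::catch_unwind(move || {
        let mut worker = worker;
        worker.remove_sn_task(42);
    });
    println!("strict removal panicked: {}", result.is_err());
}
```

Any ordering of events in which the server processes a "finished" notification for a task it no longer associates with that worker would trip an assertion of this shape, which is consistent with the race-condition reading above.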

If you can reproduce the error, could you please run the server with the following environment variable: RUST_LOG=hq=debug,tako=debug hq server start and then send us the full debug log if it crashes again? It would help us debug it.
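If the server is launched from a wrapper program rather than an interactive shell, the same invocation can be sketched as below. This is only an illustration under assumptions: the log file name `hq-server-debug.log` is made up, `hq` is assumed to be on `PATH`, and both stdout and stderr are captured because the thread does not say which stream the debug log goes to.

```rust
use std::fs::File;
use std::process::{Command, Stdio};

/// Builds `hq server start` with the RUST_LOG value quoted in this thread.
fn debug_server_command() -> Command {
    let mut cmd = Command::new("hq");
    cmd.args(&["server", "start"])
        .env("RUST_LOG", "hq=debug,tako=debug");
    cmd
}

fn main() {
    // Hypothetical log file name; pick any writable location.
    let log = match File::create("hq-server-debug.log") {
        Ok(file) => file,
        Err(err) => {
            eprintln!("could not create log file: {}", err);
            return;
        }
    };
    let log_for_stdout = match log.try_clone() {
        Ok(file) => file,
        Err(err) => {
            eprintln!("could not duplicate log file handle: {}", err);
            return;
        }
    };

    // Send both output streams to the same file and wait for the server to exit.
    let outcome = debug_server_command()
        .stdout(Stdio::from(log_for_stdout))
        .stderr(Stdio::from(log))
        .status();

    match outcome {
        Ok(status) => println!("hq server exited with {}", status),
        Err(err) => eprintln!("could not launch hq: {}", err),
    }
}
```

After a crash, the resulting file is what would be attached to the issue.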

It would also be great to know how you create workers (manually or via autoalloc?) and which hq submit commands you are using.

Kobzol commented 1 day ago

@chkabir Were you able to reproduce the issue and/or run HQ with more logging? :)