It4innovations / hyperqueue

Scheduler for sub-node tasks for HPC systems with batch scheduling
https://it4innovations.github.io/hyperqueue
MIT License

HQ crashes #697

Closed: unkcpz closed this issue 2 months ago

unkcpz commented 2 months ago

First, thanks for the nice tool! We use it with aiida-hyperqueue.

I am not sure whether it is appropriate to paste the whole error trace in the issue, but I am following the instructions from the error message. Let me know if I need to provide more information for debugging.

HyperQueue version: v0.18.0
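
As a reading aid for the panic message that follows: "Request are not sorted or unique" suggests that the server expects the task IDs in a submit request to be strictly increasing, with no duplicates. The sketch below only illustrates that kind of check. The function name `is_sorted_and_unique` and the `u32` ID type are assumptions made for illustration, not HyperQueue's actual code.

```rust
/// Hypothetical helper (not taken from HyperQueue): returns true when `ids`
/// is strictly increasing, i.e. sorted in ascending order with no duplicates.
fn is_sorted_and_unique(ids: &[u32]) -> bool {
    // Compare every adjacent pair; a strict `<` rejects both
    // out-of-order entries and repeated entries.
    ids.windows(2).all(|pair| pair[0] < pair[1])
}

fn main() {
    // Sorted and unique: passes the check.
    println!("{}", is_sorted_and_unique(&[1, 2, 3]));
    // Contains a duplicate: fails the check.
    println!("{}", is_sorted_and_unique(&[1, 2, 2]));
    // Out of order: fails the check.
    println!("{}", is_sorted_and_unique(&[3, 1, 2]));
}
```

If this reading is right, a client that submits task IDs out of order or more than once would trigger the error, so sorting and de-duplicating the IDs before submission may be a workaround worth trying.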

thread 'main' panicked at 'Invalid response: Error(ErrorResponse { message: "Invalid task request GenericError(\"Request are not sorted or unique\")" })', /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/client/submit.rs:111:14                                                
stack backtrace:                                                                                                                                                                                                                                                                          
   0:     0x55ea32156434 - std::backtrace_rs::backtrace::libunwind::trace::he648b5c8dd376705                                                                                                                                                                                              
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5                                                                                                                                         
   1:     0x55ea32156434 - std::backtrace_rs::backtrace::trace_unsynchronized::h5da3e203eef39e9f                                                                                                                                                                                          
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5                                                                                                                                               
   2:     0x55ea32156434 - std::sys_common::backtrace::_print_fmt::h8d28d3f20588ae4c                                                                                                                                                                                                      
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys_common/backtrace.rs:65:5                                                                                                                                                            
   3:     0x55ea32156434 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hd9a5b0c9c6b058c0                                                                                                                                                           
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys_common/backtrace.rs:44:22                                                                                                                                                           
   4:     0x55ea31deafaf - core::fmt::rt::Argument::fmt::h0afc04119f252b53                                                                                                                                                                                                                
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/fmt/rt.rs:138:9                                                                                                                                                                        
   5:     0x55ea31deafaf - core::fmt::write::h50b1b3e73851a6fe                                                                                                                                                                                                                            
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/fmt/mod.rs:1094:21                                                                                                                                                                     
   6:     0x55ea3211ca16 - std::io::Write::write_fmt::h184eaf275e4484f0                                                                                                                                                                                                                   
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/io/mod.rs:1714:15                                                                                                                                                                       
   7:     0x55ea3215831f - std::sys_common::backtrace::_print::hf58c3a5a25090e71                                                                                                                                                                                                          
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys_common/backtrace.rs:47:5                                                                                                                                                            
   8:     0x55ea3215831f - std::sys_common::backtrace::print::hb9cf0a7c7f077819                                                                                                                                                                                                           
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys_common/backtrace.rs:34:9                                                                                                                                                            
   9:     0x55ea32157ec4 - std::panicking::default_hook::{{closure}}::h066adb2e3f3e2c07                                                                                                                                                                                                   
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:269:22                                                                                                                                                                     
  10:     0x55ea32157bf3 - std::panicking::default_hook::h277fa2776900ff14                                                                                                                                                                                                                
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:288:9                                                                                                                                                                      
  11:     0x55ea31d4c787 - call<(&core::panic::panic_info::PanicInfo), (dyn core::ops::function::Fn<(&core::panic::panic_info::PanicInfo), Output=()> + core::marker::Send + core::marker::Sync), alloc::alloc::Global>                                                                   
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/boxed.rs:2007:9                                                                                                                                                                       
  12:     0x55ea31d4c787 - {closure#0}                                                                                                                                                                                                                                                    
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/bin/hq.rs:360:9                                                                                                                                                                                        
  13:     0x55ea32158c58 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::h09cad52ea08435f2                                                                                                                                                                             
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/boxed.rs:2007:9                                                                                                                                                                       
  14:     0x55ea32158c58 - std::panicking::rust_panic_with_hook::hceaf38da6d9db792                                                                                                                                                                                                        
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:709:13                                                                                                                                                                     
  15:     0x55ea321589e2 - std::panicking::begin_panic_handler::{{closure}}::h2bce3ed2516af7df                                 
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:597:13
  16:     0x55ea32158946 - std::sys_common::backtrace::__rust_end_short_backtrace::h090f3faf8f98a395              
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys_common/backtrace.rs:151:18
  17:     0x55ea32158931 - rust_begin_unwind                                                                                                 
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5
  18:     0x55ea31c29d12 - core::panicking::panic_fmt::h4ec8274704d163a3                                           
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:67:14                                                                                                                                                                     
  19:     0x55ea31c9d632 - {async_fn#0}                                                                                                                                                                                                                                                   
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/client/submit.rs:111:14                                                                                                                                                                         
  20:     0x55ea31c9d632 - {async_fn#0}<futures_util::sink::with::With<futures_util::stream::stream::split::SplitSink<tokio_util::codec::framed::Framed<tokio::net::tcp::stream::TcpStream, tokio_util::codec::length_delimited::LengthDelimitedCodec>, bytes::bytes::Bytes>, bytes::bytes
::Bytes, hyperqueue::transfer::messages::ToClientMessage, futures_util::future::ready::Ready<core::result::Result<bytes::bytes::Bytes, hyperqueue::common::error::HqError>>, hyperqueue::transfer::connection::{impl#0}::split::{closure_env#0}<hyperqueue::transfer::messages::FromClient
Message, hyperqueue::transfer::messages::ToClientMessage>>, futures_util::stream::stream::map::Map<futures_util::stream::stream::split::SplitStream<tokio_util::codec::framed::Framed<tokio::net::tcp::stream::TcpStream, tokio_util::codec::length_delimited::LengthDelimitedCodec>>, hyp
erqueue::transfer::connection::{impl#0}::split::{closure_env#1}<hyperqueue::transfer::messages::FromClientMessage, hyperqueue::transfer::messages::ToClientMessage>>>                                                                                                                     
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/client/mod.rs:92:75                                                                                                                                                                             
  21:     0x55ea31d21f55 - {async_fn#0}                                                                                                                                                                                                                                                   
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/client/mod.rs:69:72                                                                                                                                                                             
  22:     0x55ea31d21f55 - {async_block#0}                                                                                                                                                                                                                                                
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/client/mod.rs:49:91                                                                                                                                                                             
  23:     0x55ea31d21f55 - {closure#0}<hyperqueue::server::client::handle_client_connections::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>                                                                                                            
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/core.rs:328:17                                                                                                                                          
  24:     0x55ea31d21f55 - with_mut<tokio::runtime::task::core::Stage<hyperqueue::server::client::handle_client_connections::{async_fn#0}::{async_block_env#0}>, core::task::poll::Poll<()>, tokio::runtime::task::core::{impl#6}::poll::{closure_env#0}<hyperqueue::server::client::handl
e_client_connections::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>>                                                                                                                                                                                   
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/loom/std/unsafe_cell.rs:16:9                                                                                                                                         
  25:     0x55ea31d21f55 - poll<hyperqueue::server::client::handle_client_connections::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>                                                                                                                   
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/core.rs:317:30
  26:     0x55ea31d21f55 - {closure#0}<hyperqueue::server::client::handle_client_connections::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/harness.rs:485:19
  27:     0x55ea31d21f55 - call_once<core::task::poll::Poll<()>, tokio::runtime::task::harness::poll_future::{closure_env#0}<hyperqueue::server::client::handle_client_connections::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panic/unwind_safe.rs:271:9                
  28:     0x55ea31d21f55 - do_call<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<hyperqueue::server::client::handle_client_connections::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>>, core::
task::poll::Poll<()>>                                                                                                                        
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:500:40
  29:     0x55ea31d21f55 - try<core::task::poll::Poll<()>, core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<hyperqueue::server::client::handle_client_connections::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::
local::Shared>>>>                                                     
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:464:19
  30:     0x55ea31d21f55 - catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<hyperqueue::server::client::handle_client_connections::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>>, c
ore::task::poll::Poll<()>>                                            
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panic.rs:142:14
  31:     0x55ea31d21f55 - poll_future<hyperqueue::server::client::handle_client_connections::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/harness.rs:473:18
  32:     0x55ea31d21f55 - poll_inner<hyperqueue::server::client::handle_client_connections::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/harness.rs:208:27
  33:     0x55ea31d21f55 - poll<hyperqueue::server::client::handle_client_connections::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/harness.rs:153:15
  34:     0x55ea31d21f55 - poll<hyperqueue::server::client::handle_client_connections::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/raw.rs:276:5
  35:     0x55ea321c31a2 - poll                                       
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/raw.rs:200:18
  36:     0x55ea321c31a2 - run<alloc::sync::Arc<tokio::task::local::Shared>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/task/mod.rs:408:9
  37:     0x55ea321c31a2 - {closure#0}
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/task/local.rs:618:68
  38:     0x55ea321c31a2 - with_budget<(), tokio::task::local::{impl#2}::tick::{closure_env#0}>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/coop.rs:107:5
  39:     0x55ea321c31a2 - budget<(), tokio::task::local::{impl#2}::tick::{closure_env#0}>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/coop.rs:73:5
  40:     0x55ea321c31a2 - tick                                       
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/task/local.rs:618:31
  41:     0x55ea31cf1ee6 - {closure#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/task/local.rs:954:16
  42:     0x55ea31cf1ee6 - {closure#0}<core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>, tokio::task::local::{impl#8}::poll::{closure_env#0}<tokio::net::tcp::listener::{impl#0}::accept::{as
ync_fn_env#0}>>                                                       
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/task/local.rs:687:13
  43:     0x55ea31cf1ee6 - try_with<tokio::task::local::LocalData, tokio::task::local::{impl#2}::with::{closure_env#0}<core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>, tokio::task::local::
{impl#8}::poll::{closure_env#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}>>, core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/local.rs:270:16
  44:     0x55ea31cf1ee6 - with<tokio::task::local::LocalData, tokio::task::local::{impl#2}::with::{closure_env#0}<core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>, tokio::task::local::{imp
l#8}::poll::{closure_env#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}>>, core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/local.rs:246:9
  45:     0x55ea31cf1ee6 - with<core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>, tokio::task::local::{impl#8}::poll::{closure_env#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_
env#0}>>                                                              
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/task/local.rs:670:17
  46:     0x55ea31cf1ee6 - poll<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/task/local.rs:940:9
  47:     0x55ea31cf1ee6 - {async_fn#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/task/local.rs:575:19
  48:     0x55ea31cf1ee6 - {async_fn#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}, core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>
                               at /__w/hyperqueue/hyperqueue/crates/tako/src/internal/common/taskgroup.rs:15:36
  49:     0x55ea31c7e317 - {async_fn#0}
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/client/mod.rs:40:72
  50:     0x55ea31c7e317 - {closure#0}
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/macros/select.rs:524:49
  51:     0x55ea31c7e317 - poll<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block#4}::__tokio_select_util::Out<(), (), (), core::result::Result<(), hyperqueue::common::error::HqError>>, hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{asyn
c_block#4}::{closure_env#0}>                                          
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/future/poll_fn.rs:58:9
  52:     0x55ea31c7e317 - {async_block#4}
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/bootstrap.rs:230:22
  53:     0x55ea31c7e317 - {closure#0}<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/task/local.rs:950:42
  54:     0x55ea31c7e317 - {closure#0}<core::task::poll::Poll<core::result::Result<(), anyhow::Error>>, tokio::task::local::{impl#8}::poll::{closure_env#0}<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/task/local.rs:687:13
  55:     0x55ea31c7e317 - try_with<tokio::task::local::LocalData, tokio::task::local::{impl#2}::with::{closure_env#0}<core::task::poll::Poll<core::result::Result<(), anyhow::Error>>, tokio::task::local::{impl#8}::poll::{closure_env#0}<hyperqueue::server::bootstrap::initialize_serv
er::{async_fn#0}::{async_block_env#4}>>, core::task::poll::Poll<core::result::Result<(), anyhow::Error>>>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/local.rs:270:16
  56:     0x55ea31c7e317 - with<tokio::task::local::LocalData, tokio::task::local::{impl#2}::with::{closure_env#0}<core::task::poll::Poll<core::result::Result<(), anyhow::Error>>, tokio::task::local::{impl#8}::poll::{closure_env#0}<hyperqueue::server::bootstrap::initialize_server::
{async_fn#0}::{async_block_env#4}>>, core::task::poll::Poll<core::result::Result<(), anyhow::Error>>>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/local.rs:246:9
  57:     0x55ea31c7e317 - with<core::task::poll::Poll<core::result::Result<(), anyhow::Error>>, tokio::task::local::{impl#8}::poll::{closure_env#0}<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/task/local.rs:670:17
  58:     0x55ea31c7e317 - poll<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/task/local.rs:940:9
  59:     0x55ea31c7e317 - {async_fn#0}<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/task/local.rs:575:19
  60:     0x55ea31c7e317 - {async_fn#0}
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/bootstrap.rs:292:30
  61:     0x55ea31c7e317 - {async_fn#0}
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/bootstrap.rs:68:49
  62:     0x55ea31c7e317 - {async_fn#0}
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/client/commands/server.rs:159:43
  63:     0x55ea31c7e317 - {async_fn#0}
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/client/commands/server.rs:115:69
  64:     0x55ea31d3f6bf - {async_block#0}
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/bin/hq.rs:378:70
  65:     0x55ea31d35670 - poll<&mut hq::main::{async_block_env#0}>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/future/future.rs:125:9
  66:     0x55ea31d35670 - {closure#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:665:57
  67:     0x55ea31d35670 - with_budget<core::task::poll::Poll<core::result::Result<(), hyperqueue::common::error::HqError>>, tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure#0}::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/coop.rs:107:5
  68:     0x55ea31d35670 - budget<core::task::poll::Poll<core::result::Result<(), hyperqueue::common::error::HqError>>, tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure#0}::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/coop.rs:73:5
  69:     0x55ea31d35670 - {closure#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:665:25
  70:     0x55ea31d35670 - enter<core::task::poll::Poll<core::result::Result<(), hyperqueue::common::error::HqError>>, tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>>
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/bootstrap.rs:292:30
  61:     0x55ea31c7e317 - {async_fn#0}
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/bootstrap.rs:68:49
  62:     0x55ea31c7e317 - {async_fn#0}
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/client/commands/server.rs:159:43
  63:     0x55ea31c7e317 - {async_fn#0}
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/client/commands/server.rs:115:69
  64:     0x55ea31d3f6bf - {async_block#0}
                               at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/bin/hq.rs:378:70
  65:     0x55ea31d35670 - poll<&mut hq::main::{async_block_env#0}>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/future/future.rs:125:9
  66:     0x55ea31d35670 - {closure#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:665:57
  67:     0x55ea31d35670 - with_budget<core::task::poll::Poll<core::result::Result<(), hyperqueue::common::error::HqError>>, tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure#0}::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/coop.rs:107:5
  68:     0x55ea31d35670 - budget<core::task::poll::Poll<core::result::Result<(), hyperqueue::common::error::HqError>>, tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure#0}::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/coop.rs:73:5
  69:     0x55ea31d35670 - {closure#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:665:25
  70:     0x55ea31d35670 - enter<core::task::poll::Poll<core::result::Result<(), hyperqueue::common::error::HqError>>, tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:410:19
  71:     0x55ea31d35670 - {closure#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:664:36
  72:     0x55ea31d35670 - {closure#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:743:68
  73:     0x55ea31d35670 - set<tokio::runtime::scheduler::Context, tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::o
ption::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>, (alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>)>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/context/scoped.rs:40:9
  74:     0x55ea31d35670 - {closure#0}<(alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>), tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closur
e_env#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/context.rs:176:26
  75:     0x55ea31d35670 - try_with<tokio::runtime::context::Context, tokio::runtime::context::set_scheduler::{closure_env#0}<(alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common:
:error::HqError>>), tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperque
ue::common::error::HqError>>>>, (alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>)>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/local.rs:270:16
  76:     0x55ea31d35670 - with<tokio::runtime::context::Context, tokio::runtime::context::set_scheduler::{closure_env#0}<(alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::err
or::HqError>>), tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::
common::error::HqError>>>>, (alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>)>
                               at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/local.rs:246:9
  77:     0x55ea31d35670 - set_scheduler<(alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>), tokio::runtime::scheduler::current_thread::{impl#8}::enter::{clos
ure_env#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/context.rs:176:17
  78:     0x55ea31d35670 - enter<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:743:27
  79:     0x55ea31d35670 - block_on<core::pin::Pin<&mut hq::main::{async_block_env#0}>>
                               at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.33.0/src/runtime/scheduler/current_thread/mod.rs:652:19

Kobzol commented 2 months ago

Hi, thanks for your report! I have reproduced it locally. It happens when you request the same resource multiple times, e.g. this also crashes:

hq submit --resource foo=1 --resource foo=2 ls

We will fix the input validation so that it doesn't crash so late and produces a better error message (and of course to also avoid crashing the server :laughing:), but in any case, this is unsupported (the above should just be --resource foo=3).
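As a rough illustration of the kind of early check described above, here is a minimal Python sketch that looks for resource names repeated across `NAME=AMOUNT` arguments before anything is sent to a server. It is not HyperQueue's actual validation code; the function names `parse_resource` and `duplicate_resources` are hypothetical and exist only for this example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def parse_resource(arg: str) -> Tuple[str, str]:
    """Split a NAME=AMOUNT string into its two parts (hypothetical helper)."""
    name, sep, amount = arg.partition("=")
    if not sep or not name or not amount:
        raise ValueError(f"expected NAME=AMOUNT, got {arg!r}")
    return name, amount


def duplicate_resources(args: Iterable[str]) -> List[str]:
    """Return the sorted resource names that appear more than once."""
    counts = Counter(parse_resource(arg)[0] for arg in args)
    return sorted(name for name, seen in counts.items() if seen > 1)


if __name__ == "__main__":
    # Mirrors the two --resource values from the crashing example above.
    requested = ["foo=1", "foo=2"]
    repeated = duplicate_resources(requested)
    if repeated:
        print("resource requested more than once: " + ", ".join(repeated))
```

A check like this, run on the client side, would let the command fail with a readable message instead of reaching the point where the server rejects the request.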

What submit command did you use? Did you use the CLI, the TOML file or Python API for submitting the job?

unkcpz commented 2 months ago

I used the CLI to submit the job. The command I ran was:

hq job submit --name="aiida-1638" --stdout=_scheduler-stdout.txt --stderr=_scheduler-stderr.txt --time-request=3600s --time-limit=3600s --cpus=32 --resource mem=120000 ./_aiidasubmit.sh

I started the server and have an auto-alloc queue configured.

The script content is:

#!/bin/bash 

module load cray/22.05  cpeIntel/22.05
module load QuantumESPRESSO/7.0

export OMP_NUM_THREADS=1

srun --cpu-bind=map_cpu:$HQ_CPUS '-s' '-n' '32' '--mem' '120000' '/capstor/apps/cscs/eiger/easybuild/software/QuantumESPRESSO/7.0-cpeIntel-22.05/bin/pw.x' '-npool' '4' '-in' 'aiida.in'  > 'aiida.out'

unkcpz commented 2 months ago

The initial crash happened because the worker was not reachable: it uses the bare hostname to connect to the login node, but at this supercomputing center the connection has to go through <hostname>.<domain>. After that, I ran into the issue above. I didn't submit the job twice.
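For anyone who hits the same hostname problem, the following Python sketch shows one way to check which form of a host name resolves from a given node. It is only a diagnostic aid under stated assumptions and is not part of HyperQueue; `resolves` and `pick_reachable_name` are hypothetical names, and the host names in the usage line are placeholders.

```python
import socket
from typing import Callable, Optional


def resolves(host: str) -> bool:
    """Return True if the local resolver can look up `host`."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def pick_reachable_name(
    short: str,
    fqdn: str,
    resolver: Callable[[str], bool] = resolves,
) -> Optional[str]:
    """Prefer the short name, fall back to the fully qualified one.

    Returns None when neither form can be resolved.
    """
    for candidate in (short, fqdn):
        if resolver(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Placeholder names; substitute the login node's real names.
    print(pick_reachable_name("login1", "login1.example.org"))
```

Running this on a compute node with the login node's short and fully qualified names shows whether the short name alone is enough there. Name resolution succeeding does not guarantee that the port is reachable, so this only narrows the problem down.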