Closed grzanka closed 5 months ago
Hi, thanks a lot for sending the crash report! We'll try to take a look at this ASAP.
Could you please also send us more information about how you were using HQ? For example, which kinds of tasks did you submit (did you use the `hq submit`
command or the Python API?), and how did you start the workers (manually or through automatic allocation)? Thank you!
I submitted the jobs using the Python API:
from hyperqueue.task.function import PythonEnv
from hyperqueue import Client, Job
from pathlib import Path
from convert_from_lv1_to_lv2 import convert, summarize
from paths import project_path
hq_server_path = Path.home() / ".hq-server/hq-current"
dir_to_scan = Path('/net/ascratch/people/plgkongruencj/lgad/datarawlv1v3/')
output_name = 'lv2_v3_test4'
output_dir = Path(f'/net/ascratch/people/plgkongruencj/lgad/dataraw{output_name}/')
env = PythonEnv(prologue="ml python && source $SCRATCH/hq/venv/bin/activate", )
client = Client(hq_server_path, python_env=env)
hdf_files = list(dir_to_scan.rglob('*.hdf'))
hdf_files.sort(key=lambda x: x.stat().st_size)
# limit to smaller number of files
# hdf_files = hdf_files[:10]
job = Job()
convert_tasks = []
for i, file in enumerate(hdf_files):
    print(f"Submitting file: {file} with size {file.stat().st_size / (1024**3):.2f} GB")
    # files are sorted by size, so smaller files get a higher priority
    priority = len(hdf_files) - i
    task = job.function(convert,
                        kwargs={
                            'input_path': file,
                            'output_dir': output_dir,
                            'nsigma': 4.0
                        },
                        priority=priority)
    convert_tasks.append(task)
# the summary task runs only after all conversion tasks have finished
generate_index = job.function(summarize,
                              kwargs={
                                  'output_dir': output_dir,
                                  'index_template_path': project_path / 'data/raw/2023/template_lv2.html',
                                  'bucket_name': 'datarawlv2v3',
                                  'title': 'Data level 2 version 3',
                                  'description': 'Data level 2 version 3 for LGAD test beam.'
                              },
                              deps=convert_tasks)
submitted = client.submit(job)
The allocations were added in the following way:
#!/usr/bin/bash
# Enable automatic allocation (create queue)
hq alloc add slurm \
--time-limit 2h \
--workers-per-alloc 1 \
--max-worker-count 36 \
--backlog 36 \
--idle-timeout 5m \
-- \
--partition=plgrid
# Enable automatic allocation (create queue)
hq alloc add slurm \
--name short \
--time-limit 59min \
--workers-per-alloc 1 \
--max-worker-count 3 \
--backlog 3 \
--idle-timeout 15m \
-- \
--partition=plgrid-testing \
--constraint=memfs
hq alloc add slurm \
--name now \
--time-limit 3h \
--workers-per-alloc 1 \
--max-worker-count 1 \
--backlog 1 \
--idle-timeout 15m \
-- \
--partition=plgrid-now
The crash happened after a few hours of stable running. Right now I'm running the HQ server again with more verbose logging to see if it happens again.
I got another crash:
[2024-02-07T16:59:59.517Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=22
[2024-02-07T16:59:59.527Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=26
[2024-02-07T16:59:59.527Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=25
[2024-02-07T16:59:59.527Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=23
[2024-02-07T16:59:59.527Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=24
[2024-02-07T16:59:59.528Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=28
[2024-02-07T16:59:59.528Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=27
[2024-02-07T16:59:59.529Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=29
[2024-02-07T16:59:59.529Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=30
[2024-02-07T16:59:59.530Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=31
[2024-02-07T16:59:59.531Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=32
[2024-02-07T16:59:59.532Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=34
[2024-02-07T16:59:59.532Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=33
[2024-02-07T16:59:59.532Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=35
[2024-02-07T16:59:59.534Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=36
[2024-02-07T16:59:59.535Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=37
[2024-02-07T16:59:59.535Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=38
[2024-02-07T17:00:04.065Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=1
[2024-02-07T17:00:04.077Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=2
[2024-02-07T17:00:07.406Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=4
[2024-02-07T17:00:07.407Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=3
[2024-02-07T17:00:07.453Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=5
[2024-02-07T17:00:07.453Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=6
[2024-02-07T17:00:07.464Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=9
[2024-02-07T17:00:07.464Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=7
[2024-02-07T17:00:07.465Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=8
[2024-02-07T17:00:07.475Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=10
[2024-02-07T17:00:07.475Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=11
[2024-02-07T17:00:07.475Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=13
[2024-02-07T17:00:07.476Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=12
[2024-02-07T17:00:07.476Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=15
[2024-02-07T17:00:07.476Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=14
[2024-02-07T17:00:07.476Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=16
[2024-02-07T17:00:07.478Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=17
[2024-02-07T17:00:07.479Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=18
[2024-02-07T17:00:07.500Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=19
[2024-02-07T17:00:07.510Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=20
[2024-02-07T17:00:07.516Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=21
[2024-02-07T17:00:07.518Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=22
[2024-02-07T17:00:07.527Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=25
[2024-02-07T17:00:07.527Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=24
[2024-02-07T17:00:07.527Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=23
[2024-02-07T17:00:07.528Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=27
[2024-02-07T17:00:07.528Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=26
[2024-02-07T17:00:07.529Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=30
[2024-02-07T17:00:07.529Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=28
[2024-02-07T17:00:07.530Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=29
[2024-02-07T17:00:07.531Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=31
[2024-02-07T17:00:07.531Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=32
[2024-02-07T17:00:07.532Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=34
[2024-02-07T17:00:07.532Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=33
[2024-02-07T17:00:07.533Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=35
[2024-02-07T17:00:07.534Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=36
[2024-02-07T17:00:07.535Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=37
[2024-02-07T17:00:07.536Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=38
[2024-02-07T17:00:08.076Z DEBUG tako::internal::server::rpc] Receive loop terminated (Ok(Some(TimeLimitReached))), worker=2
[2024-02-07T17:00:08.076Z INFO tako::internal::server::rpc] Worker 2 connection closed (connection: 172.22.19.20:43100)
[2024-02-07T17:00:08.076Z DEBUG tako::internal::server::reactor] Removing worker 2
thread 'main' panicked at 'not yet implemented', /__w/hyperqueue/hyperqueue/crates/tako/src/internal/server/reactor.rs:91:21
stack backtrace:
0: 0x55fb60fd4b14 - std::backtrace_rs::backtrace::libunwind::trace::he648b5c8dd376705
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
1: 0x55fb60fd4b14 - std::backtrace_rs::backtrace::trace_unsynchronized::h5da3e203eef39e9f
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x55fb60fd4b14 - std::sys_common::backtrace::_print_fmt::h8d28d3f20588ae4c
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys_common/backtrace.rs:65:5
3: 0x55fb60fd4b14 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hd9a5b0c9c6b058c0
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys_common/backtrace.rs:44:22
4: 0x55fb60cb8b1f - core::fmt::rt::Argument::fmt::h0afc04119f252b53
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/fmt/rt.rs:138:9
5: 0x55fb60cb8b1f - core::fmt::write::h50b1b3e73851a6fe
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/fmt/mod.rs:1094:21
6: 0x55fb60f9b5f6 - std::io::Write::write_fmt::h184eaf275e4484f0
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/io/mod.rs:1714:15
7: 0x55fb60fd665f - std::sys_common::backtrace::_print::hf58c3a5a25090e71
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys_common/backtrace.rs:47:5
8: 0x55fb60fd665f - std::sys_common::backtrace::print::hb9cf0a7c7f077819
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys_common/backtrace.rs:34:9
9: 0x55fb60fd6204 - std::panicking::default_hook::{{closure}}::h066adb2e3f3e2c07
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:269:22
10: 0x55fb60fd5f33 - std::panicking::default_hook::h277fa2776900ff14
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:288:9
11: 0x55fb60c1e0c7 - call<(&core::panic::panic_info::PanicInfo), (dyn core::ops::function::Fn<(&core::panic::panic_info::PanicInfo), Output=()> + core::marker::Send + core::marker::Sync), alloc::alloc::Global>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/boxed.rs:2007:9
12: 0x55fb60c1e0c7 - {closure#0}
at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/bin/hq.rs:360:9
13: 0x55fb60fd6f98 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::h09cad52ea08435f2
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/alloc/src/boxed.rs:2007:9
14: 0x55fb60fd6f98 - std::panicking::rust_panic_with_hook::hceaf38da6d9db792
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:709:13
15: 0x55fb60fd6cf3 - std::panicking::begin_panic_handler::{{closure}}::h2bce3ed2516af7df
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:595:13
16: 0x55fb60fd6c86 - std::sys_common::backtrace::__rust_end_short_backtrace::h090f3faf8f98a395
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys_common/backtrace.rs:151:18
17: 0x55fb60fd6c71 - rust_begin_unwind
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5
18: 0x55fb60b0d762 - core::panicking::panic_fmt::h4ec8274704d163a3
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:67:14
19: 0x55fb60b0d932 - core::panicking::panic::hee69a8315e4031d6
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:117:5
20: 0x55fb60bcf714 - on_remove_worker<tako::internal::server::comm::CommSender>
at /__w/hyperqueue/hyperqueue/crates/tako/src/internal/server/reactor.rs:91:21
21: 0x55fb60bcf714 - {async_fn#0}
at /__w/hyperqueue/hyperqueue/crates/tako/src/internal/server/rpc.rs:251:5
22: 0x55fb60bf59d7 - {async_block#0}
at /__w/hyperqueue/hyperqueue/crates/tako/src/internal/server/rpc.rs:64:83
23: 0x55fb60bf59d7 - {closure#0}<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/task/core.rs:334:17
24: 0x55fb60bf59d7 - with_mut<tokio::runtime::task::core::Stage<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}>, core::task::poll::Poll<()>, tokio::runtime::task::core::{impl#6}::poll::{closure_env#0}<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/loom/std/unsafe_cell.rs:16:9
25: 0x55fb60bf59d7 - poll<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/task/core.rs:323:30
26: 0x55fb60bf59d7 - {closure#0}<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/task/harness.rs:485:19
27: 0x55fb60bf59d7 - call_once<core::task::poll::Poll<()>, tokio::runtime::task::harness::poll_future::{closure_env#0}<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panic/unwind_safe.rs:271:9
28: 0x55fb60bf59d7 - do_call<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>>, core::task::poll::Poll<()>>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:500:40
29: 0x55fb60bf59d7 - try<core::task::poll::Poll<()>, core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>>>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:464:19
30: 0x55fb60bf59d7 - catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>>, core::task::poll::Poll<()>>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panic.rs:142:14
31: 0x55fb60bf59d7 - poll_future<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/task/harness.rs:473:18
32: 0x55fb60bf59d7 - poll_inner<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/task/harness.rs:208:27
33: 0x55fb60bf59d7 - poll<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/task/harness.rs:153:15
34: 0x55fb60bf59d7 - poll<tako::internal::server::rpc::connection_initiator::{async_fn#0}::{async_block_env#0}, alloc::sync::Arc<tokio::task::local::Shared>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/task/raw.rs:276:5
35: 0x55fb610429f2 - poll
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/task/raw.rs:200:18
36: 0x55fb610429f2 - run<alloc::sync::Arc<tokio::task::local::Shared>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/task/mod.rs:400:9
37: 0x55fb610429f2 - {closure#0}
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/task/local.rs:617:68
38: 0x55fb610429f2 - with_budget<(), tokio::task::local::{impl#2}::tick::{closure_env#0}>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/coop.rs:107:5
39: 0x55fb610429f2 - budget<(), tokio::task::local::{impl#2}::tick::{closure_env#0}>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/coop.rs:73:5
40: 0x55fb610429f2 - tick
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/task/local.rs:617:31
41: 0x55fb60bc6786 - {closure#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/task/local.rs:953:16
42: 0x55fb60bc6786 - {closure#0}<core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>, tokio::task::local::{impl#8}::poll::{closure_env#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/task/local.rs:686:13
43: 0x55fb60bc6786 - try_with<tokio::task::local::LocalData, tokio::task::local::{impl#2}::with::{closure_env#0}<core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>, tokio::task::local::{impl#8}::poll::{closure_env#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}>>, core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/local.rs:270:16
44: 0x55fb60bc6786 - with<tokio::task::local::LocalData, tokio::task::local::{impl#2}::with::{closure_env#0}<core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>, tokio::task::local::{impl#8}::poll::{closure_env#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}>>, core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/local.rs:246:9
45: 0x55fb60bc6786 - with<core::task::poll::Poll<core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>, tokio::task::local::{impl#8}::poll::{closure_env#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/task/local.rs:669:17
46: 0x55fb60bc6786 - poll<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/task/local.rs:939:9
47: 0x55fb60bc6786 - {async_fn#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/task/local.rs:575:19
48: 0x55fb60bc6786 - {async_fn#0}<tokio::net::tcp::listener::{impl#0}::accept::{async_fn_env#0}, core::result::Result<(tokio::net::tcp::stream::TcpStream, core::net::socket_addr::SocketAddr), std::io::error::Error>>
at /__w/hyperqueue/hyperqueue/crates/tako/src/internal/common/taskgroup.rs:15:36
49: 0x55fb60b5f9d9 - {async_fn#0}
at /__w/hyperqueue/hyperqueue/crates/tako/src/internal/server/rpc.rs:48:68
50: 0x55fb60b5f9d9 - {closure#0}
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/macros/select.rs:524:49
51: 0x55fb60b5f9d9 - poll<tako::internal::server::start::server_start::{async_fn#0}::{async_block#0}::__tokio_select_util::Out<(), core::result::Result<(), tako::internal::common::error::DsError>>, tako::internal::server::start::server_start::{async_fn#0}::{async_block#0}::{closure_env#0}>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/future/poll_fn.rs:58:9
52: 0x55fb60b5f9d9 - {async_block#0}
at /__w/hyperqueue/hyperqueue/crates/tako/src/internal/server/start.rs:89:9
53: 0x55fb60b5f9d9 - {closure#2}
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/macros/select.rs:524:49
54: 0x55fb60b57b66 - poll<hyperqueue::server::rpc::{impl#0}::start::{async_fn#0}::{async_block#1}::__tokio_select_util::Out<core::result::Result<(), hyperqueue::common::error::HqError>, core::result::Result<(), hyperqueue::common::error::HqError>, core::result::Result<(), tako::internal::common::error::DsError>>, hyperqueue::server::rpc::{impl#0}::start::{async_fn#0}::{async_block#1}::{closure_env#2}>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/future/poll_fn.rs:58:9
55: 0x55fb60b57b66 - {async_block#1}
at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/rpc.rs:151:13
56: 0x55fb60b57b66 - {closure#0}
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/macros/select.rs:524:49
57: 0x55fb60b57b66 - poll<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block#4}::__tokio_select_util::Out<(), (), (), core::result::Result<(), hyperqueue::common::error::HqError>>, hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block#4}::{closure_env#0}>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/future/poll_fn.rs:58:9
58: 0x55fb60b57b66 - {async_block#4}
at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/bootstrap.rs:230:22
59: 0x55fb60b57b66 - {closure#0}<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/task/local.rs:949:42
60: 0x55fb60b57b66 - {closure#0}<core::task::poll::Poll<core::result::Result<(), anyhow::Error>>, tokio::task::local::{impl#8}::poll::{closure_env#0}<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/task/local.rs:686:13
61: 0x55fb60b57b66 - try_with<tokio::task::local::LocalData, tokio::task::local::{impl#2}::with::{closure_env#0}<core::task::poll::Poll<core::result::Result<(), anyhow::Error>>, tokio::task::local::{impl#8}::poll::{closure_env#0}<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}>>, core::task::poll::Poll<core::result::Result<(), anyhow::Error>>>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/local.rs:270:16
62: 0x55fb60b57b66 - with<tokio::task::local::LocalData, tokio::task::local::{impl#2}::with::{closure_env#0}<core::task::poll::Poll<core::result::Result<(), anyhow::Error>>, tokio::task::local::{impl#8}::poll::{closure_env#0}<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}>>, core::task::poll::Poll<core::result::Result<(), anyhow::Error>>>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/local.rs:246:9
63: 0x55fb60b57b66 - with<core::task::poll::Poll<core::result::Result<(), anyhow::Error>>, tokio::task::local::{impl#8}::poll::{closure_env#0}<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/task/local.rs:669:17
64: 0x55fb60b57b66 - poll<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/task/local.rs:939:9
65: 0x55fb60b57b66 - {async_fn#0}<hyperqueue::server::bootstrap::initialize_server::{async_fn#0}::{async_block_env#4}>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/task/local.rs:575:19
66: 0x55fb60b57b66 - {async_fn#0}
at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/bootstrap.rs:292:30
67: 0x55fb60b57b66 - {async_fn#0}
at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/server/bootstrap.rs:68:49
68: 0x55fb60b57b66 - {async_fn#0}
at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/client/commands/server.rs:159:43
69: 0x55fb60b57b66 - {async_fn#0}
at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/client/commands/server.rs:115:69
70: 0x55fb60c11208 - {async_block#0}
at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/bin/hq.rs:378:70
71: 0x55fb60c09080 - poll<&mut hq::main::{async_block_env#0}>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/future/future.rs:125:9
72: 0x55fb60c09080 - {closure#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/scheduler/current_thread/mod.rs:665:57
73: 0x55fb60c09080 - with_budget<core::task::poll::Poll<core::result::Result<(), hyperqueue::common::error::HqError>>, tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure#0}::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/coop.rs:107:5
74: 0x55fb60c09080 - budget<core::task::poll::Poll<core::result::Result<(), hyperqueue::common::error::HqError>>, tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure#0}::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/coop.rs:73:5
75: 0x55fb60c09080 - {closure#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/scheduler/current_thread/mod.rs:665:25
76: 0x55fb60c09080 - enter<core::task::poll::Poll<core::result::Result<(), hyperqueue::common::error::HqError>>, tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/scheduler/current_thread/mod.rs:410:19
77: 0x55fb60c09080 - {closure#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/scheduler/current_thread/mod.rs:664:36
78: 0x55fb60c09080 - {closure#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/scheduler/current_thread/mod.rs:743:68
79: 0x55fb60c09080 - set<tokio::runtime::scheduler::Context, tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>, (alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>)>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/context/scoped.rs:40:9
80: 0x55fb60c09080 - {closure#0}<(alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>), tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/context.rs:176:26
81: 0x55fb60c09080 - try_with<tokio::runtime::context::Context, tokio::runtime::context::set_scheduler::{closure_env#0}<(alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>), tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>>, (alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>)>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/local.rs:270:16
82: 0x55fb60c09080 - with<tokio::runtime::context::Context, tokio::runtime::context::set_scheduler::{closure_env#0}<(alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>), tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>>, (alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>)>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/thread/local.rs:246:9
83: 0x55fb60c09080 - set_scheduler<(alloc::boxed::Box<tokio::runtime::scheduler::current_thread::Core, alloc::alloc::Global>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>), tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/context.rs:176:17
84: 0x55fb60c09080 - enter<tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure_env#0}<core::pin::Pin<&mut hq::main::{async_block_env#0}>>, core::option::Option<core::result::Result<(), hyperqueue::common::error::HqError>>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/scheduler/current_thread/mod.rs:743:27
85: 0x55fb60c09080 - block_on<core::pin::Pin<&mut hq::main::{async_block_env#0}>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/scheduler/current_thread/mod.rs:652:19
86: 0x55fb60c09080 - {closure#0}<hq::main::{async_block_env#0}>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/scheduler/current_thread/mod.rs:175:28
87: 0x55fb60c09080 - enter_runtime<tokio::runtime::scheduler::current_thread::{impl#0}::block_on::{closure_env#0}<hq::main::{async_block_env#0}>, core::result::Result<(), hyperqueue::common::error::HqError>>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/context/runtime.rs:65:16
88: 0x55fb60c09080 - block_on<hq::main::{async_block_env#0}>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/scheduler/current_thread/mod.rs:167:9
89: 0x55fb60c09080 - block_on<hq::main::{async_block_env#0}>
at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.32.0/src/runtime/runtime.rs:347:47
90: 0x55fb60c09080 - main
at /__w/hyperqueue/hyperqueue/crates/hyperqueue/src/bin/hq.rs:448:5
91: 0x55fb60ba23c3 - call_once<fn() -> core::result::Result<(), hyperqueue::common::error::HqError>, ()>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ops/function.rs:250:5
92: 0x55fb60ba23c3 - __rust_begin_short_backtrace<fn() -> core::result::Result<(), hyperqueue::common::error::HqError>, core::result::Result<(), hyperqueue::common::error::HqError>>
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/sys_common/backtrace.rs:135:18
93: 0x55fb60c1e6d3 - main
94: 0x1552c2a2ad85 - __libc_start_main
95: 0x55fb60b4a059 - <unknown>
96: 0x0 - <unknown>
Oops, HyperQueue has crashed. This is a bug, sorry for that.
If you would be so kind, please report this issue at the HQ issue tracker: https://github.com/It4innovations/hyperqueue/issues/new?title=HQ%20crashes
Please include the above error (starting from "thread ... panicked ...") and the stack backtrace in the issue contents, along with the following information:
HyperQueue version: v0.17.0
You can also re-run HyperQueue server (and its workers) with the `RUST_LOG=hq=debug,tako=debug`
environment variable, and attach the logs to the issue, to provide us more information.
Aborted (core dumped)
It seems to be correlated with one of the allocations being killed after reaching its walltime limit.
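One way to check this correlation across a longer server log is to look for the last worker whose receive loop terminated before the panic. The snippet below is only a minimal sketch: the regular expressions are modelled on the log lines quoted in this thread, the function name `last_termination_before_panic` is made up for illustration, and the exact log format may differ between HQ versions.

```python
import re

# Patterns modelled on the log lines quoted in this thread (assumed format).
TERMINATED = re.compile(
    r"Receive loop terminated \((?P<reason>.*)\), worker=(?P<worker>\d+)"
)
PANIC = re.compile(r"thread '.*' panicked at")


def last_termination_before_panic(lines):
    """Return (worker_id, reason) for the last worker whose receive loop
    terminated before the first panic line, or None if no panic is found
    or no termination precedes it."""
    last = None
    for line in lines:
        match = TERMINATED.search(line)
        if match:
            last = (int(match.group("worker")), match.group("reason"))
        elif PANIC.search(line):
            return last
    return None


# Sample lines copied from the log above.
sample = [
    "[2024-02-07T17:00:07.536Z DEBUG tako::internal::server::rpc] Heartbeat received, worker=38",
    "[2024-02-07T17:00:08.076Z DEBUG tako::internal::server::rpc] Receive loop terminated (Ok(Some(TimeLimitReached))), worker=2",
    "[2024-02-07T17:00:08.076Z DEBUG tako::internal::server::reactor] Removing worker 2",
    "thread 'main' panicked at 'not yet implemented', /__w/hyperqueue/hyperqueue/crates/tako/src/internal/server/reactor.rs:91:21",
]

result = last_termination_before_panic(sample)
print(result)
```

For a real log file, the same function can be fed the lines of the file, for example `last_termination_before_panic(open(path))` with `path` pointing at the saved server output.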
This was caused by a leftover part of the HQ scheduler. It is no longer needed anyway, so we have removed it (https://github.com/It4innovations/hyperqueue/pull/676). The next nightly release of HQ should contain the fix.
@Kobzol thanks for the very quick reaction. Can I test the fixed HyperQueue before the next release, e.g. using a nightly build?