apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine
https://datafusion.apache.org/ballista
Apache License 2.0

Fix panics at 'index out of bounds: the len is 0 but the index is 0': ballista/scheduler/src/planner.rs #818

Closed yahoNanJing closed 1 year ago

yahoNanJing commented 1 year ago

Describe the bug

When an executor is lost, reset_stages_internal may change the partition_locations of a stage and leave them empty. The statement let stage_id = partition_locations[0][0].partition_id.stage_id; then panics with an index-out-of-bounds error.
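The failing expression indexes [0][0] unconditionally, so it panics as soon as either the outer or the inner vector is empty. Below is a minimal sketch of a non-panicking lookup. It assumes simplified stand-in types: PartitionId and PartitionLocation here are hypothetical reductions of the scheduler's structs, and first_stage_id is an illustrative helper, not a function from the Ballista code base or the fix that was merged.

```rust
// Hypothetical, simplified stand-ins for the scheduler's types.
// The real Ballista structs carry more fields than shown here.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionId {
    pub job_id: String,
    pub stage_id: usize,
    pub partition_id: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub struct PartitionLocation {
    pub partition_id: PartitionId,
}

/// Non-panicking equivalent of `partition_locations[0][0].partition_id.stage_id`.
/// Returns `None` when the outer slice or its first inner vector is empty.
pub fn first_stage_id(partition_locations: &[Vec<PartitionLocation>]) -> Option<usize> {
    partition_locations
        .first()                      // None if there are no partitions at all
        .and_then(|locs| locs.first()) // None if partition 0 has no locations
        .map(|loc| loc.partition_id.stage_id)
}
```

With this shape the caller has to decide what an empty result means, for example returning an error or skipping the rollback for that node, instead of crashing the scheduler's event loop.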

To Reproduce

Expected behavior

Additional context

The details of the trace are as follows:

2023-06-14T16:21:50.690289Z ERROR tokio-runtime-worker ThreadId(04) panic: thread 'tokio-runtime-worker' panicked at 'index out of bounds: the len is 0 but the index is 0': ballista/scheduler/src/planner.rs:271

 0: <backtrace::capture::Backtrace as core::default::Default>::default
 1: log_panics::Config::install_panic_hook::{{closure}}
 2: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
      at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/alloc/src/boxed.rs:1987:9
    std::panicking::rust_panic_with_hook
      at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:695:13
 3: std::panicking::begin_panic_handler::{{closure}}
      at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:582:13
 4: std::sys_common::backtrace::__rust_end_short_backtrace
      at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:150:18
 5: rust_begin_unwind
      at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
 6: core::panicking::panic_fmt
      at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
 7: core::panicking::panic_bounds_check
      at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:162:5
 8: ballista_scheduler::planner::rollback_resolved_shuffles
 9: ballista_scheduler::planner::rollback_resolved_shuffles
10: ballista_scheduler::planner::rollback_resolved_shuffles
11: ballista_scheduler::planner::rollback_resolved_shuffles
12: ballista_scheduler::planner::rollback_resolved_shuffles
13: ballista_scheduler::planner::rollback_resolved_shuffles
14: ballista_scheduler::planner::rollback_resolved_shuffles
15: ballista_scheduler::state::execution_graph::execution_stage::RunningStage::to_unresolved
16: ballista_scheduler::state::execution_graph::ExecutionGraph::rollback_running_stage
17: ballista_scheduler::state::execution_graph::ExecutionGraph::processing_stages_update
18: ballista_scheduler::state::execution_graph::ExecutionGraph::update_task_status
19: <ballista_scheduler::scheduler_server::query_stage_scheduler::QueryStageScheduler<T,U> as ballista_core::event_loop::EventAction<ballista_scheduler::scheduler_server::event::QueryStageSchedulerEvent>>::on_receive::{{closure}}
20: ballista_core::event_loop::EventLoop<E>::run::{{closure}}
21: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
22: tokio::runtime::task::core::Core<T,S>::poll
23: tokio::runtime::task::harness::Harness<T,S>::poll
24: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
25: tokio::runtime::scheduler::multi_thread::worker::Context::run
26: tokio::macros::scoped_tls::ScopedKey<T>::set
27: tokio::runtime::scheduler::multi_thread::worker::run
28: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
29: tokio::runtime::task::core::Core<T,S>::poll
30: tokio::runtime::task::harness::Harness<T,S>::poll
31: tokio::runtime::blocking::pool::Inner::run
32: std::sys_common::backtrace::__rust_begin_short_backtrace
33: core::ops::function::FnOnce::call_once{{vtable.shim}}
34: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
      at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/alloc/src/boxed.rs:1973:9
    <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
      at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/alloc/src/boxed.rs:1973:9
    std::sys::unix::thread::Thread::new::thread_start
      at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys/unix/thread.rs:108:17
35: start_thread
      at /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:477:8
36: clone
      at /build/glibc-SzIz7B/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95