apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine
https://datafusion.apache.org/ballista
Apache License 2.0
1.39k stars 181 forks source link

Suspicious slow test in Ballista #235

Open yjshen opened 2 years ago

yjshen commented 2 years ago

Describe the bug

SLOW [> 60.005s]        ballista-executor cpu_bound_executor::tests::executor_shutdown_while_task_running
SLOW [>120.011s]         ballista-executor cpu_bound_executor::tests::executor_shutdown_while_task_running
SLOW [>180.015s]         ballista-executor cpu_bound_executor::tests::executor_shutdown_while_task_running
SLOW [>240.016s]         ballista-executor cpu_bound_executor::tests::executor_shutdown_while_task_running

To Reproduce

On the master branch, run cargo nextest run, not always present, on average occurs 2-3 times out of 5 runs.

yjshen commented 2 years ago

@Ted-Jiang any ideas on this?

Ted-Jiang commented 2 years ago

@yjshen I run it on my local mac, but can not reproduce it. I think cargo test run in parallel it may not slow down the whole process.

Ted-Jiang commented 2 years ago

I will try reproduce it.

yjshen commented 2 years ago

https://github.com/apache/arrow-datafusion/runs/5838876700?check_suite_focus=true

test cpu_bound_executor::tests::executor_shutdown_while_task_running has been running for over 60 seconds

Our CI encountered this as well.

Ted-Jiang commented 2 years ago

If it break the Ci pipeline, i think we should ignore it first.

mingmwang commented 2 years ago

I had encountered the same issue in my local laptop.

mingmwang commented 2 years ago
 2658 Thread_8907905: cpu_bound_executor::tests::executor_shutdown_while_task_running
      2658 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff6f346b8b]
        2658 _pthread_start  (in libsystem_pthread.dylib) + 148  [0x7fff6f34b109]
          2658 std::sys::unix::thread::Thread::new::thread_start::hcccd55275f17da3c  (in ballista_executor-68e1b504e701a7b1) + 39  [0x1015118b7]
            2658 core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::heaae50de5145d63f  (in ballista_executor-68e1b504e701a7b1) + 108  [0x1013cb81c]
              2658 std::sys_common::backtrace::__rust_begin_short_backtrace::hf924fdb191c98ca2  (in ballista_executor-68e1b504e701a7b1) + 223  [0x1013fe4bf]
                2658 test::run_test::run_test_inner::_$u7b$$u7b$closure$u7d$$u7d$::hb1e54bbefa2cb224  (in ballista_executor-68e1b504e701a7b1) + 537  [0x1013f51f9]
                  2658 test::__rust_begin_short_backtrace::hc31ef4181a4ed344  (in ballista_executor-68e1b504e701a7b1) + 6  [0x1013f63a6]
                    2658 core::ops::function::FnOnce::call_once::h4a3919413b343338  (in ballista_executor-68e1b504e701a7b1) + 17  [0x1013be8f1]  function.rs:227
                      2658 ballista_executor::cpu_bound_executor::tests::executor_shutdown_while_task_running::_$u7b$$u7b$closure$u7d$$u7d$::hee09d7a6d9ce7755  (in ballista_executor-68e1b504e701a7b1) + 17  [0x101394e81]  cpu_bound_executor.rs:314
                        2658 ballista_executor::cpu_bound_executor::tests::executor_shutdown_while_task_running::h0849df972dbf35c5  (in ballista_executor-68e1b504e701a7b1) + 269  [0x101394f9d]  cpu_bound_executor.rs:325
                          2658 tokio::runtime::Runtime::block_on::h2a50505bf1733efc  (in ballista_executor-68e1b504e701a7b1) + 183  [0x101398597]  mod.rs:475
                            2658 tokio::runtime::basic_scheduler::BasicScheduler::block_on::h9a25d43ce7b9b48f  (in ballista_executor-68e1b504e701a7b1) + 342  [0x101371f26]  basic_scheduler.rs:182
                              2658 tokio::runtime::basic_scheduler::CoreGuard::block_on::hb29ed79897693f54  (in ballista_executor-68e1b504e701a7b1) + 60  [0x10137828c]  basic_scheduler.rs:506
                                2658 tokio::runtime::basic_scheduler::CoreGuard::enter::ha493c5f73d49585a  (in ballista_executor-68e1b504e701a7b1) + 310  [0x10137f1f6]  basic_scheduler.rs:582
                                  2658 tokio::macros::scoped_tls::ScopedKey$LT$T$GT$::set::h3dd0649e7cc4ba14  (in ballista_executor-68e1b504e701a7b1) + 124  [0x10138845c]  scoped_tls.rs:61
                                    2658 tokio::runtime::basic_scheduler::CoreGuard::enter::_$u7b$$u7b$closure$u7d$$u7d$::h06d5fd86ff04e726  (in ballista_executor-68e1b504e701a7b1) + 43  [0x10137f68b]  basic_scheduler.rs:582
                                      2658 tokio::runtime::basic_scheduler::CoreGuard::block_on::_$u7b$$u7b$closure$u7d$$u7d$::hb8fbb5347d8ca946  (in ballista_executor-68e1b504e701a7b1) + 497  [0x10137af11]  basic_scheduler.rs:515
                                        2658 tokio::runtime::basic_scheduler::Context::enter::he70a0177f690a79f  (in ballista_executor-68e1b504e701a7b1) + 324  [0x1013772c4]  basic_scheduler.rs:374
                                          2658 tokio::runtime::basic_scheduler::CoreGuard::block_on::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::hdfc97956fef2e9ae  (in ballista_executor-68e1b504e701a7b1) + 153  [0x10137cd59]  basic_scheduler.rs:516
                                            2658 std::thread::local::LocalKey$LT$T$GT$::with::ha7b9fb0649bc6957  (in ballista_executor-68e1b504e701a7b1) + 44  [0x1013806ec]  local.rs:388
                                              2658 std::thread::local::LocalKey$LT$T$GT$::try_with::h9b0b04783a8fede9  (in ballista_executor-68e1b504e701a7b1) + 192  [0x101383160]  local.rs:412
                                                2658 tokio::coop::with_budget::_$u7b$$u7b$closure$u7d$$u7d$::hfbe9ccefa81dfe59  (in ballista_executor-68e1b504e701a7b1) + 169  [0x10139e619]  coop.rs:102
                                                  2658 tokio::runtime::basic_scheduler::CoreGuard::block_on::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h9d6ff700476c3a35  (in ballista_executor-68e1b504e701a7b1) + 58  [0x10137d0ea]  basic_scheduler.rs:516
                                                    2658 _$LT$core..pin..Pin$LT$P$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h63366d8764637773  (in ballista_executor-68e1b504e701a7b1) + 66  [0x101387e82]  future.rs:123
                                                      2658 _$LT$core..future..from_generator..GenFuture$LT$T$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h7790bc3be2d2f31a  (in ballista_executor-68e1b504e701a7b1) + 80  [0x1013c3480]  mod.rs:84
                                                        2658 ballista_executor::cpu_bound_executor::tests::executor_shutdown_while_task_running::_$u7b$$u7b$closure$u7d$$u7d$::hbac425d8b50a7483  (in ballista_executor-68e1b504e701a7b1) + 633  [0x101395279]  cpu_bound_executor.rs:322
                                                          2658 std::sync::barrier::Barrier::wait::hd4f5e692c300c825  (in ballista_executor-68e1b504e701a7b1) + 138  [0x101504e7a]
                                                            2658 _pthread_cond_wait  (in libsystem_pthread.dylib) + 698  [0x7fff6f34b425]
                                                              2658 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff6f28a882]
thinkharderdev commented 2 years ago

I've seen the same thing locally but it seems to be sporadic so I haven't been able to investigate.

mingmwang commented 1 year ago

Can we upgrade the tokio version in Ballista ?

Ted-Jiang commented 1 year ago

Can we upgrade the tokio version in Ballista ?

If you don't mind, i will create a pr to update it, should we update it to the latest?