Open avrecko opened 5 months ago
Let's wait with this a bit. Need to investigate some more as seeing weird delays in detecting the exit status on the virtual machine but not real. Hm...
I've looked why sometimes echo foo took a few ms to complete and sometimes the full poll interval.
If I taskset the test program to the same CPU it always takes full poll interval. Here is perf sched timehist
6257.725092 [0003] echo[24612] awakened: java[24263/24262]
6257.725101 [0003] s echo[24612] 0.011 0.000 0.072 __cond_resched <- dput <- __fput <- task_work_run <- exit_to_user_mode_loop
6257.725156 [0003] java[24263/24262] awakened: kworker/u8:3-ev[21178]
6257.725287 [0003] s java[24263/24262] 0.072 0.008 0.186 futex_wait_queue <- futex_wait <- do_futex <- __x64_sys_futex
6257.727160 [0003] echo[24612] awakened: ProcessQueue0[24280/24262]
6257.727175 [0003] s echo[24612] 0.186 0.000 1.888 exit_to_user_mode_loop <- exit_to_user_mode_prepare <- syscall_exit_to_user_mode <- do_syscall_64
6257.727253 [0003] ProcessQueue0[24280/24262] awakened: kworker/u8:3-ev[21178]
6257.727497 [0003] s ProcessQueue0[24280/24262] 3.275 0.014 0.322 schedule_hrtimeout_range_clock <- ep_poll <- do_epoll_wait <- __x64_sys_epoll_wait
6257.727589 [0003] echo[24612] awakened: ProcessQueue0[24280/24262]
6257.727627 [0003] s echo[24612] 0.322 0.000 0.130 __cond_resched <- dput <- __fput <- task_work_run <- exit_to_user_mode_loop
6257.727686 [0003] ProcessQueue0[24280/24262] awakened: kworker/u8:3-ev[21178]
6257.727847 [0003] ProcessQueue0[24280/24262] awakened: kworker/u8:3-ev[21178]
6257.728105 [0003] ProcessQueue0[24280/24262] awakened: kworker/u8:3-ev[21178]
6257.728147 [0003] s ProcessQueue0[24280/24262] 0.130 0.038 0.519 schedule_hrtimeout_range_clock <- ep_poll <- do_epoll_wait <- __x64_sys_epoll_wait
6257.728156 [0003] echo[24612] awakened: ProcessQueue0[24280/24262]
6257.728172 [0003] s echo[24612] 0.519 0.000 0.024 __cond_resched <- dput <- __fput <- task_work_run <- exit_to_user_mode_loop
6257.728223 [0003] ProcessQueue0[24280/24262] awakened: kworker/u8:3-ev[21178]
6257.728317 [0003] ProcessQueue0[24280/24262] awakened: kworker/u8:3-ev[21178]
6257.728382 [0003] s ProcessQueue0[24280/24262] 0.024 0.016 0.210 schedule_hrtimeout_range_clock <- ep_poll <- do_epoll_wait <- __x64_sys_epoll_wait
6257.728519 [0003] s echo[24612] 0.210 0.000 0.136 do_task_dead <- do_exit <- do_group_exit <- __x64_sys_exit_group <- do_syscall_64
.
.
.
the 250 ms+ before ProcessQueue0 shows up again and detected the exit and starts echo foo again
6257.978137 [0003] ProcessQueue0[24280/24262] awakened: kworker/u8:3-ev[21178]
6257.978330 [0003] ProcessQueue0[24280/24262] awakened: kworker/u8:3-ev[21178]
6257.978451 [0003] ProcessQueue0[24280/24262] awakened: migration/3[36]
6257.978524 [0003] ProcessQueue0[24280/24262] awakened: rcu_preempt[17]
6257.978547 [0003] s ProcessQueue0[24280/24262] 249.670 0.050 0.493 exit_to_user_mode_loop <- exit_to_user_mode_prepare <- syscall_exit_to_user_mode <- do_syscall_64
6257.978563 [0003] s migration/3[36] 0.000 0.095 0.016 smpboot_thread_fn <- kthread <- ret_from_fork
6257.978570 [0003] s rcu_preempt[17] 171.376 0.039 0.007 schedule_timeout <- rcu_gp_fqs_loop <- rcu_gp_kthread <- kthread
6257.978618 [0003] ProcessQueue0[24280/24262] awakened: java[24263/24262]
6257.978654 [0003] s ProcessQueue0[24280/24262] 0.023 0.000 0.083 schedule_hrtimeout_range_clock <- ep_poll <- do_epoll_wait <- __x64_sys_epoll_wait
6257.978691 [0003] java[24263/24262] awakened: C2 CompilerThre[24271/24262]
6257.978699 [0003] java[24263/24262] awakened: C1 CompilerThre[24272/24262]
6257.978775 [0003] java[24263/24262] awakened: kworker/u8:3-ev[21178]
6257.979224 [0003] java[24263/24262] awakened: java[24613]
6257.979233 [0003] s java[24263/24262] 253.366 0.035 0.579 schedule_timeout <- __wait_for_common <- wait_for_completion_killable <- kernel_clone
6257.979250 [0003] s C2 CompilerThre[24271/24262] 254.722 0.541 0.017 futex_wait_queue <- futex_wait <- do_futex <- __x64_sys_futex
6257.979619 [0003] s C1 CompilerThre[24272/24262] 254.455 0.551 0.368 futex_wait_queue <- futex_wait <- do_futex <- __x64_sys_futex
6257.979920 [0003] java[24613] awakened: java[24263/24262]
6257.979935 [0003] s java[24613] 0.000 0.394 0.316 __cond_resched <- down_write_killable <- exec_mmap <- begin_new_exec <- load_elf_binary
6257.979950 [0003] s java[24263/24262] 0.702 0.014 0.015 pipe_read <- new_sync_read <- vfs_read <- ksys_read
6257.980061 [0003] echo[24613] awakened: java[24263/24262]
6257.980072 [0003] s echo[24613] 0.015 0.000 0.121 __cond_resched <- dput <- __fput <- task_work_run <- exit_to_user_mode_loop
the problem is this
6257.728172 [0003] s echo[24612] 0.519 0.000 0.024 __cond_resched <- dput <- __fput <- task_work_run <- exit_to_user_mode_loop
6257.728223 [0003] ProcessQueue0[24280/24262] awakened: kworker/u8:3-ev[21178]
6257.728317 [0003] ProcessQueue0[24280/24262] awakened: kworker/u8:3-ev[21178]
6257.728382 [0003] s ProcessQueue0[24280/24262] 0.024 0.016 0.210 schedule_hrtimeout_range_clock <- ep_poll <- do_epoll_wait <- __x64_sys_epoll_wait
6257.728519 [0003] s echo[24612] 0.210 0.000 0.136 do_task_dead <- do_exit <- do_group_exit <- __x64_sys_exit_group <- do_syscall_64
it looks like the following is happening:
echo foo closed the streams but gets preempted before exits, epoll wakes up sees the closed streams checks for exit status, sees echo still alive, does the wait on 250ms gets preempted echo foo is scheduled, it finally exits .... wait 250ms epoll wakes up sees echo as exited.
aside: this is why I want to have a different name for the threads, ProcessQueue0 is very generic in the report. NuProcessPoller would be more obvious.
Did a test: Using NuProcess in a loop to do 1000 calls to echo foo, using 1 executor, pinned to 1 cpu core. The case I looked with perf sched.
without any code changes the non-blocking LibC.waitpid(linuxProcess.getPid(), ret, LibC.WNOHANG)
with 250ms poll time took 26 seconds for 100 calls to complete, expecting 260 seconds to complete 1000 calls. Didn't wait.
blocking Java Standard Library ProcessBuilder took ~3.6 seconds to complete 1000 calls.
changing the code for #cleanupProcess
to do a blocking wait LibC.waitpid(linuxProcess.getPid(), ret, 0)
took ~2.8 seconds to complete 1000 calls. This is even better than Java ProcessBuilder.
non-blocking with dynamic poll time took around 5.5 seconds to complete. While not as fast as the blocking code, but not too bad.
Noticed on Linux that sometimes it takes the full
DEADPOOL_POLL_INTERVAL
for the exit to be detected.I think it might make sense to check for exit a lot sooner than
DEADPOOL_POLL_INTERVAL
when detecting a close of a stream.