brettwooldridge / NuProcess

Low-overhead, non-blocking I/O, external Process implementation for Java
Apache License 2.0
710 stars 84 forks source link

Dynamic deadpool poll time for faster exit detection on linux #148

Open avrecko opened 5 months ago

avrecko commented 5 months ago

Noticed on Linux that sometimes it takes the full DEADPOOL_POLL_INTERVAL for the exit to be detected.

I think it might make sense to check for exit a lot sooner than DEADPOOL_POLL_INTERVAL when detecting a close of a stream.

avrecko commented 5 months ago

Let's wait with this a bit. Need to investigate some more as seeing weird delays in detecting the exit status on the virtual machine but not real. Hm...

avrecko commented 5 months ago

I've looked why sometimes echo foo took a few ms to complete and sometimes the full poll interval.

If I taskset the test program to the same CPU it always takes full poll interval. Here is perf sched timehist

    6257.725092 [0003]         echo[24612]                                                      awakened: java[24263/24262]
    6257.725101 [0003]     s   echo[24612]                         0.011      0.000      0.072                                    __cond_resched <- dput <- __fput <- task_work_run <- exit_to_user_mode_loop
    6257.725156 [0003]         java[24263/24262]                                                awakened: kworker/u8:3-ev[21178]
    6257.725287 [0003]     s   java[24263/24262]                   0.072      0.008      0.186                                    futex_wait_queue <- futex_wait <- do_futex <- __x64_sys_futex
    6257.727160 [0003]         echo[24612]                                                      awakened: ProcessQueue0[24280/24262]
    6257.727175 [0003]     s   echo[24612]                         0.186      0.000      1.888                                    exit_to_user_mode_loop <- exit_to_user_mode_prepare <- syscall_exit_to_user_mode <- do_syscall_64
    6257.727253 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.727497 [0003]     s   ProcessQueue0[24280/24262]          3.275      0.014      0.322                                    schedule_hrtimeout_range_clock <- ep_poll <- do_epoll_wait <- __x64_sys_epoll_wait
    6257.727589 [0003]         echo[24612]                                                      awakened: ProcessQueue0[24280/24262]
    6257.727627 [0003]     s   echo[24612]                         0.322      0.000      0.130                                    __cond_resched <- dput <- __fput <- task_work_run <- exit_to_user_mode_loop
    6257.727686 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.727847 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.728105 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.728147 [0003]     s   ProcessQueue0[24280/24262]          0.130      0.038      0.519                                    schedule_hrtimeout_range_clock <- ep_poll <- do_epoll_wait <- __x64_sys_epoll_wait
    6257.728156 [0003]         echo[24612]                                                      awakened: ProcessQueue0[24280/24262]
    6257.728172 [0003]     s   echo[24612]                         0.519      0.000      0.024                                    __cond_resched <- dput <- __fput <- task_work_run <- exit_to_user_mode_loop
    6257.728223 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.728317 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.728382 [0003]     s   ProcessQueue0[24280/24262]          0.024      0.016      0.210                                    schedule_hrtimeout_range_clock <- ep_poll <- do_epoll_wait <- __x64_sys_epoll_wait
    6257.728519 [0003]     s   echo[24612]                         0.210      0.000      0.136                                    do_task_dead <- do_exit <- do_group_exit <- __x64_sys_exit_group <- do_syscall_64
    .
    .
    .
    the 250 ms+ before ProcessQueue0 shows up again and detected the exit and starts echo foo again
    6257.978137 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.978330 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.978451 [0003]         ProcessQueue0[24280/24262]                                       awakened: migration/3[36]
    6257.978524 [0003]         ProcessQueue0[24280/24262]                                       awakened: rcu_preempt[17]
    6257.978547 [0003]     s   ProcessQueue0[24280/24262]        249.670      0.050      0.493                                    exit_to_user_mode_loop <- exit_to_user_mode_prepare <- syscall_exit_to_user_mode <- do_syscall_64
    6257.978563 [0003]     s   migration/3[36]                     0.000      0.095      0.016                                    smpboot_thread_fn <- kthread <- ret_from_fork
    6257.978570 [0003]     s   rcu_preempt[17]                   171.376      0.039      0.007                                    schedule_timeout <- rcu_gp_fqs_loop <- rcu_gp_kthread <- kthread
    6257.978618 [0003]         ProcessQueue0[24280/24262]                                       awakened: java[24263/24262]
    6257.978654 [0003]     s   ProcessQueue0[24280/24262]          0.023      0.000      0.083                                    schedule_hrtimeout_range_clock <- ep_poll <- do_epoll_wait <- __x64_sys_epoll_wait
    6257.978691 [0003]         java[24263/24262]                                                awakened: C2 CompilerThre[24271/24262]
    6257.978699 [0003]         java[24263/24262]                                                awakened: C1 CompilerThre[24272/24262]
    6257.978775 [0003]         java[24263/24262]                                                awakened: kworker/u8:3-ev[21178]
    6257.979224 [0003]         java[24263/24262]                                                awakened: java[24613]
    6257.979233 [0003]     s   java[24263/24262]                 253.366      0.035      0.579                                    schedule_timeout <- __wait_for_common <- wait_for_completion_killable <- kernel_clone
    6257.979250 [0003]     s   C2 CompilerThre[24271/24262]      254.722      0.541      0.017                                    futex_wait_queue <- futex_wait <- do_futex <- __x64_sys_futex
    6257.979619 [0003]     s   C1 CompilerThre[24272/24262]      254.455      0.551      0.368                                    futex_wait_queue <- futex_wait <- do_futex <- __x64_sys_futex
    6257.979920 [0003]         java[24613]                                                      awakened: java[24263/24262]
    6257.979935 [0003]     s   java[24613]                         0.000      0.394      0.316                                    __cond_resched <- down_write_killable <- exec_mmap <- begin_new_exec <- load_elf_binary
    6257.979950 [0003]     s   java[24263/24262]                   0.702      0.014      0.015                                    pipe_read <- new_sync_read <- vfs_read <- ksys_read
    6257.980061 [0003]         echo[24613]                                                      awakened: java[24263/24262]
    6257.980072 [0003]     s   echo[24613]                         0.015      0.000      0.121                                    __cond_resched <- dput <- __fput <- task_work_run <- exit_to_user_mode_loop    

the problem is this

  6257.728172 [0003]     s   echo[24612]                         0.519      0.000      0.024                                    __cond_resched <- dput <- __fput <- task_work_run <- exit_to_user_mode_loop
    6257.728223 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.728317 [0003]         ProcessQueue0[24280/24262]                                       awakened: kworker/u8:3-ev[21178]
    6257.728382 [0003]     s   ProcessQueue0[24280/24262]          0.024      0.016      0.210                                    schedule_hrtimeout_range_clock <- ep_poll <- do_epoll_wait <- __x64_sys_epoll_wait
    6257.728519 [0003]     s   echo[24612]                         0.210      0.000      0.136                                    do_task_dead <- do_exit <- do_group_exit <- __x64_sys_exit_group <- do_syscall_64

it looks like the following is happening:

echo foo closed the streams but gets preempted before exits, epoll wakes up sees the closed streams checks for exit status, sees echo still alive, does the wait on 250ms gets preempted echo foo is scheduled, it finally exits .... wait 250ms epoll wakes up sees echo as exited.

aside: this is why I want to have a different name for the threads, ProcessQueue0 is very generic in the report. NuProcessPoller would be more obvious.

avrecko commented 5 months ago

Did a test: Using NuProcess in a loop to do 1000 calls to echo foo, using 1 executor, pinned to 1 cpu core. The case I looked with perf sched.

without any code changes the non-blocking LibC.waitpid(linuxProcess.getPid(), ret, LibC.WNOHANG) with 250ms poll time took 26 seconds for 100 calls to complete, expecting 260 seconds to complete 1000 calls. Didn't wait.

blocking Java Standard Library ProcessBuilder took ~3.6 seconds to complete 1000 calls.

changing the code for #cleanupProcess to do a blocking wait LibC.waitpid(linuxProcess.getPid(), ret, 0) took ~2.8 seconds to complete 1000 calls. This is even better than Java ProcessBuilder.

non-blocking with dynamic poll time took around 5.5 seconds to complete. While not as fast as the blocking code, but not too bad.