cloudius-systems / osv-apps

OSv Applications

iperf doesn't exit once done #56

Open justinc1 opened 7 years ago

justinc1 commented 7 years ago

I noticed that the original iperf 2.0.5 does not exit the VM once it is done. Some threads are left over, and some extra work is required to get them terminated.

OSv - master at

commit 4a716433ceca860b9b94b24384b96cd45ea32b01
Author: Nadav Har'El <nyh@scylladb.com>
Date:   Mon Aug 21 14:48:56 2017 +0300

Host:

justin_cinkelj@jcpc:~/devel/mikelangelo/osv-t1/osv$ iperf -s

OSv:

justin_cinkelj@jcpc:~/devel/mikelangelo/osv-t1/osv$ sudo ./scripts/run.py -nv -d -e '/tools/iperf -c 192.168.122.1 -t0.1 '
OSv v0.24-432-g4a71643
eth0: 192.168.122.90
------------------------------------------------------------
Client connecting to 192.168.122.1, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local 192.168.122.90 port 19756 connected with 192.168.122.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 0.1 sec   151 MBytes  12.4 Gbits/sec

gdb:

 244 (0xffff8000037dd040) dhcp            cpu1 status::waiting dhcp::dhcp_worker::dhcp_worker_fn() at core/dhcp.cc:830 vruntime  8.70543e-22
 247 (0xffff800003a3c040) >/tools/iperf   cpu0 status::waiting waiter::wait(sched::timer*) const at include/osv/wait_record.hh:47 vruntime  6.83171e-21
Number of threads: 139

(gdb) osv thr 247
(gdb) bt
#0  sched::thread::switch_to (this=0xffff8000037e3040) at arch/x64/arch-switch.hh:75
#1  0x00000000005b6f10 in sched::cpu::reschedule_from_interrupt (this=0xffff8000017de040, called_from_yield=false, preempt_after=...) at core/sched.cc:339
#2  0x00000000005b67cc in sched::cpu::schedule () at core/sched.cc:228
#3  0x00000000005ba547 in sched::thread::wait (this=0xffff800003a3c040) at core/sched.cc:1214
#4  0x00000000005649d4 in sched::thread::do_wait_until<sched::noninterruptible, sched::thread::dummy_lock, waiter::wait(sched::timer*) const::{lambda()#1}>(sched::thread::dummy_lock&, waiter::wait(sched::timer*) const::{lambda()#1}) (mtx=..., pred=...) at include/osv/sched.hh:1063
#5  0x000000000056475a in sched::thread::wait_until<waiter::wait(sched::timer*) const::{lambda()#1}>(waiter::wait(sched::timer*) const::{lambda()#1}) (pred=...) at include/osv/sched.hh:1074
#6  0x0000000000564722 in waiter::wait (this=0x2000003ffe90, tmr=0x0) at include/osv/wait_record.hh:47
#7  0x0000000000563751 in condvar::wait (this=0xffffa000038d9700, user_mutex=0xffffa00002ee0440, tmr=0x0) at core/condvar.cc:43
#8  0x000000000069fc75 in pthread_cond_wait (cond=0x100000e19840 <ReportCond>, mutex=0x100000e19870 <ReportCond+48>) at libc/pthread.cc:593
#9  0x0000100000c0d465 in reporter_spawn ()
#10 0x0000100000c132c9 in thread_run_wrapper ()
#11 0x000000000069ebf7 in pthread_private::pthread::<lambda()>::operator()(void) const (__closure=0xffffa00002d8b800) at libc/pthread.cc:114
#12 0x00000000006a1704 in std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/6.3.1/functional:1731
#13 0x0000000000443fbc in std::function<void ()>::operator()() const (this=0xffff800003a3c070) at /usr/include/c++/6.3.1/functional:2127
#14 0x00000000005ba528 in sched::thread::main (this=0xffff800003a3c040) at core/sched.cc:1208
#15 0x00000000005b63fc in sched::thread_main_c (t=0xffff800003a3c040) at arch/x64/arch-switch.hh:170
#16 0x0000000000481553 in thread_main () at arch/x64/entry.S:113

So after main() returns, one app thread is left, and the VM doesn't terminate: the app is still considered to be running. Correct?

justinc1 commented 7 years ago

I tried to fix this on iperf 2.0.10 from http://sourceforge.net/projects/iperf2. There are two problematic threads: reporter_spawn (iperf client and server) and listener_spawn (iperf server only).

Approach: set a main_will_exit flag. reporter_spawn then returns once the flag is set, and main can exit afterwards. main should also join reporter_spawn; otherwise we might have problems (main exits shortly before reporter_spawn, and reporter_spawn then tries to access data already destroyed by main).
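A minimal C sketch of that flag-and-join approach, assuming a reporter thread that sleeps on a condition variable the way reporter_spawn sleeps on ReportCond in the backtrace above. Only the main_will_exit name comes from the comment; the mutex, counters, and helper functions (reporter_thread, post_report, stop_reporter, run_reporter_demo) are hypothetical and are not iperf's real code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state; iperf's own ReportCond bookkeeping differs. */
static pthread_mutex_t report_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t report_cond = PTHREAD_COND_INITIALIZER;
static bool main_will_exit = false;
static int pending_reports = 0;
static int reports_done = 0;

/* Reporter loop: sleep until there is a report or main is about to exit.
   Pending reports are drained before the thread returns. */
static void *reporter_thread(void *arg) {
    (void)arg;
    pthread_mutex_lock(&report_mutex);
    for (;;) {
        while (pending_reports == 0 && !main_will_exit)
            pthread_cond_wait(&report_cond, &report_mutex);
        if (pending_reports == 0)
            break;                      /* main_will_exit is set and no work is left */
        pending_reports--;
        reports_done++;                 /* a real reporter would print here */
    }
    pthread_mutex_unlock(&report_mutex);
    return NULL;
}

static void post_report(void) {
    pthread_mutex_lock(&report_mutex);
    pending_reports++;
    pthread_cond_signal(&report_cond);
    pthread_mutex_unlock(&report_mutex);
}

/* What main() does instead of just returning: raise the flag, wake the
   reporter, and join it so it cannot touch anything main tears down. */
static void stop_reporter(pthread_t reporter) {
    pthread_mutex_lock(&report_mutex);
    main_will_exit = true;
    pthread_cond_broadcast(&report_cond);
    pthread_mutex_unlock(&report_mutex);
    pthread_join(reporter, NULL);
    assert(pending_reports == 0);       /* join guarantees the drain finished */
}

/* Demo: start the reporter, queue three reports, shut down.
   Returns how many reports were handled, or -1 on error. */
static int run_reporter_demo(void) {
    pthread_t reporter;
    if (pthread_create(&reporter, NULL, reporter_thread, NULL) != 0)
        return -1;
    for (int i = 0; i < 3; i++)
        post_report();
    stop_reporter(reporter);
    return reports_done;
}
```

Calling run_reporter_demo() from a test main() returns 3: the reporter drains the queued reports before it acts on main_will_exit, and because the flag is checked under the same mutex as the wait, the broadcast cannot be missed and the join cannot hang.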

The same goes for listener_spawn, except that it is blocked in accept(). To wake it up, main makes one additional connect() to the listening socket. Ugly, but it seems to work. It is sometimes mentioned that shutdown(sock, how) should wake up a thread blocked in accept(), but this doesn't seem to work on OSv.
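The extra-connect trick could look roughly like this in C (a sketch, not the actual patch). The listener_should_exit flag, the helper names, and the loopback listener on a kernel-chosen port are assumptions made so the example runs on its own. The important ordering is that main sets the flag before it connects, so the accept() that returns is guaranteed to see it.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical flag; set by main before the wake-up connect(). */
static atomic_bool listener_should_exit = false;

static void *listener_thread(void *arg) {
    int listen_fd = (int)(intptr_t)arg;
    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0) {
            if (errno == EINTR)
                continue;
            break;                          /* unexpected error: give up */
        }
        close(conn);                        /* a real server hands this to a worker */
        if (atomic_load(&listener_should_exit))
            break;                          /* that was main's wake-up connect */
    }
    return NULL;
}

/* Main's side: raise the flag first, then connect to our own listener so
   the blocked accept() returns and observes the flag. */
static int wake_listener(const struct sockaddr_in *addr) {
    atomic_store(&listener_should_exit, true);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int rc = connect(fd, (const struct sockaddr *)addr, sizeof *addr);
    close(fd);
    return rc;
}

static int run_listener_demo(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd < 0)
        return -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the kernel pick a free port */
    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(listen_fd, 8) != 0 ||
        getsockname(listen_fd, (struct sockaddr *)&addr, &len) != 0) {
        close(listen_fd);
        return -1;
    }
    pthread_t listener;
    if (pthread_create(&listener, NULL, listener_thread,
                       (void *)(intptr_t)listen_fd) != 0) {
        close(listen_fd);
        return -1;
    }
    if (wake_listener(&addr) != 0) {
        pthread_detach(listener);           /* still blocked in accept() */
        return -1;
    }
    pthread_join(listener, NULL);
    close(listen_fd);
    return 0;
}
```

If the wake-up connect() happens before the listener reaches accept(), the connection simply waits in the backlog, so there is no race. If connect() fails, the listener stays blocked, which is why the sketch detaches it instead of joining; run_listener_demo() returns 0 on success.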

I can try to finalize the patch if it is worth it. I haven't yet tried to remove pthread_detach, keep a list of all started threads, and at the end let main() join all of them.
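For the unfinished part (drop pthread_detach, record every started thread, join them all at the end of main()), one possible shape is a small registry. The names and the fixed capacity of 64 are hypothetical. Joining only terminates if each thread eventually returns, so this would complement the main_will_exit flag rather than replace it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_TRACKED_THREADS 64

static pthread_mutex_t registry_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_t tracked[MAX_TRACKED_THREADS];
static size_t tracked_count = 0;

/* Replacement for pthread_create() + pthread_detach(): keep the handle. */
static int tracked_thread_create(void *(*fn)(void *), void *arg) {
    pthread_mutex_lock(&registry_mutex);
    int rc = -1;
    if (tracked_count < MAX_TRACKED_THREADS) {
        rc = pthread_create(&tracked[tracked_count], NULL, fn, arg);
        if (rc == 0)
            tracked_count++;
    }
    pthread_mutex_unlock(&registry_mutex);
    return rc;
}

/* End of main(): join every registered thread, including ones started
   while we were joining. Returns how many threads were joined. */
static size_t join_all_tracked(void) {
    size_t joined = 0;
    for (;;) {
        pthread_mutex_lock(&registry_mutex);
        if (tracked_count == 0) {
            pthread_mutex_unlock(&registry_mutex);
            return joined;
        }
        pthread_t t = tracked[--tracked_count];
        pthread_mutex_unlock(&registry_mutex);
        pthread_join(t, NULL);              /* blocks until that thread returns */
        joined++;
    }
}

static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static int finished_workers = 0;

static void *demo_worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&counter_mutex);
    finished_workers++;
    pthread_mutex_unlock(&counter_mutex);
    return NULL;
}

/* Demo: start five workers, join them all, report how many finished. */
static int run_registry_demo(void) {
    for (int i = 0; i < 5; i++)
        if (tracked_thread_create(demo_worker, NULL) != 0)
            return -1;
    size_t joined = join_all_tracked();
    assert(joined == 5);
    return finished_workers;                /* safe to read: all workers joined */
}
```

The registry lock is released before each pthread_join(), so a thread that is itself starting another tracked thread cannot deadlock against main.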

But maybe even the current patch is better than a VM that doesn't exit once the work is done? main waits for the mentioned threads to set the 'worker_exited' flag, and then comes a silly delay of, say, 0.1 s, so that the other threads get some extra time to exit.
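The interim scheme (wait for worker_exited, then a short grace delay) might be sketched as below. The worker_exited flag and the 0.1 s delay come from the comment; the mutex/condvar pair, the upper bound on the wait, and the function names are assumptions. Using pthread_cond_timedwait() means a worker that never sets the flag cannot keep main from exiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t exit_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t exit_cond = PTHREAD_COND_INITIALIZER;
static bool worker_exited = false;

/* Worker side: announce completion right before returning. */
static void announce_worker_exit(void) {
    pthread_mutex_lock(&exit_mutex);
    worker_exited = true;
    pthread_cond_broadcast(&exit_cond);
    pthread_mutex_unlock(&exit_mutex);
}

/* Main side: wait at most timeout_ms for worker_exited, then sleep grace_ms
   so other detached threads get extra time. Returns true if the flag was set. */
static bool wait_for_worker_exit(long timeout_ms, long grace_ms) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);   /* condvar's default clock */
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    pthread_mutex_lock(&exit_mutex);
    while (!worker_exited)
        if (pthread_cond_timedwait(&exit_cond, &exit_mutex, &deadline) != 0)
            break;                              /* ETIMEDOUT: stop waiting */
    bool seen = worker_exited;
    pthread_mutex_unlock(&exit_mutex);

    struct timespec grace = { .tv_sec = grace_ms / 1000,
                              .tv_nsec = (grace_ms % 1000) * 1000000L };
    nanosleep(&grace, NULL);                    /* the "silly" 0.1 s delay */
    return seen;
}

static void *demo_worker(void *arg) {
    (void)arg;
    announce_worker_exit();
    return NULL;
}

/* Demo mirroring iperf's detached threads: returns 0 if the flag was seen. */
static int run_exit_wait_demo(void) {
    pthread_t t;
    if (pthread_create(&t, NULL, demo_worker, NULL) != 0)
        return -1;
    pthread_detach(t);
    return wait_for_worker_exit(1000, 100) ? 0 : -1;
}
```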

justinc1 commented 7 years ago

Here are the patches I was talking about. I pushed the code to https://github.com/justinc1/iperf2/tree/osv-2.0.10 so that I don't delete it by accident.

nyh commented 7 years ago

A couple of comments on what you wrote above:

It is fine for the main thread to return without waiting for the other threads: the program is not unloaded (its memory is not freed, static objects are not destructed, etc.), so unless main() deliberately frees something before returning (maybe it does; I'm really not familiar with that program), this should be fine.

I don't understand how shutdown() works on a socket blocked in accept(), since shutdown() is supposed to work on a connected socket. But maybe this trick works on Linux? It would be interesting to check (and if it does work on Linux, we need to support it on OSv too...).
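One way to check the Linux behaviour is a small probe: block a thread in accept() on a loopback listener, call shutdown() on the listening socket from another thread, and see whether accept() returns within a bounded time. This is an illustrative harness, not part of iperf; the function names, the 100 ms settle time, and the 2 s timeout are arbitrary choices, and the sketch does not assume which result it will get.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t probe_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t probe_cond = PTHREAD_COND_INITIALIZER;
static bool accept_returned = false;

static void *blocking_accept(void *arg) {
    int conn = accept((int)(intptr_t)arg, NULL, NULL);
    if (conn >= 0)
        close(conn);
    pthread_mutex_lock(&probe_mutex);
    accept_returned = true;                 /* woken, whatever the result */
    pthread_cond_signal(&probe_cond);
    pthread_mutex_unlock(&probe_mutex);
    return NULL;
}

/* Returns 1 if shutdown() woke the blocked accept() within 2 s, 0 if it did
   not, -1 if the listening socket could not be set up. */
static int shutdown_wakes_accept(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);          /* port 0: any free port */
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0 || listen(fd, 1) != 0) {
        close(fd);
        return -1;
    }
    pthread_t t;
    if (pthread_create(&t, NULL, blocking_accept, (void *)(intptr_t)fd) != 0) {
        close(fd);
        return -1;
    }
    struct timespec settle = { .tv_sec = 0, .tv_nsec = 100000000L };
    nanosleep(&settle, NULL);               /* let the thread block in accept() */
    shutdown(fd, SHUT_RDWR);

    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 2;
    pthread_mutex_lock(&probe_mutex);
    while (!accept_returned)
        if (pthread_cond_timedwait(&probe_cond, &probe_mutex, &deadline) != 0)
            break;                          /* timed out */
    bool woke = accept_returned;
    pthread_mutex_unlock(&probe_mutex);

    if (woke) {
        pthread_join(t, NULL);
        close(fd);
    } else {
        pthread_detach(t);                  /* still stuck; leave fd open for it */
    }
    return woke ? 1 : 0;
}
```

Running the same probe on a Linux host and inside OSv would show directly whether the two differ.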

The classic way to interrupt accept() is either through signals (ugly...) or, better, with a non-blocking accept() and select()/poll(): you select() on the server socket and a second file descriptor (e.g., a pipe) that you use to tell it to wake up, and call accept() only when the server socket is ready.
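A minimal C sketch of that poll()-plus-pipe variant. The listener_ctx struct and the function names are hypothetical, and the loopback listener on a kernel-chosen port is only there to make the example self-contained; a real iperf server would bind its configured port and hand accepted connections to a worker instead of closing them.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

struct listener_ctx {
    int listen_fd;   /* non-blocking listening socket */
    int wake_rd;     /* read end of the wake-up pipe */
};

static void *poll_listener(void *arg) {
    const struct listener_ctx *ctx = arg;
    struct pollfd fds[2] = {
        { .fd = ctx->listen_fd, .events = POLLIN },
        { .fd = ctx->wake_rd, .events = POLLIN },
    };
    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (fds[1].revents & (POLLIN | POLLHUP))
            break;                          /* main asked us to stop */
        if (fds[0].revents & POLLIN) {
            /* Non-blocking: EAGAIN just means the connection went away. */
            int conn = accept(ctx->listen_fd, NULL, NULL);
            if (conn >= 0)
                close(conn);                /* a real server would hand it to a worker */
        }
    }
    return NULL;
}

/* Demo: start the listener, wake it through the pipe, join it. */
static int run_poll_listener_demo(void) {
    int pipefd[2];
    if (pipe(pipefd) != 0)
        return -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    int rc = -1;
    if (fd >= 0 && bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(fd, 8) == 0 &&
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK) == 0) {
        struct listener_ctx ctx = { .listen_fd = fd, .wake_rd = pipefd[0] };
        pthread_t t;
        if (pthread_create(&t, NULL, poll_listener, &ctx) == 0) {
            char byte = 1;                  /* one byte is enough to wake poll() */
            rc = write(pipefd[1], &byte, 1) == 1 ? 0 : -1;
            pthread_join(t, NULL);
        }
    }
    if (fd >= 0)
        close(fd);
    close(pipefd[0]);
    close(pipefd[1]);
    return rc;
}
```

Compared with the extra-connect trick, nothing goes through the network stack to wake the thread, and the listening socket never blocks, so accept() is only called once poll() reports it ready.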

But I wonder if it's a big problem if we just call exit(0) at the end, and have the entire OSv exit, always, when iperf finishes :-)

justinc1 commented 7 years ago

I guess the problem with iperf and signals is that iperf expects the signal handler to run in the main thread (it does some configuration to ensure that). This is not true on OSv, and in some cases I saw quite a few signal_handler threads started. So Ctrl+C does not stop the VM.
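For comparison, a common POSIX way to make signal handling independent of which thread receives the signal is to block the signals in every thread and consume them in one dedicated sigwait() thread. This is an assumption about a possible fix, not what iperf currently does, and whether OSv's signal emulation handles sigwait() this way has not been checked here. The helper names are hypothetical, and kill(getpid(), SIGINT) stands in for Ctrl+C.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

static int received_signal = 0;

/* The only place SIGINT/SIGTERM are consumed: an ordinary thread, so it may
   take locks, set shutdown flags, etc., unlike an async signal handler. */
static void *signal_thread(void *arg) {
    const sigset_t *set = arg;
    int sig;
    if (sigwait(set, &sig) == 0)
        received_signal = sig;      /* a real program would start its shutdown here */
    return NULL;
}

/* Returns the signal the dedicated thread received, or -1 on setup error. */
static int run_sigwait_demo(void) {
    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    /* Block first: threads created afterwards inherit this mask, so no
       thread ever runs an asynchronous handler for these signals. */
    if (pthread_sigmask(SIG_BLOCK, &set, &old) != 0)
        return -1;
    pthread_t t;
    if (pthread_create(&t, NULL, signal_thread, &set) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    kill(getpid(), SIGINT);         /* stand-in for Ctrl+C on the console */
    pthread_join(t, NULL);          /* received_signal is safe to read after join */
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return received_signal;
}
```

Because the signal is process-directed and blocked everywhere, it stays pending until the dedicated thread picks it up, even if kill() runs before that thread reaches sigwait().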

I will use iperf more often in the near future, and if the "just call exit()" approach works well for me, I will send a patch with it.