cloudius-systems / osv

OSv, a new operating system for the cloud.
osv.io

epoll_wait() is oblivious to concurrent updates via epoll_ctl() #331

Closed: tgrabiec closed this issue 10 years ago

tgrabiec commented 10 years ago

This issue can be seen when running an application that uses apache-sshd version 0.11.0; it does not occur with apache-sshd version 0.6.0. Here is an application that reproduces it: https://github.com/tgrabiec/apache-sshd-example/tree/master. To reproduce, run it and try to connect to OSv via ssh on port 22. The client hangs and nothing happens on the guest. With a version that is not affected by this issue, the server inside OSv should print the "hello!" message.

The affected version of apache-sshd uses NIO2's asynchronous sockets, which use epoll under the hood. Tracing net_packet* shows that the TCP connection is established and data packets flow from the client, but the server never responds. Tracing epoll* shows that the epoll interface is used in a way we don't support, which I believe may be the cause: one thread calls epoll_wait() and blocks, and afterwards another thread adds a new file descriptor to the same epoll instance. Because our epoll_wait() does not take this change into account, the blocked thread is never woken up by events on that file descriptor.

0xffffc00035d15010                  1 1401717235.172754288 epoll_create         returned fd=15
0xffffc00035d15010                  1 1401717235.172831297 epoll_ctl            epfd=15, fd=16, op=EPOLL_CTL_ADD

0xffffc0003356c010                  1 1401717235.174446106 epoll_wait           epfd=15, maxevents=512, timeout=-1

0xffffc00035d15010                  1 1401717235.185563803 tcp_state            tp=0xffffc0003905f400, 0 -> 0
0xffffc00035d15010                  1 1401717235.188999891 tcp_state            tp=0xffffc0003905f400, 0 -> 1
0xffffc00035d15010                  1 1401717235.191050529 epoll_ctl            epfd=15, fd=18, op=EPOLL_CTL_MOD
0xffffc00035d15010                  1 1401717235.191078424 epoll_ctl            epfd=15, fd=18, op=EPOLL_CTL_ADD
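The epoll_wait(2) man page documents the behavior the NIO2 code relies on: while one thread is blocked in epoll_wait(), another thread may add a file descriptor to the same epoll instance, and if that descriptor becomes ready, the blocked call unblocks. Below is a minimal C sketch of the pattern in the trace above, assuming Linux/glibc semantics as the reference. A pipe stands in for the accepted socket (fd=18), and the names `late_add`, `late_adder` and `wait_sees_late_add` are hypothetical. Unlike the trace (timeout=-1), it uses a finite timeout so that on an affected system it reports the symptom instead of hanging.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/epoll.h>
#include <unistd.h>

/* State shared by the two threads (hypothetical names). */
struct late_add {
    int epfd;    /* epoll instance the waiting thread blocks on */
    int pipe_rd; /* registered only after the waiter has (probably) blocked */
    int pipe_wr; /* written to make pipe_rd readable */
};

/* Second thread: sleep briefly so the first thread is most likely already
 * inside epoll_wait(), then register a new fd and make it readable. This
 * mirrors the EPOLL_CTL_ADD that arrives after epoll_wait() in the trace. */
static void *late_adder(void *arg)
{
    struct late_add *s = arg;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = s->pipe_rd };

    usleep(100 * 1000); /* 100 ms: a heuristic, not a synchronization guarantee */
    if (epoll_ctl(s->epfd, EPOLL_CTL_ADD, s->pipe_rd, &ev) == 0 &&
        write(s->pipe_wr, "x", 1) == 1)
        return s;   /* success */
    return NULL;    /* failure: the waiter will simply time out */
}

/* Returns 1 if epoll_wait() reported the late-added fd, 0 if it timed out
 * (the symptom described in this issue), or -1 on setup failure. */
int wait_sees_late_add(int timeout_ms)
{
    struct late_add s;
    struct epoll_event out;
    pthread_t tid;
    int fds[2], n, result = -1;

    assert(timeout_ms >= 0); /* finite timeout: never hang, even when affected */
    if ((s.epfd = epoll_create1(0)) < 0)
        return -1;
    if (pipe(fds) != 0) {
        close(s.epfd);
        return -1;
    }
    s.pipe_rd = fds[0];
    s.pipe_wr = fds[1];

    if (pthread_create(&tid, NULL, late_adder, &s) == 0) {
        /* The interest list is still empty when this thread blocks here. */
        n = epoll_wait(s.epfd, &out, 1, timeout_ms);
        result = (n == 1 && out.data.fd == s.pipe_rd) ? 1 : 0;
        pthread_join(tid, NULL);
    }

    close(fds[0]);
    close(fds[1]);
    close(s.epfd);
    return result;
}
```

On Linux, `wait_sees_late_add(2000)` returns 1 well before the timeout; an epoll_wait() that ignores concurrent epoll_ctl() calls, as described here, would be expected to return 0 after two seconds instead.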
lgoldstein commented 10 years ago

Tested it: it causes a VM crash on VMware Workstation. See the capture below: [capture]

tgrabiec commented 10 years ago

This is a different problem. It looks like we do not support EPOLLONESHOT yet, which Java requests. I will open a new ticket for this.
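For context, EPOLLONESHOT asks epoll to report a descriptor once and then disable it until the caller re-arms it with EPOLL_CTL_MOD (the operation also visible in the trace above). A minimal sketch of those semantics on Linux, using a pipe; `oneshot_demo` is a hypothetical name, and this only illustrates what a caller requesting the flag expects, not how OSv implements or should implement it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Registers a pipe with EPOLLIN | EPOLLONESHOT, makes it readable, and
 * stores the results of three non-blocking epoll_wait() calls in results[]:
 * two before re-arming, one after EPOLL_CTL_MOD.
 * Returns 0 on success, -1 on setup failure. */
int oneshot_demo(int results[3])
{
    struct epoll_event ev, got;
    int fds[2], epfd, rc = -1;

    assert(results != NULL);
    if ((epfd = epoll_create1(0)) < 0)
        return -1;
    if (pipe(fds) != 0) {
        close(epfd);
        return -1;
    }

    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = fds[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) != 0 ||
        write(fds[1], "x", 1) != 1)
        goto done;

    /* Readable: one event is delivered, then the registration is disarmed. */
    results[0] = epoll_wait(epfd, &got, 1, 0);
    /* The byte is still unread, but a disarmed fd reports nothing. */
    results[1] = epoll_wait(epfd, &got, 1, 0);

    /* Re-arm with EPOLL_CTL_MOD; the still-readable pipe is reported again. */
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, fds[0], &ev) != 0)
        goto done;
    results[2] = epoll_wait(epfd, &got, 1, 0);
    rc = 0;

done:
    close(fds[0]);
    close(fds[1]);
    close(epfd);
    return rc;
}
```

On Linux the three waits return 1, 0 and 1: without the re-arm, the descriptor stays silent even though data is still pending.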

Assertion failed: !(e & ~SUPPORTED_EVENTS) (/home/tgrabiec/src/osv/core/epoll.cc: events_epoll_to_poll: 51)

#4  0x0000000000223eb9 in __assert_fail (expr=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at /home/tgrabiec/src/osv/runtime.cc:145
#5  0x00000000003d551c in events_epoll_to_poll (e=0x40000001) at /home/tgrabiec/src/osv/core/epoll.cc:51
#6  wait (timeout_ms=<optimized out>, maxevents=0x200, events=0xffff8000084a1040, this=0xffffa00001506500) at /home/tgrabiec/src/osv/core/epoll.cc:183
#7  epoll_wait (epfd=<optimized out>, events=0xffff8000084a1040, maxevents=0x200, timeout_ms=<optimized out>) at /home/tgrabiec/src/osv/core/epoll.cc:321
#8  0x000010000440986d in Java_sun_nio_ch_EPoll_epollWait ()

gdb$ select-frame 5
gdb$ p e
$1 = 0x40000001
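The value gdb prints decodes to two standard Linux flags: EPOLLIN is 0x001 and EPOLLONESHOT is `1u << 30` (0x40000000), so 0x40000001 is EPOLLIN | EPOLLONESHOT. Assuming SUPPORTED_EVENTS in core/epoll.cc covers EPOLLIN but not EPOLLONESHOT (its definition is not shown here), EPOLLONESHOT is the bit that trips the assertion. A small decoder sketch; the table and the helper name `describe_epoll_events` are hypothetical and cover only common flags.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>

/* Names for common epoll event bits (not an exhaustive list). */
static const struct {
    uint32_t bit;
    const char *name;
} epoll_flags[] = {
    { EPOLLIN, "EPOLLIN" },           { EPOLLPRI, "EPOLLPRI" },
    { EPOLLOUT, "EPOLLOUT" },         { EPOLLERR, "EPOLLERR" },
    { EPOLLHUP, "EPOLLHUP" },         { EPOLLRDHUP, "EPOLLRDHUP" },
    { EPOLLONESHOT, "EPOLLONESHOT" }, { EPOLLET, "EPOLLET" },
};

/* Writes the names of the flags set in e to buf as "A|B|..." and returns
 * any bits the table does not know about (0 if fully decoded). */
uint32_t describe_epoll_events(uint32_t e, char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof epoll_flags / sizeof epoll_flags[0]; i++) {
        if (!(e & epoll_flags[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, epoll_flags[i].name, len - strlen(buf) - 1);
        e &= ~epoll_flags[i].bit;
    }
    return e;
}
```

Calling `describe_epoll_events(0x40000001u, buf, sizeof buf)` with a 64-byte buffer leaves "EPOLLIN|EPOLLONESHOT" in `buf` and returns 0.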
tgrabiec commented 10 years ago

See #413