fukamachi / woo

A fast non-blocking HTTP server on top of libev
http://ultra.wikia.com/wiki/Woo_(kaiju)
MIT License
1.28k stars 98 forks source link

Memory fault on libev FFI #17

Closed takagi closed 9 years ago

takagi commented 9 years ago

Running a Woo server to benchmark with wrk tool causes memory fault on libev FFI. It happens to wrk more than once in my trials on OSX. I get similar condition on Ubuntu on Amazon EC2.

REPL:

> (ql:quickload :woo)
> (woo:run
    (lambda (env)
      (declare (ignore env))
      '(200 (:content-type "text/plain") ("Hello, World"))))

Shell:

$ wrk -c 10 -t 1 -d 10 http://localhost:5000
$ wrk -c 10 -t 1 -d 10 http://localhost:5000  # again

Caused memory fault:

Unhandled memory fault at #xFFFFFFFC002043C0.
   [Condition of type SB-SYS:MEMORY-FAULT-ERROR]

Restarts:
 0: [RETRY] Retry SLIME REPL evaluation request.
 1: [*ABORT] Return to SLIME's top level.
 2: [REMOVE-FD-HANDLER] Remove #<SB-IMPL::HANDLER INPUT on descriptor 9: #<CLOSURE (LABELS SWANK/SBCL::RUN :IN SWANK/BACKEND:ADD-FD-HANDLER) {100716E6EB}>>
 3: [ABORT] Exit debugger, returning to top level.

Backtrace:
  0: (SB-SYS:MEMORY-FAULT-ERROR)
  1: ("foreign function: call_into_lisp")
  2: ("foreign function: post_signal_tramp")
  3: ("foreign function: ev_io_stop")
  4: (LEV:EV-IO-STOP #.(SB-SYS:INT-SAP #X00204510) #.(SB-SYS:INT-SAP #X00204420))
  5: (WOO.EV.SOCKET:FREE-WATCHERS #S(WOO.EV.SOCKET:SOCKET :WATCHERS #(#.(SB-SYS:INT-SAP #X002043F0) #.(SB-SYS:INT-SAP #X00204420) #.(SB-SYS:INT-SAP #X00204450)) :LAST-ACTIVITY 1.43118499171181d9 :FD 23 :RE..
  6: (WOO.EV.SOCKET:CLOSE-SOCKET #S(WOO.EV.SOCKET:SOCKET :WATCHERS #(#.(SB-SYS:INT-SAP #X002043F0) #.(SB-SYS:INT-SAP #X00204420) #.(SB-SYS:INT-SAP #X00204450)) :LAST-ACTIVITY 1.43118499171181d9 :FD 23 :REM..
  7: (WOO.EV.TCP::TCP-READ-CB #<unavailable argument> #.(SB-SYS:INT-SAP #X002043F0) #<unavailable argument>)
  8: ((LAMBDA (SB-ALIEN::ARGS-POINTER SB-ALIEN::RESULT-POINTER FUNCTION) :IN "/Users/mtakagi/.cache/common-lisp/sbcl-1.2.9-macosx-x64/Users/mtakagi/Lisp/quicklisp/dists/quicklisp/software/woo-20150302-git/..
  9: ("foreign function: call_into_lisp")
 10: ("foreign function: funcall3")
 11: ("foreign function: #x20100C95")
 12: ("foreign function: #x204510")
 13: ("foreign function: ev_run")
 14: (LEV:EV-RUN #.(SB-SYS:INT-SAP #X00204510) 0)
 15: ((FLET WOO::START-SERVER :IN WOO:RUN))
 16: (WOO:RUN #<FUNCTION (LAMBDA (ENV)) {1009A9CEDB}> :DEBUG T :PORT 5000 :ADDRESS "0.0.0.0" :BACKLOG NIL :FD NIL :WORKER-NUM NIL)
 17: (SB-INT:SIMPLE-EVAL-IN-LEXENV (WOO:RUN (LAMBDA (ENV) (DECLARE #) (QUOTE #))) #<NULL-LEXENV>)
 18: (EVAL (WOO:RUN (LAMBDA (ENV) (DECLARE #) (QUOTE #))))

Environments

OSX:

$ uname -a
Darwin MacBookPro.local 13.3.0 Darwin Kernel Version 13.3.0: Tue Jun  3 21:27:35 PDT 2014; root:xnu-2422.110.17~1/RELEASE_X86_64 x86_64
$ sbcl --version
SBCL 1.2.9
$ brew list libev
/usr/local/Cellar/libev/4.15/include/ (2 files)
/usr/local/Cellar/libev/4.15/lib/libev.4.dylib
/usr/local/Cellar/libev/4.15/lib/ (2 other files)
/usr/local/Cellar/libev/4.15/share/man/man3/ev.3
$ sbcl --noinform --eval '(progn (ql:system-apropos "woo") (ql:system-apropos "lev") (quit))'
...
#<SYSTEM woo / woo-20150302-git / quicklisp 2015-05-05>
...
#<SYSTEM lev / lev-20150505-git / quicklisp 2015-05-05>
...

Amazon EC2:

$ uname -a
Linux ip-172-30-2-79 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$ aws ec2 describe-instances
...
                    "ImageId": "ami-d05e75b8",
...
$ sbcl --version
SBCL 1.1.14.debian
$ apt-show-versions -p libev-dev
libev-dev:amd64/trusty 1:4.15-3 uptodate
$ sbcl --noinform --eval '(progn (ql:system-apropos "woo") (ql:system-apropos "lev") (quit))'
...
#<SYSTEM woo / woo-20150302-git / quicklisp 2015-05-05>
...
#<SYSTEM lev / lev-20150505-git / quicklisp 2015-05-05>
...
fukamachi commented 9 years ago

It looks that it's trying to stop an I/O watcher which is already freed. I have no idea why and I couldn't reproduce it on my MacBook.

rudolph-miller commented 9 years ago

I tried and could not reproduce the case.

ฅ*•ω•*ฅ < ros run -- --version
SBCL 1.2.11
ฅ*•ω•*ฅ < brew list libev
/usr/local/Cellar/libev/4.15/include/ (2 files)
/usr/local/Cellar/libev/4.15/lib/libev.4.dylib
/usr/local/Cellar/libev/4.15/lib/ (2 other files)
/usr/local/Cellar/libev/4.15/share/man/man3/ev.3
ฅ*•ω•*ฅ < uname -a
Darwin Tomoy-Air.local 14.3.0 Darwin Kernel Version 14.3.0: Mon Mar 23 11:59:05 PDT 2015; root:xnu-2782.20.48~5/RELEASE_X86_64 x86_64
takagi commented 9 years ago

I provide a further report and a probable cause.

Problem The following steps may reproduce the issue. Detailed behavior may differ for environments.

1.Run a Woo server on AWS of AMI ami-d05e75b8.

> (ql:quickload :woo)
> (woo:run
    (lambda (env)
      (declare (ignore env))
      '(200 (:content-type "text/plain") ("Hello, World"))))

2.Send the following incomplete HTTP request, intended to close the connection before Woo starting its response.

> (ql:quickload :usocket)
> (let* ((sock (usocket:socket-connect "127.0.0.1" 5000))
         (s (usocket:socket-stream sock)))
    (princ "GET / HTTP/1.1" s) (write-crlf s)
    (princ "Host: localhost" s) (write-crlf s)
    (usocket:socket-close sock))

where write-crlf is:

(defun write-crlf (stream)
  (write-char #\return stream)
  (write-char #\linefeed stream))

3.After a few times sending, woo:run returns with zero, that is not expected behavior.

> (woo:run
    (lambda (env)
      (declare (ignore env))
      '(200 (:content-type "text/plain") ("Hello, World"))))
0

4.Run a Woo server again with woo:run

5.Sending the incomplete HTTP request again may cause memory fault.

Cause

It may caused by a write watcher ev-io object not ev-io-inited is to be ev-io-stoped.

A write watcher ev-io object seems to be ev-io-inited in init-write-io, which is called in with-async-writing macro after Woo has received a complete HTTP request to begin generating its response.

Before that, the write watcher ev-io object is not initialized, happen to be ev-io-stoped when Woo gets EOF before receiving a complete HTTP request, causing memory fault on FFI.

Solution

It would help that initializing a write watcher when making a Woo socket in make-socket. I guess it is not needed to delay initializing a write watcher after Woo is ready to start its response and init-write-io is called. I will send a pull request for a patch later.

(defun make-socket (&rest initargs &key tcp-read-cb fd &allow-other-keys)
  (let ((socket (apply #'%make-socket initargs)))
    (lev:ev-io-init (socket-read-watcher socket)
                    tcp-read-cb
                    fd
                    lev:+EV-READ+)
+   (lev:ev-io-init (socket-write-watcher socket)
+                   'async-write-cb
+                   fd
+                   lev:+EV-WRITE+)
    socket))
fukamachi commented 9 years ago

I think this has been fixed by #18.