fukamachi / woo

A fast non-blocking HTTP server on top of libev
http://ultra.wikia.com/wiki/Woo_(kaiju)
MIT License
1.28k stars 98 forks source link

woo does not read whole query before parsing on big upload #48

Closed knobo closed 8 years ago

knobo commented 8 years ago

This bug is only on Linux and not OS X.

Use program from this bugreport to reproduce. https://github.com/fukamachi/fast-http/issues/27

knobo commented 8 years ago

Now I have tried this on FreeBSD on a virtual machine on my laptop, and I have the same problem there. And FreeBSD uses the kqueue backend for libev as OS X does.

knobo commented 8 years ago

I tried with lev:+EVBACKEND-SELECT+ still same problem. Not sure if I used it in the right place. EDIT: And now I also tried lev:+EVBACKEND-POLL+. Still the same.

;; File:  woo-20160318-git/src/ev/event-loop.lisp
(defmacro with-event-loop ((&key enable-fork cleanup-fn) &body body)
  `(let ((*evloop* (lev:ev-loop-new (+  lev:+EVBACKEND-SELECT+
                    (if ,enable-fork
                        lev:+EVFLAG-FORKCHECK+
                        0))))
         (*callbacks* (make-hash-table :test 'eql))
         (*data-registry* (make-hash-table :test 'eql))
         (*input-buffer* (make-static-vector *buffer-size*)))
     (unwind-protect (progn
                       ,@body
                       (lev:ev-run *evloop* 0))
       (let ((close-socket-fn (intern #.(string :close-socket) (find-package #.(string :woo.ev.socket)))))
         (maphash (lambda (fd socket)
                    (declare (ignore fd))
                    (funcall close-socket-fn socket))
                  *data-registry*))
       ,@(when cleanup-fn
           `((funcall ,cleanup-fn)))
       (free-static-vector *input-buffer*)
       (cffi:foreign-free *evloop*))))
knobo commented 8 years ago

I believe the fix is something like this:

--- woo-20160318-git/src/ev/tcp.lisp    2016-05-25 11:27:28.727523260 +0200
+++ woo-20160318-git/src/ev/tcp.lisp.fix    2016-05-25 11:27:26.799516499 +0200
@@ -110,7 +110,6 @@
            (return))
           (otherwise
            (setf (socket-last-activity socket) (lev:ev-now *evloop*))
-      ;; Please check if there is more to read here, before continuing!
            (when read-cb
              (funcall (the function read-cb) socket *input-buffer* :start 0 :end n))
            (unless (= n buffer-len)
knobo commented 8 years ago

This improves the situation, But does not fix it. I can now upload buffer-size big file. So If I increase the buffer size, I can upload more. It also helps on https://github.com/fukamachi/woo/issues/47

    (loop
       (progn
     (map-into *input-buffer* (constantly 0)) ;; lazy-coding clear buffer.
     (let ((n (wsys:read fd (static-vectors:static-vector-pointer *input-buffer*) buffer-len)))
      (declare (dynamic-extent n))
      (case n
fukamachi commented 8 years ago

Hmm... It seems the request body is broken when the request is chunked and has a large body. I could reproduce it even with Mac.

knobo commented 8 years ago

Good to know it wasn't just me and my code that was broken ;) Is this something you intend to fix?

fukamachi commented 8 years ago

Probably needs more time to fix it. I still don't get what the exact reason to reproduce it. It doesn't occur when the request body is a big ASCII string.

knobo commented 8 years ago

I wonder, what is the mechanism for handling a request that is bigger then woo.ev.event-loop:*buffer-size*. When the parser is called with the first part of the requests, the parser stops halfway somewhere inside a request, and then the rest of the request is read with wsys:read, and the parser is calld again, then how does the parser know that it is suppose to continue parsing a header value or a body value? Is it a state that is saved in the parser somewhere?

fukamachi commented 8 years ago

Sorry for keep you waiting. I suppose d589524de4c1bcb50bef520df89d96f1ca983a34 has fixed this issue and it's already in the master branch. Please check out the latest Woo.

knobo commented 8 years ago

It woks better now yes. But not every time. I open up my upload-file test project, and play with some large values, and sometimes (by random) I get this error. This time I uploaded 1 MB, and just clicked repeatedly on the input box, and suddenly the error appears. It is a different issue I guess.

Callback Error: the body callback failed The bounding indices 0 and 869656 are bad for a sequence of length 65536. [Condition of type FAST-HTTP.ERROR:CB-BODY]

Restarts: 0: [RETRY] Retry SLIME REPL evaluation request. 1: [*ABORT] Return to SLIME's top level. 2: [ABORT] abort thread (#<THREAD "repl-thread" RUNNING {1004E30033}>)

Backtrace: 0: ((FLET #:H0 :IN FAST-HTTP.PARSER::READ-BODY-DATA) #) 1: (SIGNAL #<SB-KERNEL:BOUNDING-INDICES-BAD-ERROR expected-type: (CONS (INTEGER 0 65536) (INTEGER 0 65536)) datum: (0 . 869656)>) 2: (ERROR SB-KERNEL:BOUNDING-INDICES-BAD-ERROR :DATUM (0 . 869656) :EXPECTED-TYPE (CONS (INTEGER 0 65536) (INTEGER 0 65536)) :OBJECT #(120 120 120 120 120 120 ...)) 3: (SB-INT:SEQUENCE-BOUNDING-INDICES-BAD-ERROR #(120 120 120 120 120 120 ...) 0 869656) 4: (SB-KERNEL:VECTOR-SUBSEQ* # # #)

knobo commented 8 years ago

Here is a bit more detailed debug information. Is there anything more I can do to provide more information? Are you able to reproduce this error?

Looks like this INDICES-BAD-ERROR occurs right at the new code.

  0: (SB-INT:SEQUENCE-BOUNDING-INDICES-BAD-ERROR #(120 120 120 120 120 120 ...) 0 9901384)
  1: (SB-KERNEL:VECTOR-SUBSEQ* #<unavailable argument> #<unavailable argument> #<unavailable argument>)
  2: ((LAMBDA (WOO::DATA WOO::START WOO::END) :IN WOO::SETUP-PARSER) #(120 120 120 120 120 120 ...) 0 9901384)
      Locals:
        SB-DEBUG::ARG-0 = #(120 120 120 120 120 120 ...)
        SB-DEBUG::ARG-1 = 0
        SB-DEBUG::ARG-2 = 9901384
  3: (FAST-HTTP.PARSER::READ-BODY-DATA #S(FAST-HTTP.HTTP:HTTP-REQUEST :METHOD :POST :MAJOR-VERSION 1 :MINOR-VERSION 1 :STATUS 0 :CONTENT-LENGTH 0 :CHUNKED-P NIL ...) #S(FAST-HTTP.PARSER:CALLBACKS :MESSAGE-..
      Locals:
        BODY-END = 9901384
        #:CALLBACK0 = #<CLOSURE (LAMBDA (FAST-HTTP.HTTP:HTTP FAST-HTTP::DATA FAST-HTTP::START FAST-HTTP::END) :IN FAST-HTTP:MAKE-PARSER) {10082F540B}>
        CALLBACKS = #S(FAST-HTTP.PARSER:CALLBACKS ..)
        DATA = #(120 120 120 120 120 120 ...)
        END = 4294967295
        HTTP = #S(FAST-HTTP.HTTP:HTTP-REQUEST :METHOD :POST :MAJOR-VERSION 1 :MINOR-VERSION 1 :STATUS 0 :CONTENT-LENGTH 0 :CHUNKED-P NIL ...)
        READABLE-COUNT = 4294967295
        START = 0
knobo commented 8 years ago

Why aren't the requests the same size every time?

;;; Changed code too look inside the request
(defun read-body-data (http callbacks data start end)
  (declare (type http http)
           (type simple-byte-vector data)
           (type pointer start end))
  (let ((readable-count (the pointer (- end start))))
    (declare (dynamic-extent readable-count)
             (type pointer readable-count))
    (if (<= (http-content-length http) readable-count)
        (let ((body-end (+ start (http-content-length http))))
          (declare (dynamic-extent body-end))
      (warn "body length: ~a ~a ~a ~a" body-end start (http-content-length http) readable-count)
...
...
...

Request size: 32499. I get no error from this. I get the output (One line for every click):

WARNING: body length: 460 0 460 460
WARNING: body length: 460 0 460 460
WARNING: body length: 460 0 460 460
WARNING: body length: 463 0 463 463
WARNING: body length: 33231 518 32713 32713
WARNING: body length: 463 0 463 463
WARNING: body length: 460 0 460 460

But when request size is larger (1000000): This time the error came on the second upload.

Woo server is going to start.
Listening on localhost:5050.
WARNING: body length: 17694 0 17694 17694
WARNING: body length: 902430 0 902430 4294967295

EDIT: The end value is unreasonable large, don't you think?

knobo commented 8 years ago

Is there something wrong with signed/unsigned value of N in WOO.EV.TCP::TCP-READ-CB ?

  0: (FAST-HTTP.PARSER::READ-BODY-DATA #S(FAST-HTTP.HTTP:HTTP-REQUEST :METHOD :POST :MAJOR-VERSION 1 :MINOR-VERSION 1 :STATUS 0 :CONTENT-LENGTH 771355 :CHUNKED-P NIL ...) #S(FAST-HTTP.PARSER:CALLBACKS :MES..
  1: (FAST-HTTP.PARSER::PARSE-BODY #S(FAST-HTTP.HTTP:HTTP-REQUEST :METHOD :POST :MAJOR-VERSION 1 :MINOR-VERSION 1 :STATUS 0 :CONTENT-LENGTH 771355 :CHUNKED-P NIL ...) #S(FAST-HTTP.PARSER:CALLBACKS :MESSAGE..
  2: (FAST-HTTP.PARSER:PARSE-REQUEST #S(FAST-HTTP.HTTP:HTTP-REQUEST :METHOD :POST :MAJOR-VERSION 1 :MINOR-VERSION 1 :STATUS 0 :CONTENT-LENGTH 771355 :CHUNKED-P NIL ...) #S(FAST-HTTP.PARSER:CALLBACKS :MESSA..
  3: ((LAMBDA (FAST-HTTP::DATA &KEY :START :END) :IN FAST-HTTP:MAKE-PARSER) #(120 120 120 120 120 120 ...) :START #<unavailable argument> :END #<unavailable argument>)
  4: (WOO::READ-CB #S(WOO.EV.SOCKET:SOCKET :WATCHERS #(#.(SB-SYS:INT-SAP #X7FFFE400BA00) #.(SB-SYS:INT-SAP #X7FFFE400BA40) #.(SB-SYS:INT-SAP #X7FFFE400C1F0)) :LAST-ACTIVITY 1.4661329746566706d9 :FD 61 :REM..
      Locals:
        #:.DEFAULTING-TEMP. = 0
        #:.DEFAULTING-TEMP.#1 = 4294967295
        DATA = #(120 120 120 120 120 120 ...)
        #:N-SUPPLIED-274 = 1
        SOCKET = #S(WOO.EV.SOCKET:SOCKET ..)
  5: (WOO.EV.TCP::TCP-READ-CB #<unused argument> #.(SB-SYS:INT-SAP #X7FFFE400BA00) #<unused argument>)
      Locals:
        BUFFER-LEN = 65536
        FD = 61
        N = 4294967295
        READ-CB = #<FUNCTION WOO::READ-CB>
        SOCKET = #S(WOO.EV.SOCKET:SOCKET ..)
        WATCHER = #.(SB-SYS:INT-SAP #X7FFFE400BA00)
knobo commented 8 years ago

When I change read to return a ssize-t

(defcfun ("read") ssize-t
  (fd :int)
  (buf :pointer)
  (count :unsigned-int))

I sometimes get:

<ERROR> [05:20:41] woo.ev.tcp - Unexpected error (Code: 11)

http://www-numi.fnal.gov/offline_software/srt_public_context/WebDocs/Errors/unix_system_errors.html

Says 11 means "Try again". Is that right?

knobo commented 8 years ago

Looks like this fixed it: ` ;;; woo/src/syscall/main.lisp (defconstant EAGAIN 11.)

`

(define-c-callback tcp-read-cb :void ((evloop :pointer) (watcher :pointer) (events :int))
  (declare (ignore evloop events))
  (let* ((fd (io-fd watcher))
         (buffer-len (length *input-buffer*))
         (socket (deref-data-from-pointer fd))
         (read-cb (socket-read-cb socket)))
    (loop
      (let ((n (wsys:read fd (static-vectors:static-vector-pointer *input-buffer*) buffer-len)))
        (declare (dynamic-extent n))
        (case n
          (-1
           (let ((errno (wsys:errno)))
             (cond
               ((or (= errno wsys:EWOULDBLOCK)
                    (= errno wsys:EINTR)))
               ((or (= errno wsys:ECONNABORTED)
                    (= errno wsys:ECONNREFUSED)
                    (= errno wsys:ECONNRESET))
                (vom:error "Connection is already closed (Code: ~D)" errno)
                (close-socket socket))
           ((= errno wsys::eagain)
        (warn "retrying"))
               (t
                (vom:error "Unexpected error (Code: ~D)" errno)
                (close-socket socket))))
           (return))
          (0
           ;; EOF
knobo commented 8 years ago
diff --git a/src/ev/tcp.lisp b/src/ev/tcp.lisp
index 7a7c654..7d96a39 100644
--- a/src/ev/tcp.lisp
+++ b/src/ev/tcp.lisp
@@ -99,6 +99,9 @@
                     (= errno wsys:ECONNRESET))
                 (vom:error "Connection is already closed (Code: ~D)" errno)
                 (close-socket socket))
+              ((= errno wsys::eagain)
+               ;; Jus to nothing
+               )
                (t
                 (vom:error "Unexpected error (Code: ~D)" errno)
                 (close-socket socket))))
diff --git a/src/syscall/main.lisp b/src/syscall/main.lisp
index 35f59a8..6387e68 100644
--- a/src/syscall/main.lisp
+++ b/src/syscall/main.lisp
@@ -17,7 +17,7 @@
   (buf :pointer)
   (count size-t))

-(defcfun ("read") :unsigned-int
+(defcfun ("read") ssize-t
   (fd :int)
   (buf :pointer)
   (count :unsigned-int))
@@ -51,6 +51,8 @@
 (defconstant ECONNREFUSED 61.)
 (defconstant ECONNRESET 54.)
 (defconstant ENOTCONN 57.)
+(defconstant EAGAIN 11.)
+

 (defun set-fd-nonblock (fd enabled)
   (declare (optimize (speed 3) (safety 0)))