fukamachi / woo

A fast non-blocking HTTP server on top of libev
http://ultra.wikia.com/wiki/Woo_(kaiju)
MIT License

Can't catch some errors thrown from worker threads. #64

Open heegaiximephoomeeghahyaiseekh opened 7 years ago

heegaiximephoomeeghahyaiseekh commented 7 years ago

One common error occurs when a client opens a connection but never sends a request. Eventually that produces a timeout error, which bubbles up to the debugger, but it's hard to reproduce. Here's an easier one, triggered simply by having the app return an invalid response:

(woo:run (lambda (env)
           `(200 (:content-type "application/octet-stream")
             (#(1 2 3 4 5))))
         :num-workers 2)

The FAST-HTTP.ERROR:CB-MESSAGE-COMPLETE error this signals can't be caught anywhere, because it happens in a worker thread: it has nowhere to go but the debugger. That's convenient during development, but a deal-breaker for production code.

The woo:run function could accept an error-handling function that the worker thread would install with HANDLER-BIND, like this:

(defun make-worker (process-fn when-died error-handler)
  ;; ERROR-HANDLER is the proposed new argument: a function of one condition.
  (let* ((dequeue-async (cffi:foreign-alloc '(:struct lev:ev-async)))
         (stop-async (cffi:foreign-alloc '(:struct lev:ev-async)))
         (worker (%make-worker :dequeue-async dequeue-async
                               :stop-async stop-async
                               :process-fn process-fn))
         (worker-lock (bt:make-lock)))
    (lev:ev-async-init dequeue-async 'worker-dequeue)
    (lev:ev-async-init stop-async 'worker-stop)
    (setf (worker-thread worker)
          (bt:make-thread
           (lambda ()
             (tagbody
                begin
                (restart-case
                    ;; Every condition signaled in this worker thread is passed
                    ;; to the app's handler, which may invoke a restart below.
                    (handler-bind ((t error-handler))
                      (bt:acquire-lock worker-lock)
                      (let ((*worker* worker))
                        (wev:with-sockaddr
                          (unwind-protect
                               (wev:with-event-loop ()
                                 (setf (worker-evloop worker) *evloop*)
                                 (bt:release-lock worker-lock)
                                 (lev:ev-async-start *evloop* dequeue-async)
                                 (lev:ev-async-start *evloop* stop-async))
                            (unless (eq (worker-status worker) :stopping)
                              (vom:debug "[~D] Worker has died" (worker-id worker))
                              (funcall when-died worker))
                            (finalize-worker worker)
                            (vom:debug "[~D] Bye." (worker-id worker))))))
                  ;; Let the thread function return, ending the worker thread.
                  (abort-worker-thread () :report "Abort the Woo worker")
                  ;; Jump back to BEGIN and run the worker's event loop again.
                  (restart-worker () :report "Restart the worker"
                                  (go begin)))))
           :initial-bindings (default-thread-bindings)
           :name "woo-worker"))
    (sleep 0.1)
    (bt:acquire-lock worker-lock)
    worker))

The app's error handler could then invoke the abort-worker-thread restart to terminate the worker thread, or restart-worker to start the worker up again.
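To illustrate how an app-side handler might choose between those two restarts, here is a minimal, thread-free sketch in portable Common Lisp. RUN-WORKER-BODY, SAMPLE-ERROR-HANDLER and *ATTEMPTS* are hypothetical names, not part of Woo: RUN-WORKER-BODY mimics the TAGBODY/RESTART-CASE/HANDLER-BIND structure of the proposed MAKE-WORKER around a plain function call instead of an event loop, and SAMPLE-ERROR-HANDLER restarts the worker once before aborting it. As an assumption of this sketch, the handler is bound to ERROR rather than T, so warnings and other non-error signals pass through untouched.

```lisp
;; Hypothetical, thread-free sketch of the proposed restart protocol.
;; None of these names are part of Woo's API.

(defvar *attempts* 0
  "Number of times the simulated worker body has been entered.")

(defun run-worker-body (process-fn error-handler)
  "Call PROCESS-FN with ERROR-HANDLER installed via HANDLER-BIND, offering
the ABORT-WORKER-THREAD and RESTART-WORKER restarts. Returns PROCESS-FN's
value on success, or :ABORTED if the handler aborts the worker."
  (tagbody
   begin
     (restart-case
         ;; Bound to ERROR (not T) so warnings are left alone.
         (handler-bind ((error error-handler))
           (incf *attempts*)
           (return-from run-worker-body (funcall process-fn)))
       (abort-worker-thread ()
         :report "Abort the Woo worker"
         (return-from run-worker-body :aborted))
       (restart-worker ()
         :report "Restart the worker"
         ;; Unwind out of the failed attempt and run the body again.
         (go begin)))))

(defun sample-error-handler (condition)
  "Log CONDITION, restart the worker after the first failure, then abort."
  (format *error-output* "~&worker error: ~A~%" condition)
  (if (< *attempts* 2)
      (invoke-restart 'restart-worker)
      (invoke-restart 'abort-worker-thread)))

;; Usage: the body fails on every attempt, so the handler restarts it once
;; and then aborts.
;; (setf *attempts* 0)
;; (run-worker-body (lambda () (error "client timed out")) #'sample-error-handler)
;; => :ABORTED, with *ATTEMPTS* = 2
```

In the actual proposal the restarts would unwind a libev event loop inside a worker thread rather than a plain function call, but the handler's decision logic would presumably look similar.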

heegaiximephoomeeghahyaiseekh commented 7 years ago

It turns out that you don't need :num-workers to hit this problem: some errors can't be caught in single-threaded operation either.