creationix / moonslice-luv

A port of moonslice running on top of luv and lhttp_parser

stack overflow :( #1

Closed: dvv closed this issue 11 years ago

dvv commented 12 years ago
$ test-web.lua &
$ wrk -c1000 -r1000000 -t2 http://localhost:8080/

server crashes:

luajit: stack overflow
stack traceback:
        [C]: in function 'insert'
        ./web.lua:91: in function 'res'
        ./autoheaders.lua:55: in function 'res'
        test-web.lua:10: in function 'app'
        ./autoheaders.lua:7: in function 'app'
        ./web.lua:85: in function <./web.lua:79>
        [C]: in function 'execute'
        ./web.lua:146: in function 'reader'
        ./continuable.lua:66: in function 'processReaders'
        ./continuable.lua:87: in function <./continuable.lua:85>
        [C]: in function 'run'
        test-web.lua:30: in main chunk
        [C]: ?
dvv commented 12 years ago

64420 requests at most until the overflow

dvv commented 12 years ago

It stays alive when run with plain Lua, but it certainly leaks memory, as htop shows.

dvv commented 12 years ago

https://github.com/wg/wrk -- a fast HTTP performance-testing tool

dvv commented 12 years ago

So the trace suggests we are over-inserting into the head table in the response.

creationix commented 11 years ago

I'm working on this finally. I can't seem to find where the stack is leaking frames.

dvv commented 11 years ago

Same here. Maybe the event loop is not tail-recursive?

creationix commented 11 years ago

I traced the queues in continuable, and they stay constant max length. I logged a stack trace on every request right before the exception happens, and the stack length is constant too (about 10 calls deep).

dvv commented 11 years ago

but you are able to replicate the issue, right?

miko commented 11 years ago

I am able to replicate this with LuaJIT 2.0.0, but with Lua 5.1.5 I get a different error message:

    TODO: Implement async error handling
    lua: luv_stream.c:30: luv_on_read: Assertion `0' failed.

I think LuaJIT somehow hijacks the original error message and fires its own. Or it does not service the error correctly and puts something on its internal stack. Or it handles things differently, e.g. running the GC at a later stage than Lua. Anyway, the original message comes from the function luv_on_read() in luv/luv_stream.c. Just a guess, but maybe with many threads under LuaJIT the free(buf.base) is never called when the assertion happens, and that buffer lives somewhere on the stack before the thread gets GCed? Edit: using Arch Linux 32-bit.
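
As a concrete illustration of that guess, here is a hedged sketch of a libuv-0.x style read callback where error handling is still a TODO, so the assertion fires before free(buf.base) is reached; the function name and exact signature are assumptions, not the actual luv_stream.c source.

    #include <assert.h>
    #include <stdlib.h>
    #include <uv.h>

    /* Hedged sketch, not the real luv_on_read(): shows how an assert(0)
     * on the error path also prevents the read buffer from being freed. */
    static void on_read_sketch(uv_stream_t* stream, ssize_t nread, uv_buf_t buf) {
      if (nread < 0) {
        /* Error or EOF: handling is still a TODO here, so this aborts
         * before the free() below is ever reached. */
        assert(0 && "TODO: Implement async error handling");
      } else if (nread > 0) {
        /* ... hand the nread bytes in buf.base to the Lua callback ... */
      }
      free(buf.base);
    }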

dvv commented 11 years ago

Note that we don't stop reading at all in the current master. I tried to cope with that in https://github.com/dvv/luv-2/commit/4cdecc4713e99af6e3b817b631c497efeaf63599#L3R135; you may find other places where we miss the stopper.
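
For reference, the "stopper" presumably corresponds to libuv's uv_read_stop()/uv_read_start() pair; below is a minimal sketch of pausing and resuming reads under that assumption (generic libuv idiom with hypothetical wrapper names, not the luv or moonslice code).

    #include <uv.h>

    /* Sketch of read backpressure: uv_read_stop() halts delivery of read
     * callbacks for the stream, uv_read_start() resumes it. */
    static void pause_reading(uv_stream_t* client) {
      uv_read_stop(client);
    }

    static void resume_reading(uv_stream_t* client,
                               uv_alloc_cb on_alloc, uv_read_cb on_read) {
      uv_read_start(client, on_alloc, on_read);
    }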

miko commented 11 years ago

I have got an answer from Mike Pall:

I know the original (luv/luv_stream.c) code can be buggy, but still the error returned by luajit is different (and not that helpful).

Well, you're most certainly overflowing the Lua stack if you get that message. Lua only limits the frame depth, whereas LuaJIT limits the total number of stack slots (IMHO a more suitable measure). If it's not the frame depth, then it must be the frame size. Most likely some C function is pushing results on the stack, but they are not popped later on. Try printing lua_gettop() at strategic points in the code.
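
A minimal sketch of that diagnostic: bracket a suspect C entry point with lua_gettop() to catch slots that are pushed but never popped. The callback name and surrounding code are hypothetical; only the lua_gettop() calls reflect the actual Lua C API.

    #include <stdio.h>
    #include <lua.h>

    /* Hypothetical event callback in the binding layer. */
    static void on_some_event(lua_State* L) {
      int top_before = lua_gettop(L);

      /* ... push the Lua callback plus its arguments and call it ... */

      int top_after = lua_gettop(L);
      if (top_after != top_before) {
        /* Even a one-slot leak per request adds up: LuaJIT caps the total
         * number of stack slots, so tens of thousands of requests are
         * enough to overflow it. */
        fprintf(stderr, "stack leak: %d -> %d slots\n", top_before, top_after);
      }
    }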

creationix commented 11 years ago

Oh, that stack! That's easy to fix...

creationix commented 11 years ago

Fixed in https://github.com/creationix/luv/commit/789eea64ef9b9d5b4a9aea4b73e6bca41e7e88cf

dvv commented 11 years ago

I can't see any improvement. I added -DLUV_CHECK_STACK=1 to the Makefile, but the issue remained.

creationix commented 11 years ago

Are you sure you're using the updated luv version? Also, what server are you running? I was using a modified version of test-web.lua that skipped the middlewares and manually set the content length in the main app.

dvv commented 11 years ago

All's well now.

dvv commented 11 years ago

I posted some comments to you on IRC -- please consider answering.

miko commented 11 years ago

So do non-keep-alive requests (like "wrk -H 'Connection: close'") work for you? They do not work for me, and with keep-alive it crashes after the last request.

dvv commented 11 years ago

Apply the last chunks of http://busybox.net/~dvv/luv-1.patch -- for some reason @creationix hasn't applied them to the core.

miko commented 11 years ago

It is better now, but Connection: close requests still break the test-web.lua example:

# client
$ telnet 0 8080
GET / HTTP/1.0

# server
luajit: ./web.lua:125: attempt to index field 'inputQueue' (a nil value)
stack traceback:
    ./web.lua:125: in function <./web.lua:124>
    [C]: in function 'execute'
    ./web.lua:146: in function 'reader'
    ./continuable.lua:69: in function 'processReaders'
    ./continuable.lua:90: in function <./continuable.lua:88>
    [C]: in function 'run'
    test-web.lua:31: in main chunk
    [C]: at 0x0804c280
dvv commented 11 years ago

Confirmed. AFAICS we respond with a hardcoded HTTP/1.1; will look into this.
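
One hedged way to avoid that: take the status-line version and the keep-alive decision from the parsed request. joyent's http_parser (which lhttp_parser wraps) exposes http_major, http_minor and http_should_keep_alive(); the helper below is only a sketch, not the moonslice code.

    #include <stdio.h>
    #include <http_parser.h>

    /* Sketch: echo the request's HTTP version in the status line and let
     * the parser decide keep-alive (it accounts for the request version
     * and any Connection: close / keep-alive header). */
    static int begin_response(http_parser* parser, char* buf, size_t len) {
      snprintf(buf, len, "HTTP/%u.%u 200 OK\r\n",
               (unsigned)parser->http_major, (unsigned)parser->http_minor);
      return http_should_keep_alive(parser); /* 0 => close after responding */
    }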

dvv commented 11 years ago

https://github.com/creationix/moonslice-luv/commit/7f878e3b5a72b0bcc3cd6eec7cc1a70f1e369a8f#commitcomment-2173869 -- but I still can't see how to cope with https://github.com/creationix/moonslice-luv/commit/7f878e3b5a72b0bcc3cd6eec7cc1a70f1e369a8f#L12R136: there is no tcp module, the objective is unclear, and a friendly server should shut the connection down rather than close it abruptly. @creationix please fix that.
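
On the shutdown-versus-close point, the usual libuv idiom is to uv_shutdown() the stream (which flushes pending writes and sends FIN) and only uv_close() from the shutdown callback. A sketch under that assumption follows; the helper names are hypothetical and this is not the luv binding code.

    #include <stdlib.h>
    #include <uv.h>

    /* Sketch of a graceful teardown: shut the write side down first and
     * close from the shutdown callback (assumes the handle and request
     * were heap-allocated). */
    static void on_close(uv_handle_t* handle) {
      free(handle);
    }

    static void on_shutdown(uv_shutdown_t* req, int status) {
      /* Close whether or not the shutdown succeeded. */
      uv_close((uv_handle_t*)req->handle, on_close);
      free(req);
    }

    static void finish_connection(uv_stream_t* client) {
      uv_shutdown_t* req = malloc(sizeof(*req));
      uv_shutdown(req, client, on_shutdown);
    }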