Open LBPHacker opened 2 years ago
Thankyou for investingating! This indeed looks like it could be the issue.
Are you able to create a test case: working backwards from the changed code, we should be able to reach+test this code by sending a frame of size >125 but less than 65536.
I'll try my best, but as I said, I'm unsure what sort of test one would add for testing this, as the conditions don't seem easily reproducible from Lua. I did at some point try interposing socket.xwrite to introduce artificial throttling and fragmentation, but it didn't end well.
'll try my best, but as I said, I'm unsure what sort of test one would add for testing this, as the conditions don't seem easily reproducible from Lua.
I think it would look similar to https://github.com/daurnimator/lua-http/blob/169c1a7586d39be1bf0d98b69934e8e8b08a87cd/spec/websocket_spec.lua#L246-L262 except without the :close()
on the writing side until after the receive side has completed.
Oh I see: fill
is eager and if you only have half the packet then the timeout of 0 is used.
Okay yeah, we need some way to only send half the message "now", half after a delay. Or to cap :fill
.... thinking....
I'm pretty close to getting something working, will post in a bit.
Got it. The test http.websocket module two sided tests works with medium size frames, even if the connection is really bad
fails without the fix, with the error we know from and love in #140.
Are we in a deadlock here? I'm waiting for a response on the interposition thing, maybe you're also waiting for something?
60-day poke.
Are we in a deadlock here? I'm waiting for a response on the interposition thing, maybe you're also waiting for something?
Sorry, I'm low on free time lately. I don't love the approach taken for the tests here, and my plan is to sit down one night and try to re-work it. Maybe I won't come up with anything better than what you already have; or maybe it will lead to a larger refactor/change.
Alright, noted. Excuse the poking, I was just trying to make sure I hadn't derailed some sort of 'workflow' by leaving a review comment or something.
This is more an educated guess as to why #140 exists combined with some pattern matching, than a well-researched, proof-backed solution. But it does fix a service I maintain that otherwise exhibits the problem described in the aforementioned issue.
Edit: I'm not sure what sort of test one would add for something like this, request for comments on that.
To verify that this indeed fixes what I think* is the problem, I first severely handicapped my
lo
:And ran the following programs (code below):
server.lua
```lua #!/usr/bin/env lua5.3 local http_websocket = require("http.websocket") local http_server = require("http.server") local http_headers = require("http.headers") local cqueues = require("cqueues") local function reply(myserver, stream) local req_headers = assert(stream:get_headers()) local ws = assert(http_websocket.new_from_stream(stream, req_headers)) ws:accept() while true do local data, typ = ws:receive() if not data then break end ws:send(data, typ) end ws:close(1000) end local myserver = assert(http_server.listen({ host = "0.0.0.0", port = 36779, onstream = reply, onerror = function(myserver, context, op, err, errno) local msg = op .. " on " .. tostring(context) .. " failed" if err then msg = msg .. ": " .. tostring(err) end assert(io.stderr:write(msg, "\n")) end, })) assert(myserver:listen()) do local bound_port = select(3, myserver:localname()) assert(io.stderr:write(string.format("Now listening on port %d\n", bound_port))) end assert(myserver:loop()) ```client.lua
```lua #!/usr/bin/env lua5.3 local http_websocket = require("http.websocket") local cqueues = require("cqueues") local errno = require("cqueues.errno") local large_str = ("z"):rep(10000) local client = http_websocket.new_from_uri('ws://localhost:36779') assert(client:connect()) client:send(large_str, "text") local data, typ while true do data, typ, errcode = client:receive(1) if data then break end if errcode ~= errno.ETIMEDOUT then error(typ) end end assert(#data == #large_str) client:close(1000) ```Without the fix applied, I get
and the client never terminates. With the fix applied, I get
*: I see nothing else in the code that might cause the assertion to fail, but I'm unfamiliar with both the websocket RFCs and this codebase, so there may be things I'm missing.