dmccuskey / dmc-websockets

Corona SDK client library for the WebSocket protocol (RFC6455)
MIT License
11 stars 7 forks source link

Race condition in payload transfer #8

Open joagre opened 7 years ago

joagre commented 7 years ago

Hi, I send large amounts of payload over a WebSocket (~60KB) and get code crashes in dmc_websockets/frame.lua, i.e. it is not prepared for the fact that just a part of the payload may have made its way from the server to the client (the rest is on its way). Instead of waiting on the complete payload it crashes.

Excerpt from frame.lua:

readPayloadData = function( frame, bytearray )
    -- print( "readPayloadData", bytearray.pos )

    local bytes = frame.payload_len
    local data

    -- Verify Close frame size
    if frame.opcode == FRAME_TYPE.close and not ( bytes == 0 or bytes >= 2 ) then
        error( ProtocolError{
            code=CLOSE_CODES.PROTO_ERR.code, reason=CLOSE_CODES.PROTO_ERR.code,
            message="Data packet wrong size for Close frame" } )
        return
    end

    --== read rest of data

    if bytes == 0 then
        data = ""
    else
        data = bytearray:readBuf( bytes )
    end

The call to bytearray:readBuf( bytes ) explodes if just a part of the payload has been read from the socket.

bytearray:readBuf( bytes ) above crashes because it calls readUTFBytes( len ) in dmc_lua/lua_bytearray.lua:

function ByteArray:readUTFBytes( len )
    -- print( "ByteArray:readUTFBytes", len, self._pos )
    assert( type(len)=='number', "need integer length" )
    --==--
    if len == 0 then return "" end
    self:_checkAvailable( len )
    local bytes = self.getBytes( self._buf, self._pos, len )
    self._pos = self._pos + len
    return bytes
end

Which fails on self:_checkAvailable( len ) in dmc_lua/lua_bytearray.lua:

function ByteArray:_checkAvailable( len )
    -- print( "ByteArray:_checkAvailable", len )
    if len > self.bytesAvailable then
        error( Error.BufferError("Read surpasses buffer size") )
    end
end

As far as I can see this happens often for large payloads but I have seen it happen in other situations as well. This seems like a fragile race condition and a serious producer-consumer bug?

This said. I may misunderstand something essential in your code.

Kind regards /Joakim

joagre commented 7 years ago

I nearly killed myself figuring this out. :-)

I think this was the problem:

spock:Corona jocke$ svn diff dmc_corona/dmc_websockets.lua 
Index: dmc_corona/dmc_websockets.lua
===================================================================
--- dmc_corona/dmc_websockets.lua   (revision 519)
+++ dmc_corona/dmc_websockets.lua   (working copy)
@@ -487,7 +487,7 @@
    local _, e_pos = ba:search( '\r\n\r\n' )
    if e_pos == nil then return end

-   ba.pos = 1
+   ba._pos = 1
    local h_str = ba:readBuf( e_pos )

    -- process header
@@ -616,7 +616,7 @@
    local err = nil
    repeat

-       local position = self._ba.pos -- save in case of errors
+       local position = self._ba._pos -- save in case of errors
        try{
            function()
                handleWSFrame( ws_frame.receiveFrame( self._ba ) )
@@ -626,7 +626,7 @@
                                         err=e
                                         -- if added by jocke
                                         if self._ba ~= nil then
-                                            self._ba.pos = position
+                                            self._ba._pos = position
                                         end
                end
            }
@@ -1163,7 +1163,7 @@
            ba:writeBuf( data ) -- copy in new data

            -- if LOCAL_DEBUG then
-           --  print( 'Data', #data, ba:getAvailable(), ba.pos )
+           --  print( 'Data', #data, ba:getAvailable(), ba._pos )
            --  Utils.hexDump( data )
            -- end
dmccuskey commented 7 years ago

wow, well done tracking that down ! 👍 i'll bet that was a challenging journey for sure since you had to explore several Lua modules. :)

i really appreciate your contribution.

cheers, David

ps, possible to create a PR ?