fdimuccio / play2-sockjs

A SockJS server implementation for Play Framework.
Apache License 2.0

Websocket message losses #21

Closed TheChifer closed 7 years ago

TheChifer commented 7 years ago

After upgrading to Play 2.5.x and play2-sockjs 0.5.0/0.5.1 I am seeing message losses. SockJS setup:

    public SockJSSettings settings() {
        return new SockJSSettings()
            .withWebsocket(true)
            .withHeartbeat(FiniteDuration.create(config.getInt(ConfigHelper.WEBSOCKET_HEARTBEAT), TimeUnit.MILLISECONDS))
            .withStreamingQuota(4 * 1024);
    }

With the 2.5 upgrade I can see that sockjs is buffering multiple messages emitted by the linked Actor, so larger array frames are sent to the client, e.g. an 'a' frame with 8 array elements and a size of 21846.

These large wrapped frames are missing messages: for example, about 3 messages in the middle are missing from a frame that should contain 11 message elements. Not every wrapped frame is missing messages, but it happens often enough.

From the server logs I can see that the actor is sending messages correctly to the sockjs router actor, and in the browser network tab I can see the loss.

The SockJS client is whitelisted to use only the websocket transport.

With v2.4 I see smaller 4k frames, each carrying a single message. 2.4 has NO loss and is very stable.

fdimuccio commented 7 years ago

Please can you upload a sample project that reproduces the bug you have found, so that I can track it down.

TheChifer commented 7 years ago

SockJSTest.zip

Open http://localhost:9000/ and hit enter on the input. 100 messages of 4k data ('a's) are returned via ws, each with a sequence number, e.g. 1 -- [data], 2 -- [data], etc.

Check the large frames carrying multiple messages; you can see the loss very easily.

Let me know if you want more info. It's quite a bad loss and is causing issues in prod.

TheChifer commented 7 years ago

If I use streams, I don't see intermittent losses, but there is a cap of 271 messages whatever the size of the message: subsequent messages beyond 271 are lost.

fdimuccio commented 7 years ago

Thank you, I checked it and it works as intended. When using actor based api or LegacySockJS (as in your case) you can't control backpressure, so if the underlying buffer fills up, the new elements are automatically discarded (and that is why you are seeing message loss).

However you can control the buffer size by specifying it in the settings:

    @Override
    public SockJSSettings settings() {
        return  new SockJSSettings()
            .withWebsocket(true)
            .withSessionBufferSize(512*1024) // by default is 64k
            .withStreamingQuota(4096);
    }

With those settings your sample app will not show any loss (you will probably need to tune the value according to your use case).

Up to play2-sockjs 0.4.x, the underlying implementation was not stream based and the buffer was unbounded (so that's why you were not seeing any message loss), but that's bad because you don't have any control over memory usage.
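
To make the drop behaviour described above concrete, here is a minimal sketch in plain Java of a bounded buffer that discards new elements once it is full. It is not play2-sockjs code: the class name DropNewBuffer and its methods are hypothetical, and it bounds by element count for simplicity, whereas the session buffer setting above is expressed as a size.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/**
 * Illustrative only (hypothetical class, not part of play2-sockjs):
 * a bounded FIFO buffer that discards NEW elements when it is full.
 */
final class DropNewBuffer<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private final int capacity;
    private long dropped = 0;

    DropNewBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Stores the element and returns true, or discards it and returns false when full. */
    boolean offer(T element) {
        if (queue.size() >= capacity) {
            dropped++;
            return false;
        }
        queue.add(element);
        return true;
    }

    /** Removes and returns the oldest buffered element, or null when empty. */
    T poll() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }

    long dropped() {
        return dropped;
    }
}
```

If a producer offers messages 1 to 5 to a buffer of capacity 3, messages 4 and 5 are discarded; once the consumer takes message 1, message 6 is accepted again, so the consumer ends up seeing 1, 2, 3, 6 with a gap in the middle. An unbounded queue would never take the discard branch, at the cost of unbounded memory growth.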

TheChifer commented 7 years ago

Thanks. I did experiment with both the session and send buffers, but still had issues even with 512k. I will check again.

  1. Will the send buffer have an impact too?
  2. Are the buffers dedicated memory or dynamic limits? If I have a 1M buffer, how would it impact overall scalability?
  3. Is the 271 cap I am seeing with the stream implementation the same issue? Here 271 is fixed whether an individual message is a few bytes or 8k. Happy to upload a sample.

fdimuccio commented 7 years ago

  1. The send buffer is not used for WebSocket protocols: it holds the messages sent by the client to the send endpoint (xhr_send and jsonp_send).
  2. The send buffer can be fixed or dynamic according to ActorMaterializerSettings.maxFixedBufferSize. The session buffer is only dynamic and its limit is a soft one, i.e. it doesn't truncate a message that exceeds the size (but it will not accept further messages until the ones it contains are processed).
  3. If you upload the sample I can get a better understanding of what's going on.

When sizing the session buffer you have to consider the rate at which you emit data and the rate at which the client consumes it; that is why the value depends on the use case.
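
As a rough aid for that sizing, the following plain Java sketch models a buffer with a soft byte limit and counts how many messages of a burst it accepts in the worst case, where the client drains nothing during the burst. It is not play2-sockjs code: the class SoftLimitBuffer, its methods, and measuring size as UTF-8 bytes are all assumptions made for illustration, and real framing overhead is ignored.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

/**
 * Illustrative only (hypothetical class, not part of play2-sockjs):
 * a buffer with a SOFT byte limit. A message is never truncated, but once
 * the buffered bytes reach the limit no further message is accepted until
 * some of the buffered ones have been consumed.
 */
final class SoftLimitBuffer {
    private final Queue<String> queue = new ArrayDeque<>();
    private final long limitBytes;
    private long usedBytes = 0;

    SoftLimitBuffer(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Accepts the whole message while below the limit; rejects it otherwise. */
    boolean offer(String message) {
        if (usedBytes >= limitBytes) {
            return false;
        }
        queue.add(message);
        usedBytes += sizeOf(message);
        return true;
    }

    /** Removes the oldest message and frees its bytes; returns null when empty. */
    String poll() {
        String message = queue.poll();
        if (message != null) {
            usedBytes -= sizeOf(message);
        }
        return message;
    }

    /** Assumption: size is counted as UTF-8 encoded bytes. */
    static int sizeOf(String message) {
        return message.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Worst case: a burst of equal-sized messages with no consumer draining. */
    static int acceptedInBurst(long limitBytes, int count, int messageBytes) {
        SoftLimitBuffer buffer = new SoftLimitBuffer(limitBytes);
        char[] payload = new char[messageBytes];
        Arrays.fill(payload, 'a');
        String message = new String(payload);
        int accepted = 0;
        for (int i = 0; i < count; i++) {
            if (buffer.offer(message)) {
                accepted++;
            }
        }
        return accepted;
    }
}
```

Under these assumptions, a burst shaped like the sample app (100 messages of 4096 bytes, about 400k in total) keeps only 16 messages with a 64k limit, while a 512k limit holds all 100. A client that drains during the burst lowers the requirement, which is why the emit and consume rates both matter.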