NFIBrokerage / slipstream

A slick WebSocket client for Phoenix Channels
https://hex.pm/packages/slipstream
Apache License 2.0
155 stars 18 forks source link

Memory issue on join #52

Closed cadebward closed 1 year ago

cadebward commented 1 year ago

Hello!

Thanks for the library, we've been really enjoying it.

Recently we noticed our server taking up a lot of memory and we tracked it down to Slipstream when joining lots of channels concurrently. When we put a delay between the joins, the memory goes back down to nearly nothing.

I put together a quick phx app to reproduce the situation here https://github.com/cadebward/slipstream_memory.

Run that app as you normally would and open up whatever you use to monitor memory. I'm running MacOS and I see memory go straight up to 20Gb within a few moments of the server being online.

Here's the point of interest: https://github.com/cadebward/slipstream_memory/blob/master/lib/slip/room_server.ex#L43-L51

the-mikedavis commented 1 year ago

Thanks for the report - the reproduction case was very helpful!

It looks like this is an issue in the way Slipstream handles telemetry for joins and rejoins. I think it should be possible for this to happen with connections and reconnections as well. The Slipstream.Socket struct contains the telemetry metadata and the telemetry metadata contains a prior version of the socket per join attempt. When we start new connections or try to join channels, we add a new version of the socket to that metadata per join attempt and we would only clear that metadata after successfully joining the topic. (Note that in the reproduction case, the joins never succeed because the room:<n> topic is unmatched.)

This leads the socket struct to grow in size really sharply (and infinitely, if the topics never join successfully), especially if you are trying to join multiple topics. I think based on some napkin math that the memory grows something like on the order of R^T where R is the number of retries and T is the number of topics. So even with just one topic that fails to join repeatedly, it can become a noticeable memory leak over enough time. I still see huge memory consumption even with the Process.send_after/4 change in the reproduction case - it just takes longer to grow since T is smaller for longer. (I might not be seeing the same behavior as you though so let me know if my patch doesn't fix it!)

I'll push and release a fix for this that trims the socket metadata so that the socket and metadata don't reference each other. That should prevent memory growth based on the number of retries. With the fix, I see the memory in the reproduction case fluctuating normally rather than constantly increasing.

Thanks again for the report!

the-mikedavis commented 1 year ago

I pushed https://github.com/NFIBrokerage/slipstream/commit/7583a044d38972bb0249b841d315203cc615dd30 and a v1.0.2 release with that fix. It should pop up on Hex pretty soon

cadebward commented 1 year ago

I put it back to how it was and the memory is staying consistently low. Thanks so much for looking into this!