SocketCluster / socketcluster

Highly scalable realtime pub/sub and RPC framework
https://socketcluster.io
MIT License
6.14k stars 313 forks source link

Memory leak in SimpleExchange channels #412

Closed marcj closed 6 years ago

marcj commented 6 years ago

I've got a simple question which drives me crazy.

When a client connects to the worker I use client's (on the worker.js) exchange object to subscribe to a channel.

scServer.on('connection', function (socket) {
    //there might be multiple sockets subscribing to this channel
    const channel = socket.exchange.subscribe('my-channel');
    channel.watch((data) => {
    });

    //we do this manually as the client will subscribe to multiple channels over time
    channel.unsubscribe();

    //is this necessary?
    channel.destroy();
});

Simple question: Is channel.destroy() necessary? It feels to me, this call disables that channel for ALL connected/subscribed clients. It seems to me, channel.unsubscribe is also not correctly used here, because when multiple client subscribe to the same channel, when one client unsubscribe all get unsubscribed (at least from the current worker as I understood, right?).

When I look at the code at https://github.com/SocketCluster/sc-simple-broker/blob/master/index.js#L101, it looks to me exchange.subscribe() reuses already created SChannel instances (which is not clear in the documentation). Once I call channel.destroy all clients loose their subscription, however I have to call it because it will never be GC as it is in SimpleExchange._channels basically forever until someone calls destroy() on the Channel or exchange.destroyChannel(channelName), which will not be called automatically as I understood this. Is this correct? If so, then I think this is a memory leak.

jondubois commented 6 years ago

@marcj If you keep the same socket/exchange object around forever then you should call channel.destroy() when you're no longer using a channel or else it will stay in memory (so if you create a lot of unique channels and never destroy them then yes it will lead to a memory leak).

You don't need to destroy the individual channels if you call socket.destroy() on the socket itself; destroying the socket will also destroy all the channels that are attached to it.

SC will give you back the reference to an existing channel if such a channel already exists on the socket/exchange. Each channel is unique throughout the socket/exchange. So if you destroy a channel, it will affect all watchers for that socket/exchange (shouldn't affect other sockets/exchanges though).

The reason why channels are reused is because the SC socket needs to keep track of channels in order to pass data to them from the server so it needs to have a reference to the channel object. Allowing multiple channel objects per socket (for the same channel name) would add overhead and complexity inside SC and would lead to having multiple sources of truths for what is essentially the same resource.

marcj commented 6 years ago

I see, thanks for the detailed explanation.

If you keep the same socket/exchange object around forever then you should call channel.destroy() when you're no longer using a channel

How do I detect if there are no subscriber anymore on a particular channel? I mean the memory leak happens in the Broker, which can consist of multiple broker instances. I'd need to connect to each broker and check there, or how do I get rid of those unused channel objects? Rule of thumb is probably to never call channel.destroy() as you can't be sure whether all clients to that Broker have unsubscribed, right?

You don't need to destroy the individual channels if you call socket.destroy() on the socket itself; destroying the socket will also destroy all the channels that are attached to it.

Oh, are you sure about that? Because destroying a channel (channel.destroy()) means all clients currently subscribed to it will automatically unsubscribe as well. Ergo, when I have 2 clients subscribing to channel test and one client disconnects, the channel('test') will be destroyed, resulting in a unsubscribtion (for this particular client it makes sense) and all of the other clients as well. This would be definitely unexpected and I would consider this as a bug. Or are you talking about a different technical implementation of 'destroying a channel upon client disconnect'?

The reason why channels are reused is because the SC socket needs to keep track of channels in order to pass data to them from the server so it needs to have a reference to the channel object

I definitely see the reason behind that. I'd like to contribute to the website and clear some things up with more detailed explanation. Can you point me to a howto/description on how to contribute here?

Also, are you available for consulting? We're currently about to build a rather complex app using SocketCluster and would like to ask some questions. If so I'd love to write you an email/chat via skype/whatsapp/slack/whatever fits you best.

jondubois commented 6 years ago

@marcj

Oh, are you sure about that? Because destroying a channel (channel.destroy()) means all clients currently subscribed to it will automatically unsubscribe as well.

The channel object is bound to a single client so destroying it only subscribes and destroys it on that client; it doesn't affect other clients. Note that the exchange object is also a client and it's independent from all other sockets in the system; the exchange is basically a socket which can be used on the backend, is always connected and has slightly different behaviour from other kinds of sockets in the system.

Note that on the front end, if you call let socket = socketCluster.create(...) or let socket = socketCluster.connect(...), multiple times, by default, if the options are the same (e.g. same target host, port, query parameters, ...) then it will reuse the existing socket instead of creating a completely new one. If you want to force the creation of a new socket every time, then you need to pass a multiplex: false parameter to the create function. Not sure if the current behaviour is the ideal default behaviour anymore so feedback is welcome.

If you want to contribute to the website documentation, it's here: https://github.com/SocketCluster/socketcluster-website

marcj commented 6 years ago

The channel object is bound to a single client so destroying it only subscribes and destroys it on that client; it doesn't affect other clients

When we talk about the client yes. Sorry, I was talking about the server side, where you have clientSocket.exchange from the scServer.on('connection', (clientSocket) => {}) event, which I thought is always a new instance per each new incoming connection. So I got confused that destroying a channel returned from clientSocket.exchange.subscribe() resulted in a total unsubscribe for all connected clients. Now that I know exchange is basically a own client per worker I understand why this happens. Thanks for clarification.

So, I assume the SimpleExchange is instantiated per worker. If so we could probably check on each client 'disconnect' event whether a channel has watchers open, if not then automatically delete that channel object to get rid of the memory leak. WYDT?

jondubois commented 6 years ago

So, I assume the SimpleExchange is instantiated per worker. If so we could probably check on each client 'disconnect' event whether a channel has watchers open, if not then automatically delete that channel object to get rid of the memory leak.

Yes I guess there are many ways to do this. Whenever you create channels on the exchange on the backend, you need a strategy like this to manage them.