SocketCluster / socketcluster

Highly scalable realtime pub/sub and RPC framework
https://socketcluster.io
MIT License

Strange error #100

Closed: voxyapi-demo closed this issue 8 years ago

voxyapi-demo commented 9 years ago

...

waltergms commented 9 years ago

Sorry, I was using the wrong profile...

Hi Jonathan! I have set up the database to control the online visibility between users, as we discussed; now I'm getting these errors when I run the app...

1439928974153 - Origin: Worker (PID 5220) [Notice] Error: Socket Error: Client pong timed out
    at Emitter.SCSocket._onSCClose (X:\node_modules\socketcluster\lib\server\scsocket.js:173:17)
    at null.<anonymous> (X:\node_modules\socketcluster\lib\server\scsocket.js:144:10)
    at Timer.listOnTimeout (timers.js:110:15)

and after some time

1439929225921 - Origin: Broker (PID 4936) [Error] Error: This socket is closed.
    at Socket._writeGeneric (net.js:656:19)
    at Socket._write (net.js:709:8)
    at doWrite (_stream_writable.js:301:12)
    at clearBuffer (_stream_writable.js:411:7)
    at onwrite (_stream_writable.js:344:7)
    at Socket.WritableState.onwrite (_stream_writable.js:105:5)
    at WriteWrap.afterWrite (net.js:787:12)
1439929228725 - Worker 2 exited - Exit code: 0

Any tips?

jondubois commented 9 years ago

@waltergms Did this issue happen in production?

Notices like the first one can happen in production if a client's computer goes to sleep, for example (their connection could time out without closing properly) - In that case it is normal, and you should expect some of these while running in production if you have lots of users. If this issue happened during testing, I would like to know what steps you used so that I can reproduce it.

The second one (Error) means that you are trying to write to a socket (which you have a reference to) after it has been disconnected. If you listen to socket.on('disconnect', handler) and make sure that you don't try to write to the socket again after this then you shouldn't get the error - If you do then that's a bug that needs to be fixed.

If you want to be really safe, you can check if (socket.state == socket.OPEN) before emitting events to a socket.
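
A minimal sketch of that guard, assuming socket is a server-side socket obtained from the scServer 'connection' event in worker.js (the safeEmit helper and the 'rand' event are just illustrative, not part of the SC API):

var disconnected = false;

socket.on('disconnect', function () {
  disconnected = true; // stop writing to this socket from this point on
});

// Illustrative helper: only emit while the socket is still open.
function safeEmit(event, data) {
  if (!disconnected && socket.state === socket.OPEN) {
    socket.emit(event, data);
  }
}

safeEmit('rand', { rand: Math.floor(Math.random() * 5) });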

Let me know if this helps.

waltergms commented 9 years ago

I will make some changes to guard against writes to sockets that have already been closed, but my fear is that these closed sockets are still receiving the events published by "socket.global.publish", because I always use that method to send a message to a specific user or to a group (room) of users. On the client side, in the socket "disconnect" event I always loop over the channels the user is subscribed to and remove them with channel[x].unsubscribe(userRooms[y]), where each channel is an instance returned by channel[x] = socket.subscribe(userRooms[x]). While we're at it, one more question: is there another way of sending an event to a particular user without using global.publish?

jondubois commented 9 years ago

@waltergms Yes, you can keep track of socketIds and users (e.g. keep track of them using an Object), but this approach is likely to become a problem as soon as you have to add a second worker or host - I would generally recommend against it but it depends on the use case.
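
A minimal sketch of that Object-based approach (single worker only; scServer is assumed to be the server instance obtained in worker.js, and the 'login' event, its payload, and emitToUser are illustrative):

var socketsByUser = {};

scServer.on('connection', function (socket) {
  socket.on('login', function (data) {
    socketsByUser[data.userId] = socket; // remember which socket belongs to this user
  });
  socket.on('disconnect', function () {
    // Drop any entry pointing at this socket so we never emit to it again.
    Object.keys(socketsByUser).forEach(function (userId) {
      if (socketsByUser[userId] === socket) {
        delete socketsByUser[userId];
      }
    });
  });
});

// Send an event to one specific user without going through global.publish.
function emitToUser(userId, event, data) {
  var target = socketsByUser[userId];
  if (target && target.state === target.OPEN) {
    target.emit(event, data);
  }
}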

You're not using socket.emit() from the server side at all? If not then I will investigate socket.global.publish to try to identify if there is an issue there.

waltergms commented 9 years ago

I'm using socket.emit only in cases where I have to return something to the same client that called the server, e.g. "make a socket.emit to the server to get some info for myself". For all user-to-user or server-to-user communication I'm using global.publish.

jondubois commented 9 years ago

Interesting. In theory it may be possible that a client could disconnect just before the server sends a response to the event from the client.

jondubois commented 9 years ago

@waltergms I tried different things like turning off my connection (and intentionally causing pong timeout) before doing a publish and also just before responding to a client event but I wasn't able to reproduce the second error.

If you're getting the second error in production, it shouldn't affect your users - Basically, what is happening is that the SC server is trying to send a message to a socket which has already been closed.

If this error is triggered by SC internally (for example by global.publish(...)), then that's something I would like to fix - Can you find a way to reproduce this issue in a simple environment, or does it only happen in production?

Also, what is your client socketCluster.version? And your server version?

waltergms commented 9 years ago

This happens in production only... I'm making some code changes; hopefully I will get a better understanding of what is causing the problems. About the versions used, I believe I'm using the latest: "socketcluster": "2.3.x", "socketcluster-client": "2.3.x"

jondubois commented 9 years ago

Do you have any code running in the broker.js file?

waltergms commented 9 years ago

No.. just the default code..

module.exports.run = function (broker) {
  console.log(' >> Broker PID:', process.pid);
};

waltergms commented 9 years ago

In my test environment I'm getting disconnected after a few seconds... :|

jondubois commented 9 years ago

Did you change the pingInterval or pingTimeout options?

waltergms commented 9 years ago

Yes...

var socketCluster = new SocketCluster({
  workers: Number(argv.w) || 4,
  brokers: Number(argv.b) || 1,
  port: Number(argv.p) || 8090,
  appName: argv.n || null,
  workerController: __dirname + '/worker.js',
  brokerController: __dirname + '/broker.js',
  socketChannelLimit: 50,
  rebootWorkerOnCrash: argv['auto-reboot'] != false,
  pingTimeout: 5000,
  pingInterval: 2000
});

jondubois commented 9 years ago

That looks OK

waltergms commented 9 years ago

With those ping options, do I need to add code that pings the client from time to time?

waltergms commented 9 years ago

var interval = setInterval(function () {
  socket.emit('rand', {
    rand: Math.floor(Math.random() * 5)
  });
}, 1000);
jondubois commented 9 years ago

You don't have to ping explicitly - SC handles the pinging for you.

waltergms commented 9 years ago

What could be causing the client disconnection?

jondubois commented 9 years ago

So long as the pingInterval is less than half of the pingTimeout it should be fine.
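
For reference, the options shown earlier satisfy this: pingInterval is 2000 ms and pingTimeout is 5000 ms, and 2000 < 5000 / 2 = 2500.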

jondubois commented 9 years ago

I noticed in your original error that it's coming from a broker process... Also it looks like it's causing the worker to restart - I think this is different than the issue I was thinking about.

waltergms commented 9 years ago

Looks like the disconnection problem has been fixed... :+1:

jondubois commented 9 years ago

What did you do?

waltergms commented 9 years ago

Yes... the worker is restarting... but I haven't implemented anything on the brokers...

waltergms commented 9 years ago

I just restarted the server... in my environment...

jondubois commented 9 years ago

Maybe a socket was trying to reconnect while you were restarting it in your test env... Not sure.

jondubois commented 9 years ago

Regarding the original issue (in production) - You should check CPU and memory consumption of all processes to see if maybe there is a memory leak or another issue which is causing the OS to kill a process.

jondubois commented 9 years ago

I need to go now. Let me know if you can reproduce it or if you find the cause.

waltergms commented 9 years ago

1440015569109 - Origin: Worker (PID 7580) [Error] Error: Failed to unbind socket from io cluster - nData Error - The unsubscribe action timed out
    at X:\node_modules\socketcluster\lib\server\scserver.js:235:32
    at bound (domain.js:254:14)
    at runBound (domain.js:267:12)
    at X:\node_modules\socketcluster\node_modules\iocluster\index.js:608:5
    at X:\node_modules\socketcluster\node_modules\iocluster\node_modules\async\lib\async.js:428:21
    at X:\node_modules\socketcluster\node_modules\iocluster\index.js:725:19
    at EventEmitter.<anonymous> (X:\node_modules\socketcluster\node_modules\iocluster\node_modules\async\lib\async.js:428:21)
    at X:\node_modules\socketcluster\node_modules\iocluster\index.js:786:26
    at cb (X:\node_modules\socketcluster\node_modules\iocluster\node_modules\ndata\index.js:320:26)
    at bound (domain.js:254:14)
    at runBound (domain.js:267:12)
    at null._onTimeout (X:\node_modules\socketcluster\node_modules\iocluster\node_modules\ndata\index.js:247:11)
    at Timer.listOnTimeout (timers.js:110:15)

1440015569828 - Origin: Broker (PID 3392) [Error] Error: write EPIPE
    at exports._errnoException (util.js:746:11)
    at Socket._writeGeneric (net.js:690:26)
    at Socket._write (net.js:709:8)
    at doWrite (_stream_writable.js:301:12)
    at writeOrBuffer (_stream_writable.js:288:5)
    at Socket.Writable.write (_stream_writable.js:217:11)
    at Socket.write (net.js:634:40)
    at EventEmitter.self.write (X:\node_modules\socketcluster\node_modules\iocluster\node_modules\ndata\node_modules\ncom\index.js:82:17)
    at send (X:\node_modules\socketcluster\node_modules\iocluster\node_modules\ndata\server.js:64:10)
    at Object.actions.subscribe (X:\node_modules\socketcluster\node_modules\iocluster\node_modules\ndata\server.js:345:5)

1440015569828 - Origin: Broker (PID 3392) [Error] Error: This socket is closed.
    at Socket._writeGeneric (net.js:656:19)
    at Socket._write (net.js:709:8)
    at doWrite (_stream_writable.js:301:12)
    at writeOrBuffer (_stream_writable.js:288:5)
    at Socket.Writable.write (_stream_writable.js:217:11)
    at Socket.write (net.js:634:40)
    at EventEmitter.self.write (X:\node_modules\socketcluster\node_modules\iocluster\node_modules\ndata\node_modules\ncom\index.js:82:17)
    at send (X:\node_modules\socketcluster\node_modules\iocluster\node_modules\ndata\server.js:64:10)
    at Object.actions.subscribe (X:\node_modules\socketcluster\node_modules\iocluster\node_modules\ndata\server.js:345:5)
    at Socket.<anonymous> (X:\node_modules\socketcluster\node_modules\iocluster\node_modules\ndata\server.js:406:34)

1440015569828 - Origin: Broker (PID 3392) [Error] Error: This socket is closed.
    at Socket._writeGeneric (net.js:656:19)
    at Socket._write (net.js:709:8)
    at doWrite (_stream_writable.js:301:12)
    at writeOrBuffer (_stream_writable.js:288:5)
    at Socket.Writable.write (_stream_writable.js:217:11)
    at Socket.write (net.js:634:40)
    at EventEmitter.self.write (X:\node_modules\socketcluster\node_modules\iocluster\node_modules\ndata\node_modules\ncom\index.js:82:17)
    at send (X:\node_modules\socketcluster\node_modules\iocluster\node_modules\ndata\server.js:64:10)
    at Object.actions.subscribe (X:\node_modules\socketcluster\node_modules\iocluster\node_modules\ndata\server.js:345:5)
    at Socket.<anonymous> (X:\node_modules\socketcluster\node_modules\iocluster\node_modules\ndata\server.js:406:34)

The scenario is: 24k channels, 1.3k users online.

Eventually the client receives a socket hang-up; I don't know why the connection is losing communication with the server...

Uncaught Error: Socket hung up [ socketcluster.js:685 ]

jondubois commented 9 years ago

@waltergms How many broker processes do you have and how busy are they? Can you check using the top command? You should try to keep them below 60% CPU usage in case of spikes.

That looks like something which might happen if a broker gets so busy that it can't keep up relaying channel data between your workers.

Your app seems to make heavy use of channels so you should try to boost the number of brokers. When you launch your instance, you can pass the -b flag: E.g. node server -b 2 (this will launch 2 brokers) - The more brokers you have, the less work each one will have to do.
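
For reference, given the constructor options shown earlier (brokers: Number(argv.b) || 1), running node server -b 2 would start two broker processes instead of the default single broker.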

jondubois commented 9 years ago

I will try to simulate a similar scenario (with lots of channels) later today to see if I can reproduce the issue.

jondubois commented 9 years ago

@waltergms Ok, I was able to reproduce the exact same errors by blocking the nData server (broker process) for 10 seconds using a while loop while a client is trying to unsubscribe. This suggests that it's probably due to a broker maxing out on CPU usage.
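
A rough sketch of that kind of reproduction (the exact code used isn't shown here; this simply blocks the broker's event loop from inside broker.js):

module.exports.run = function (broker) {
  console.log(' >> Broker PID:', process.pid);
  var start = Date.now();
  while (Date.now() - start < 10000) {
    // Busy-wait for ~10 seconds so the broker cannot relay subscribe/unsubscribe
    // requests, simulating a broker process maxed out on CPU.
  }
};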

waltergms commented 9 years ago

Good morning Jonathan, I'm starting my vacation today; my colleague will make the necessary adjustments, continue the testing, and contact you today. Thank you very much for your attention, I hope we can get this working 100%.

welberty commented 9 years ago

Good morning Jonathan, I am a colleague of Walter's. I made several adjustments and the server is behaving correctly now; the only problem I still have is memory consumption. I believe that forcing garbage collection would solve it, but when I try to use global.gc(), an exception is thrown seemingly at random.

How do I use a garbage collector with socketCluster?

jondubois commented 9 years ago

@welberty You need to provide a special --expose-gc command line argument to worker processes in order to be able to invoke global.gc().

This wasn't possible before so I fixed this in version 2.3.13. I created a new issue in case anyone else is wondering about the same thing: https://github.com/SocketCluster/socketcluster/issues/105 Workers and brokers will now inherit any CLI execArgv from the master process so you can also have --nouse-idle-notification and/or --harmony flags or any other valid flag.
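
A minimal sketch, assuming SocketCluster >= 2.3.13 so that workers inherit the master's execArgv: start the master with node --expose-gc server.js, after which global.gc() becomes callable from worker.js (the periodic interval is just illustrative):

module.exports.run = function (worker) {
  setInterval(function () {
    if (typeof global.gc === 'function') {
      global.gc(); // only exposed when the process was started with --expose-gc
    }
  }, 60000);
};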

Thanks for reporting this issue.

welberty commented 9 years ago

Thanks for your attention. With the socketCluster update, global.gc() worked fine.

welberty commented 9 years ago

I have another problem. I use a table in the database to store users' socket IDs, so that I know which users are online. In the socket 'disconnect' event I remove the row from this table and mark the user as offline. It turns out that, due to some errors, the socket's 'disconnect' event is not raised; below is one of the recurring errors:

Tue Sep 01 2015 11:38:31 GMT-0300 (E. South America Standard Time): Error: Socket Error: Socket hung up
    at Emitter.SCSocket._onSCClose (Y:\node_modules\socketcluster\lib\server\scsocket.js:173:17)
    at WebSocket.<anonymous> (Y:\node_modules\socketcluster\lib\server\scsocket.js:44:10)
    at WebSocket.emit (events.js:110:17)
    at WebSocket.cleanupWebsocketResources (Y:\node_modules\socketcluster\node_modules\ws\lib\WebSocket.js:926:10)
    at Socket.emit (events.js:129:20)
    at net.js:459:14
    at process._tickDomainCallback (node.js:381:11)

jondubois commented 9 years ago

@welberty This error is normal and you should expect it to happen fairly often if your system has lots of users - It's a socket error, so it won't crash the worker; it just gets logged. Don't worry too much about it unless you get a really massive number of these - then it might mean, for example, that the internet connection between your server and your clients is bad.

It just means that a socket was closed suddenly without first sending a close control frame (maybe the user's computer/browser/internet crashed/froze suddenly). You should still get a 'disconnect' event on the server's socket object though.

To reproduce the 'Socket hung up' error, I have the client running in Node.js in the command line and I connect it to a SC server. Then I just kill the client abruptly using Ctrl+C - This triggers a 'Socket hung up' error on the server as expected because the client did not send a close packet. In my case, I did get a 'disconnect' event on the server socket object.

Can you find out what is triggering this error in your case and double check that the 'disconnect' event is not triggered on the socket object on the server?

Note that sometimes, if the hang-up is due to a bad internet connection, the 'disconnect' event might trigger only after a ping timeout (20 seconds) is reached, but the hang-up error should appear at the same time on the server.
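
A minimal sketch for double-checking this, assuming scServer is the server instance obtained in worker.js:

scServer.on('connection', function (socket) {
  socket.on('error', function (err) {
    // Abrupt hang-ups show up here; they are logged but should not crash the worker.
    console.log('Socket error:', err.message);
  });
  socket.on('disconnect', function () {
    // This should still fire, possibly only after the ping timeout elapses.
    console.log('Socket disconnected:', socket.id);
  });
});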

safiresh commented 8 years ago

Hi team, I am new to socketcluster. Server SC version: 3.0.0, client SC version: 3.1.1.

I got the issue below:

1452426873250 - Origin: Worker (PID 2641) [Notice] Error: Socket Error: Client pong timed out
    at Emitter.SCSocket._onSCClose (/Users/safi/Projects/Server/hm-chat/node_modules/socketcluster-server/scsocket.js:174:17)
    at null.<anonymous> (/Users/safi/Projects/Server/hm-chat/node_modules/socketcluster-server/scsocket.js:145:10)
    at Timer.listOnTimeout (timers.js:92:15)

Kindly suggest a solution.

jondubois commented 8 years ago

@safiresh Try to use a client and server with the same version number.