Flotype / now

NowJS makes it easy to build real-time web apps using JavaScript
http://www.nowjs.com
MIT License

now.js server deadlocks after usage. #145

Open googletorp opened 13 years ago

googletorp commented 13 years ago

I have a tricky bug that currently is making it a nightmare to maintain a site we launched a month or two ago. We did several tests with multiple users simulating a real environment, but didn't catch this.

The problem is that after some time the now server stops answering and needs to be restarted. One way to see this is to curl the now.js file:

curl http://example.com:3300/nowjs/now.js
curl: (56) Recv failure: Connection reset by peer

Earlier I got

curl http://example.com:3300/nowjs/now.js
curl: (55) Send failure: Broken pipe
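The same check can be scripted so the failure is caught and logged with a timestamp. The following is only a minimal sketch using Node's built-in `http` module; the `probe` function name, the timeout value, and the result shape are assumptions made for illustration and are not part of now.js.

```javascript
const http = require('http');

// Hypothetical helper: fetches a URL and reports whether the server answered.
// It resolves to { ok, detail } instead of rejecting, so a periodic caller
// can log the result without extra error handling.
function probe(url, timeoutMs) {
  return new Promise((resolve) => {
    // agent: false opens a fresh connection for every probe, so a pooled
    // keep-alive socket cannot hide a listener that has stopped accepting.
    const req = http.get(url, { agent: false }, (res) => {
      res.resume(); // drain the body; only the status code matters here
      resolve({ ok: res.statusCode === 200, detail: 'HTTP ' + res.statusCode });
    });
    // Abort if the socket stays idle for too long.
    req.setTimeout(timeoutMs, () => {
      req.destroy(new Error('timed out after ' + timeoutMs + ' ms'));
    });
    // Connection-level failures arrive here, usually with an error code
    // such as ECONNRESET or ECONNREFUSED.
    req.on('error', (err) => {
      resolve({ ok: false, detail: err.code || err.message });
    });
  });
}
```

It could be called periodically, for example `probe('http://example.com:3300/nowjs/now.js', 5000).then((r) => console.log(new Date().toISOString(), r))`, to narrow down when the server stops answering.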

So this is the symptom we are seeing. The problem seems to be related to data we store in the main JS file, where we create the server connection etc. That file requires a module that returns an instantiated JavaScript object, which we use for two things.

1. It holds data about our users: a link between their connection to now.js and their user profile in a CMS.
2. It provides some utility functions to add and remove users from the list when they connect/disconnect.

We use setTimeout to handle the disconnect; the timeout is cleared if the user reconnects within a set time limit. The setTimeout ids are also stored on this object.
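For readers trying to picture the setup, an object of this kind might look roughly like the sketch below. This is not the actual code from the site: the `UserRegistry` class, its method names, and the grace period are all hypothetical. It only illustrates the described pattern of keeping a user list plus the pending disconnect timers on one object, and it is not a diagnosis of the deadlock.

```javascript
// Hypothetical registry; every name here is illustrative.
class UserRegistry {
  constructor(graceMs) {
    this.graceMs = graceMs;   // how long a user may stay away before removal
    this.users = new Map();   // userId -> now.js client id
    this.timers = new Map();  // userId -> pending disconnect timer
  }

  // Called when a user connects or reconnects.
  connect(userId, clientId) {
    this.cancelPending(userId);
    this.users.set(userId, clientId);
  }

  // Called when a user disconnects: remove them only after the grace period.
  disconnect(userId) {
    this.cancelPending(userId); // never keep two timers for the same user
    const handle = setTimeout(() => {
      this.timers.delete(userId);
      this.users.delete(userId);
    }, this.graceMs);
    this.timers.set(userId, handle);
  }

  // Clears and forgets the pending disconnect timer for a user, if any.
  cancelPending(userId) {
    const handle = this.timers.get(userId);
    if (handle !== undefined) {
      clearTimeout(handle);
      this.timers.delete(userId);
    }
  }

  // Replaces all stored state, similar in spirit to replacing the object.
  reset() {
    for (const handle of this.timers.values()) clearTimeout(handle);
    this.timers.clear();
    this.users.clear();
  }
}
```

In this sketch each user has at most one stored timer, and `reset()` clears every timer before dropping the references, so no callback can fire against stale data afterwards.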

After some hours of usage by about 25-100 users, problems start to arise. How long it takes to trigger the problem that deadlocks the server varies a lot. Restarting the server only seems to partly solve the problem, as it usually falls back into the deadlocked state more quickly. If, however, we replace the mentioned object (we have a function for that), the server seems to stabilize completely. The site opens and closes the now.js functionality, which is used to manage a help desk; it is open for about 4 hours at a time. If the site admins clear the server data minutes before opening, it can sometimes last the 4 hours, but not always.

We have checked the server to see if memory leaks could be causing the problem, but we're not seeing any spikes or anything else alarming.

I know this is not something you can just reproduce and tell me how to fix. I've tried many things to figure out what could be wrong. It's possible we messed up ourselves, or maybe we have found a rare edge case.

Any hints, ideas, or suggestions would be most welcome.

ericz commented 13 years ago

Hi googletorp,

This is a serious issue and we're looking into it now.

googletorp commented 13 years ago

Hi ericz,

Much appreciated. Let me know if you need more detail.