I have a tricky bug that is currently making it a nightmare to maintain a site we launched a month or two ago. We ran several tests with multiple users simulating a real environment, but didn't catch it.
The problem is that after some time the now.js server stops answering and needs to be restarted. One way of seeing this is to run curl against the now.js file:
curl http://example.com:3300/nowjs/now.js
curl: (56) Recv failure: Connection reset by peer
Okay, so this is the symptom we are seeing. The problem seems to be related to data we store in the main js file, where we create the server connection etc. That file requires a module that returns an instantiated JavaScript object, which we use for two things:
It holds data about our users: a link between their connection to now and their user profile in a CMS.
It provides some util functions to add and remove users from the list when they connect/disconnect. We use setTimeout to handle the disconnect, and the timeout is cleared if the user reconnects within a set time limit. The setTimeout IDs are also stored on this object.
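To make the description above more concrete, here is a minimal sketch of the kind of object I mean. It is not our actual code: the names (`createUserRegistry`, `add`, `scheduleRemoval`, `reset`), the choice to key entries by CMS user id, and the grace period are all assumptions made for illustration.

```javascript
// Hypothetical sketch only: names, keys and the grace period are assumptions.
function createUserRegistry(graceMs) {
  const users = new Map();           // CMS user id -> now client id
  const pendingRemovals = new Map(); // CMS user id -> setTimeout handle

  return {
    // Called on connect. A reconnect inside the grace period cancels the removal.
    add(userId, clientId) {
      const timer = pendingRemovals.get(userId);
      if (timer !== undefined) {
        clearTimeout(timer);
        pendingRemovals.delete(userId);
      }
      users.set(userId, clientId);
    },

    // Called on disconnect. The user is removed after graceMs unless they return.
    scheduleRemoval(userId) {
      // Clear any earlier timer first so its handle is not overwritten and lost.
      const existing = pendingRemovals.get(userId);
      if (existing !== undefined) clearTimeout(existing);

      const timer = setTimeout(() => {
        users.delete(userId);
        pendingRemovals.delete(userId);
      }, graceMs);
      pendingRemovals.set(userId, timer);
    },

    // Replaces all stored data, cancelling outstanding timers first.
    reset() {
      for (const timer of pendingRemovals.values()) clearTimeout(timer);
      pendingRemovals.clear();
      users.clear();
    },

    size() { return users.size; },
    pendingCount() { return pendingRemovals.size; },
  };
}
```

In this sketch the connect handler would call `add`, the disconnect handler would call `scheduleRemoval`, and `reset` stands in for the function we use to replace the object. Logging `size()` and `pendingCount()` over time would be one way to see whether entries or timers pile up.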
After some hours of use by about 25-100 users, problems start to arise. The time needed to trigger the problem, which deadlocks the server, varies a lot. Restarting the server only seems to partly solve the problem, as it usually falls back into the deadlocked state more quickly afterwards. If, however, we replace the mentioned object (we have a function for that), the server seems to stabilize completely. The site opens and closes the now.js functionality, which is used to manage a help desk, and it stays open for about 4 hours at a time. If the site admins clear the server data minutes before opening, the server can sometimes last the 4-hour open time, but not always.
We have checked the server to see whether memory leaks could be causing the problem, but we're not seeing any spikes or anything else alarming.
I know this is not something you can just reproduce and tell me how to fix. I've tried many things to figure out what could be wrong. It's possible we messed up ourselves, or maybe we have found a rare edge case.
Any hints, ideas, or suggestions would be most welcome.