Closed BarryCarlyon closed 6 years ago
@BarryCarlyon Are you seeing these errors after a long period of time has passed? Can you share what version of the SDK/node.js you are using?
In the SO example, it looks like sqs.receiveMessage gets called on an interval of 500 ms. My initial thought is that enough of these requests get created that, over time, a tipping point is reached where there are just too many sockets in use.
If the above is the case, there are a couple ways to solve this problem. One would be to configure the SDK to use a maximum number of sockets.
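var AWS = require('aws-sdk');
var https = require('https');

var sqs = new AWS.SQS({
  httpOptions: {
    // Cap the number of concurrent sockets the SDK may open.
    agent: new https.Agent({
      maxSockets: 50 // number chosen arbitrarily
    })
  }
});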
The maxSockets approach won't solve any issues you might be having with high memory usage though, as requests could queue up while waiting for a free socket.
The other thing you could do is wait for sqs.receiveMessage to return before calling it again. You would essentially just make sqs.receiveMessage call itself inside of the callback. You can call it within a setTimeout so that it's called on a later tick of the event loop, and so that the stack size doesn't grow too large. This method would have the biggest positive impact on your memory usage, and you can still have multiple sqs.receiveMessage requests going in parallel.
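As a rough sketch of that second approach (not a definitive implementation): the helper below only issues the next receive once the previous one has called back. The names startPolling, onMessages and delayMs are made up for illustration; receive is assumed to be any function with the same (params, callback) shape as sqs.receiveMessage.

```javascript
// Sketch: poll again only after the previous receive has called back.
// `receive` is assumed to have the (params, callback) shape of sqs.receiveMessage.
function startPolling(receive, params, onMessages, delayMs) {
  var stopped = false;

  function poll() {
    if (stopped) {
      return;
    }
    receive(params, function (err, data) {
      if (err) {
        console.error('receive failed', err);
      } else if (data && data.Messages && data.Messages.length > 0) {
        onMessages(data.Messages);
      }
      // Schedule the next call through a timer so the call stack unwinds first.
      if (!stopped) {
        setTimeout(poll, delayMs);
      }
    });
  }

  poll();
  // Calling the returned function stops the loop after the in-flight request.
  return function stop() {
    stopped = true;
  };
}
```

With the SDK this could be wired up as startPolling(sqs.receiveMessage.bind(sqs), { QueueUrl: queueUrl, WaitTimeSeconds: 20 }, handleMessages, 0), where handleMessages is your own function. Calling startPolling more than once gives you several independent loops in parallel, each with at most one request in flight.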
I'm looping every half a second, but it only fetches from SQS if there isn't an SQS fetch already running. So there shouldn't be any hanging sockets, or sockets running concurrently (well, aside from calls to message delete, but the running flag should handle that).
I'm now on 2.3.16 (previously unsure) and seeing the same issue; my watchdog timer last caught it and force reset at "Wed Jun 01 2016 11:27:07 GMT+0100".
Yes, it's after the job has been running for $some_time, say a few days or so.
I can't say I've seen high CPU/Memory usage. New Relic hasn't caught anything abnormal (as it's monitoring the server).
So in summary, I should be waiting for the current sqs.receiveMessage to finish before I call it again.
Code block follows:
var running = false;
runMonitorJob = setInterval(function() {
    if (running) {
        // a fetch is already in flight; do nothing
    } else {
        running = true;
        // watchdog: force-reset the flag if the callback has not fired within 2 minutes
        clearTimeout(watchdogTimeout);
        watchdogTimeout = setTimeout(function() {
            console.log('WatchDog');
            running = false;
        }, 120000);
        sqs.receiveMessage({
            QueueUrl: queueUrl,
            MaxNumberOfMessages: 10,
            WaitTimeSeconds: 20
        }, function(err, data) {
            if (err) {
                logger.fatal('Error on Message Receive');
                logger.fatal(err);
            } else {
                // all good
                if (undefined === data.Messages) {
                    logger.info('No Messages Object');
                    timeCheck();
                } else if (data.Messages.length > 0) {
                    logger.info('Messages Count: ' + data.Messages.length);
                    var delete_batch = [];
                    for (var x = 0; x < data.Messages.length; x++) {
                        // process
                        receiveMessage(data.Messages[x]);
                        // flag to delete (plain object rather than an Array with string keys)
                        delete_batch.push({
                            Id: data.Messages[x].MessageId,
                            ReceiptHandle: data.Messages[x].ReceiptHandle
                        });
                    }
                    if (delete_batch.length > 0) {
                        logger.info('Calling Delete');
                        sqs.deleteMessageBatch({
                            Entries: delete_batch,
                            QueueUrl: queueUrl
                        }, function(err, data) {
                            if (err) {
                                logger.fatal('Failed to delete messages');
                                logger.fatal(err);
                            } else {
                                logger.debug('Deleted received ok');
                            }
                        });
                    }
                } else {
                    logger.info('No Messages Count');
                }
            }
            running = false;
        });
    }
}, 500);
logger is a call to log4js, and receiveMessage basically dumps off to Redis which I munch on later.
If I were to add maxSockets, what would I log to detect whether I hit maxSockets? (Would it chuck an error somewhere?)
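For what it's worth, Node's Agent does not raise an error when maxSockets is reached; further requests simply wait in the agent's queue until a socket frees up. One way to spot that, as a sketch only, is to count the entries in agent.sockets (open sockets) and agent.requests (requests still waiting for a socket). The helper name agentStats is made up for illustration, and this assumes you keep a reference to the same agent you pass to the SDK.

```javascript
var https = require('https');

// Sketch: summarise socket usage on an http/https Agent.
// agent.sockets and agent.requests are objects keyed by host/port,
// each value being an array of sockets or pending requests.
function agentStats(agent) {
  function count(map) {
    return Object.keys(map || {}).reduce(function (total, key) {
      return total + map[key].length;
    }, 0);
  }
  return {
    active: count(agent.sockets),   // sockets currently in use
    queued: count(agent.requests),  // requests waiting for a free socket
    maxSockets: agent.maxSockets
  };
}

// Keep a reference to the agent so it can be inspected later.
var agent = new https.Agent({ maxSockets: 50 });
// Hand the same agent to the SDK:
//   var sqs = new AWS.SQS({ httpOptions: { agent: agent } });
```

Logging agentStats(agent) from the existing 500 ms loop, or whenever the watchdog fires, would show whether queued is above zero at the moment things stall. Note that maxSockets applies per host, so active can exceed it if the agent talks to more than one endpoint.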
Any update on this?
Closing old issues. If you're still encountering this issue, please open a new issue and reference this one.
Occasionally after periods of running fine, Node Amazon SQS will just "stop" and not issue the "no messages" after the timeout.
The issue is outlined here in this StackOverflow: http://stackoverflow.com/questions/37111431/amazon-sqs-with-aws-sdk-receivemessage-stall
Basically in summary: