dougmoscrop / serverless-http

Use your existing middleware framework (e.g. Express, Koa) in AWS Lambda 🎉
Other
1.74k stars 167 forks source link

Function callback is never fired #71

Closed ihnat-spurit closed 5 years ago

ihnat-spurit commented 5 years ago

Hello,

We are using Express app, which is decorated by serverless-http in order to transform the route to AWS lambda function handler. Overall it works OK, but I noticed that sometimes, when we get near 50 requests batch sent to the API in very short period of time (almost in parallel), some of them are failed with HTTP 500.

It seems like some containers are started but the route callback is never called. At least I've made this conclusion analyzing CloudWatch logs. The function is not finished, because the handler (the route) was not called, and, after some time, it is killed with the timeout.

Has anyone got similar problems? Any insights or advice would be appreciated.

dougmoscrop commented 5 years ago

What is your API doing? Like if it's for example talking to Dynamo, it could easily be receiving Capacity Exhaustion rejections and retrying; the default retry configuration in the AWS SDK is extremely high.

If you can come up with some minimal reproducible example I am happy to take a closer look.

ihnat-spurit commented 5 years ago

Thanks for your reply first of all. We actually write the messages into SQS. I was able to reproduce the issue even with 50 webhooks sent to the API in very short period of time. Do you think we could get into the Capacity Exhaustion rejections and retrying?

Please see the simplified code of the API endpoint below:

'use strict';
console.log('init: ' + new Date());

const serverless = require('serverless-http');
const app = require('express')();
const AWS = require('aws-sdk');
const sqs = new AWS.SQS({region : process.env.DEPLOY_AWS_REGION});

app.post('/', function (req, res, next) {
    try{
        console.log('received: ' + new Date());
        //... getting body, queue url and webhook id
        sqs.sendMessage({
            MessageBody: body,
            QueueUrl: queueUrl,
            MessageGroupId: webhookId,
            MessageDeduplicationId: webhookId
        }, function(err){
            if(err) {
                return next(new Error("The webhook was not pushed to the queue: " + err.message));
            }else{
                console.log('produced: ' + new Date());
                res.send('');
            }
        });
    }catch(e){
        return next(new Error(e.message));
    }
});

module.exports.handler = serverless(app);

So basically I can see the "init" log message, but the "received" one never happens for the function. Then in 6 seconds the function is killed because of timeout.

Please let me know if you need more info from my side. Thanks again.

dougmoscrop commented 5 years ago

Thanks, I will take a look.

DlearyMW commented 5 years ago

We're also running into this. It's a lot more intermittent for us though. Some days it happens, other days it doesn't. In our case, we're not making any asynchronous calls. It also does not seem like it's related to concurrent requests given that it's happening on our staging environment, usually around times of low traffic.

dougmoscrop commented 5 years ago

What version of express @DlearyMW and @ihnat-spurit ?

dougmoscrop commented 5 years ago

Also if you could share any code Dleary that would be good. Strip it down how you need but I'd like to deploy and debug

DlearyMW commented 5 years ago

Sure thing, though, like I said for us it's very intermittent:

Node Runtime: 8.10 Express version: 4.16.4 Serverless-http: 1.9.1

Example timeout: image

Sample Code:

const express = require('express')
const cookieParser = require('cookie-parser')
const lambdaLog = require('lambda-log')

const app = express()
app.use(cookieParser())
app.get('/placeholder', getAppContainerHtml)

module.exports.handler = serverless(app)

function getAppContainerHtml (req, res) {
  lambdaLog.options.debug = true
  lambdaLog.options.meta.cookies = req.cookies
  lambdaLog.debug('Retrieving appContainerPlaceholderHtml')

  const appContainerPlaceholderHTML = 'Some random string'

  if (appContainerPlaceholderHTML && appContainerPlaceholderHTML.length) {
    lambdaLog.debug('Successfully retrieved appContainerPlaceholderHtml', {
      appContainerPlaceholderHtml: appContainerPlaceholderHTML
    })
  } else {
    lambdaLog.warn('Received empty string instead of appContainerPlaceholder', {
      appContainerPlaceholderHtml: appContainerPlaceholderHTML
    })
  }

  return res.status(200).send(appContainerPlaceholderHTML)
}
dougmoscrop commented 5 years ago

What log level was on? Would that debug('successfully..') statement have been printed if it was reached?

DlearyMW commented 5 years ago

Correct, 2 lines above that debug log is the configuration that turns on the debug log level

dougmoscrop commented 5 years ago

d'oh, yeah, I saw that -- ok, ran some tests, no luck so far, I will try to keep at it in my spare time

DlearyMW commented 5 years ago

For what it's worth, since moving away from using this module, we haven't had anymore of those 6 second timeouts we were seeing sporadically.

dougmoscrop commented 5 years ago

Yeah, I was going to ask if you'd tried aws-serverless-express. Conversely, we're using this in half a dozen different apps, and aren't having timeouts either :/

I would like to get to the bottom of it. What version of cookie-parser were you using as well?

dsappet commented 5 years ago

Hi @dougmoscrop, I work with DlearyMW on this project, we are using "cookie-parser": "1.4.4"

dougmoscrop commented 5 years ago

thanks! I will pin that version as I test.

also three Dougs in a thread 😲

dsappet commented 5 years ago

Yeah! Pretty awesome!

dougmoscrop commented 5 years ago

So, this library no longer uses the callback at all - it returns a promise (async function to be specific)

I deployed the integration test suite and ran 100,000 requests (with a concurrency of 500) to each of the koa and express test endpoints, and 100 to the binary test endpoint, not a single one hung. I then took your sample code above and deployed it and hit it 2 x 100,000 times, one with no cookie one with a cookie of 'Testing=abcdef123456' and no hangs, p99 time was ~210ms, longest was 1.9s. (Just ran another 2x 100,000 tests with different longer cookie strings and multiple cookies, same thing).

I don't know if it might have been a bug in lambdaLog or earlier code in this lib, but at this point I think I am going to regretfully close this issue, if anyone else sees timeouts please try to contribute examples that are as minimal as possible and we can reopen.

dsappet commented 5 years ago

Thanks for the followup on this issue and your testing. We've moved away from this implementation so we won't be able to test in a production level environment. We will keep this in mind for future needs / implementations though.