dougmoscrop / serverless-http

Use your existing middleware framework (e.g. Express, Koa) in AWS Lambda 🎉

Sending compressed body to serverless-http through API Gateway #194

Closed peebles closed 3 years ago

peebles commented 3 years ago

I have an application that uploads compressed data through API Gateway to a serverless-http lambda, using node express as the framework. The upload can be reproduced with this curl command line:

cat data.json | gzip |  curl -v -X POST <my-endpoint> --data-binary @- -H "Content-Encoding: gzip" -H content-type:application/json

API Gateway is "smart enough" to recognize the Content-Encoding header and it does the decompression. I cannot for the life of me turn that behavior off! Anyway, the event.body hitting the serverless-http Lambda function is already decompressed and isBase64Encoded=false. However, event.headers['Content-Encoding'] is still present and set to "gzip"! When the Express bodyParser sees this, it attempts to decompress the body and errors out with "incorrect header check" (zlib.js:182:17).
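The "incorrect header check" error can be reproduced outside Lambda. This is a minimal sketch that assumes only Node's built-in zlib module; the payload and the helper name tryGunzip are made up for illustration. Gunzipping a body that is already plain JSON fails in the same way the parser's decompression step appears to:

```javascript
const zlib = require('zlib');

// Hypothetical helper: attempt to gunzip a body and report what happened.
function tryGunzip(body) {
  try {
    return { ok: true, text: zlib.gunzipSync(body).toString('utf8') };
  } catch (err) {
    return { ok: false, message: err.message };
  }
}

const plain = Buffer.from(JSON.stringify({ hello: 'world' }));
const compressed = zlib.gzipSync(plain);

// A genuinely gzipped body round-trips.
const good = tryGunzip(compressed);
// A body that was already decompressed upstream is rejected by zlib.
const bad = tryGunzip(plain);

console.log(good, bad);
```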

What is the proper way of handling this? I am getting around this annoying situation by doing this:

  ...

  if ( typeof event.body === 'string' && event.isBase64Encoded === false) {
    Object.keys(event.headers).forEach( key => {
      if (key.toLowerCase() === 'content-encoding' && event.headers[key] === 'gzip') {
        delete event.headers[key];
      }
    });
  }

  return serverless(app)(event, context);

What a hack! I guess it's useful that API Gateway can automatically handle decompression on uploads, but it seems wrong that it doesn't alter the headers to reflect that it was so helpful.

Or ... what in the heck am I doing wrong here?

dougmoscrop commented 3 years ago

It sounds like you're using a REST API, can you switch to HTTP APIs?

I think it is intrinsic behavior in REST: https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-make-request-with-compressed-payload.html

peebles commented 3 years ago

What is the difference between a REST api and an HTTP api? I read your link reference above and it does not say what the Gateway will do with the Content-Encoding header.

I simply have a plain old web server running behind apigateway. My web server checks incoming request headers, and if Content-Encoding is gzip, it will decompress the body. API Gateway sees my request first, decompresses the body, but does not remove or alter the Content-Encoding header. When the request gets to my server, the header says "decompress" but the body is already decompressed, and I have no way of knowing.
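One possible way to tell the two cases apart, sketched under assumptions: a gzip stream starts with the bytes 0x1f 0x8b (RFC 1952), so a handler could inspect the body instead of trusting the header. The names looksGzipped and dropStaleContentEncoding are hypothetical, the check is only a heuristic, and it relies on the event fields mentioned in this thread (body, isBase64Encoded, headers); it is not something serverless-http provides.

```javascript
// Heuristic: gzip streams begin with the magic bytes 0x1f 0x8b (RFC 1952).
function looksGzipped(buffer) {
  return buffer.length >= 2 && buffer[0] === 0x1f && buffer[1] === 0x8b;
}

// Hypothetical helper: remove a "Content-Encoding: gzip" header only when the
// body is clearly not gzip data. Mutates and returns the event.
function dropStaleContentEncoding(event) {
  if (typeof event.body !== 'string' || !event.headers) return event;

  // Decode the body the same way the proxy event declares it.
  const body = Buffer.from(event.body, event.isBase64Encoded ? 'base64' : 'utf8');
  if (looksGzipped(body)) return event; // still compressed: leave the header alone

  // Header names are case-insensitive, so compare in lower case.
  for (const key of Object.keys(event.headers)) {
    if (key.toLowerCase() === 'content-encoding' && event.headers[key] === 'gzip') {
      delete event.headers[key];
    }
  }
  return event;
}
```

Compared with deleting the header unconditionally, this leaves the header in place when the body still starts with the gzip signature.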

dougmoscrop commented 3 years ago

There are a lot of differences between the two - how are you deploying your app?

peebles commented 3 years ago

Can you list one key difference between a REST api and an HTTP api?

As far as deployment: "I have an application that uploads compressed data through API Gateway to a serverless-http lambda, using node express as the framework. The upload can be reproduced with this curl command line:"

In other words, a fairly typical serverless.io deployment using the so-called "lambda proxy" mode, and using the "serverless-http" library to wrap a node express server.

peebles commented 3 years ago

service: digitsole

provider:
  name: aws
  runtime: nodejs12.x
  stage: ${env:NODE_ENV}
  region: ${env:AWS_REGION}
  deploymentBucket: ${env:DEPLOY_BUCKET}
  iamManagedPolicies:
    - "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"

functions:
  webhook-server:
    handler: lambda.handler
    events:
      - http: ANY /
      - http: 'ANY {proxy+}'

and

const serverless = require('serverless-http');
const app = require('./server');

// Wrap the Express app once so the wrapper is reused across invocations.
const wrapped = serverless(app);

module.exports.handler = (event, context) => {
  // API Gateway will automatically decompress incoming body if Content-Encoding is "gzip", but
  // it does not change the header!  So Express attempts to unzip the body and bombs!
  //
  if (typeof event.body === 'string' && event.isBase64Encoded === false) {
    // Header names are case-insensitive, so match on the lower-cased key.
    Object.keys(event.headers || {}).forEach(key => {
      if (key.toLowerCase() === 'content-encoding' && event.headers[key] === 'gzip') {
        delete event.headers[key];
      }
    });
  }
  return wrapped(event, context);
};

and server.js:

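const bodyParser = require('body-parser');
const app = require('express')();
module.exports = app;

// PORT is not defined anywhere in the post; reading it from the environment is an assumption.
const PORT = process.env.PORT || 3000;

// Parse JSON bodies; strict: false also accepts non-object/array JSON values.
app.use(bodyParser.json({
  strict: false,
  limit: "950kb"
}));

// ROUTES HERE

// Listen only when run directly (local development), not when required by the Lambda handler.
if (require.main === module) {
  app.listen(PORT, () => {
    // app.log is assumed to be a logger attached elsewhere in the app; Express does not provide one.
    app.log.info(`starting server on port ${PORT}`);
  });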
}

dougmoscrop commented 3 years ago

Well, one key difference is that REST APIs appear, by documentation, to automatically decompress gzip bodies and HTTP APIs don't ;)

The lambda proxy mode is a specific kind of integration for API Gateway REST APIs. Try switching to an API Gateway HTTP API.

https://www.serverless.com/framework/docs/providers/aws/events/http-api/

dougmoscrop commented 3 years ago

Yeah just change your event key (and definition) to httpApi instead of http and see if that works.

peebles commented 3 years ago

I don't see anything in that documentation that says anything about compression. Are you saying that "one key difference is that REST APIs appear, by documentation, to automatically decompress gzip bodies and HTTP don't" comes from AWS documentation? I have not found a document on AWS that says their new "http api" ignores Content-Encoding ... but their doc isn't the most coherent and I could've missed something. I have looked pretty hard though...

dougmoscrop commented 3 years ago

The documentation is bad. In the link I shared above (https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-make-request-with-compressed-payload.html) it says:

When API Gateway receives the request, it verifies if the specified content coding is supported. Then, it attempts to decompress the payload with the specified content coding. If the decompression is successful, it dispatches the request to the integration endpoint.

I can't say for sure how this works as I've never tested it, but it at least sounds similar to what you're experiencing. I would consider it a bug, potentially: I would expect content-encoding to be ignored because that describes the content of the resource itself. In HTTP semantics, you're saying "here is a thing, that is gzipped", whereas transfer-encoding is "between you and me, I have gzipped this thing", and I could see API Gateway doing the auto-unzip for that... but who can really know what AWS' intentions are?

Anyway, try changing to the httpApi instead. They're cheaper, tend to be faster, and are a more direct pass-through (more like an ALB than the full REST version, which has features you probably aren't using, like usage plans, caching, integration request/response templates, etc.)

peebles commented 3 years ago

The "http api" on gateway was something I only recently found out about. I'll give it a shot, when I have some free time. Thanks for the help!

nbcchen commented 1 month ago

A REST API can accept gzip data; just pass content-encoding: gzip in the request headers.

https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-make-request-with-compressed-payload.html
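For reference, a rough Node equivalent of the curl command at the top of this thread might look like the sketch below. It assumes only Node's built-in zlib module; buildGzipRequest and ENDPOINT are hypothetical names, and the actual send is left as a comment because the endpoint in the thread is a placeholder.

```javascript
const zlib = require('zlib');

// Hypothetical helper: build the options and body for a gzip-compressed JSON POST.
function buildGzipRequest(payload) {
  const body = zlib.gzipSync(Buffer.from(JSON.stringify(payload)));
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Content-Encoding': 'gzip', // tells the receiver the body is gzip-compressed
      'Content-Length': String(body.length),
    },
    body,
  };
}

// Usage sketch (ENDPOINT stands in for the real URL):
// const https = require('https');
// const { body, ...options } = buildGzipRequest({ hello: 'world' });
// https.request(ENDPOINT, options, (res) => res.resume()).end(body);
```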