dherault / serverless-offline

Emulate AWS λ and API Gateway locally when developing your Serverless project
MIT License
5.19k stars 795 forks source link

Upload image issue #464

Closed dfloresgonz closed 4 months ago

dfloresgonz commented 6 years ago

My image seems to be corrupted when uploaded to server, It only works with .txt files.

I'm using sls 1.27, EspressJS and multiparty Is there a way to upload images without any problems?

Thanks.

function uploadImage(req, res) {
    let multiparty = require('multiparty');
    let fs = require('fs');
    let form = new multiparty.Form();
    form.parse(req, function(err, fields, files) {
        console.log(files);
        //uploading file...
        })).then(result => {
            res.status(200).send({ msj : 'Images were uploaded'});
        }).catch(err => {
            res.status(500).send(err);
        });
    });
}
const serverless = require('serverless-http');
const express = require('express')
const app = express()
require('./middlewares/authenticated');

var bodyParser = require('body-parser');
var cors       = require('cors');

app.use(cors({'origin' : '*'}));
app.use(bodyParser.urlencoded({ extended : false }) )
   .use(bodyParser.json());
service: my-backend

provider:
  name: aws
  runtime: nodejs6.10
  stage: prod
  region: us-east-1

plugins:
  - serverless-offline
  - serverless-s3-local

functions:
  app:
    handler: index.handler
    events:
      - http: ANY /
      - http: 'ANY {proxy+}'
  upload:
    handler: index.handler
    events:
      - http: 'POST /api/registro/uploadImage/'
evangow commented 6 years ago

@dfloresgonz did you ever get this working? I'm running into the same issue.

dfloresgonz commented 6 years ago

@evangow I read that sls offline doesnt support binary types so it will never work, but only offline. I used base64 to make it work on local env.

evangow commented 6 years ago

@dfloresgonz Are you converting it to base64 on the client side, then sending it to the server? Or, are you using something like the the gist linked below to parse the event and pass it onto serverless-http?

Would you happen to have a code sample or gist you could share if you're doing it on the server side? I think I could probably figure it out on the client side, but I've been banging my head against the wall for hours looking through issues trying to sort this out.

Parser Gist: https://gist.github.com/lteacher/9ef1c7bc5908418b30a18719521ff3c7#file-parsers-js-L12-L41 ^ Found in this issue: https://github.com/dherault/serverless-offline/issues/230

janwirth commented 5 years ago

same here using

curl -X POST -F 'file=@/home/jan/figub/pass.mp3;type=audio/mpeg' localhost:3333/upload -H "Content-Type: audio/mpeg" --verbose -H "Accept: audio/mpeg"
janwirth commented 5 years ago

With both serverless-offline-python and serverless-offline on a nodejs environment the body contains payload which is ~30% larger than the original file. A diff between the binary files showed perfect identity up until two thirds - where in the resulting file a bunch of data is appended.

malnor commented 5 years ago

Same here, cannot get it working, we looked through all the other code time after time without understanding what was happening. It would be good to at least throw something if it is not supported - took us more than a day to find this problem.

dherault commented 5 years ago

We would gladly accept a PR on this. It shouldn't be much, just a Hapijs tweaking.

janwirth commented 5 years ago

FYI - I chose a different architecture. The lambda now just requests a temporary URL from my file bucket. The file bucket can be hidden behind an internal (transparent) redirect.

arnas commented 5 years ago

I have encountered this issue and after extensive investigation, I found that https://github.com/dherault/serverless-offline/blob/master/src/index.js#L490 causes the issue.

If I am trying to upload an image via multipart/form-data it is converted toString('binary') ( which seems is deprecated). Then it's passed to express (in my case), the request goes through https://www.npmjs.com/package/multer, which is decoded incorrectly and the image becomes corrupted. But if I commented out that toString call, express with its middleware correctly handles the raw buffer.

It was added with https://github.com/dherault/serverless-offline/pull/394 as a workaround? As there behavior that binary data was converted to a utf8 string.

At least from initial testing, it seems that serverless offline works fine without 490 line. Maybe @dherault and @lteacher @daniel-cottone could comment if it's still required?

lteacher commented 5 years ago

If the line is removed then i assume it would work, because the line does some magic check to ensure the toString('utf8') doesnt happen for multipart/form-data. So if it didn't ever do that the encoding detect thing wouldn't be needed.

Actually I haven't been using serverless for 2 years but we still have a service running that parses multi-part form data, though it was using an old branch.... I did switch to the latest serverless-offline just now and found that it worked ok still.

So regarding the above question, @arnas I also removed the line to see and as expected it worked just fine for me. That wasn't too surprising though because the check added was because of the toString('utf8') and that was to fix another issue so I have no idea if that issue would be a problem when removing that line. See #224

I also spent a bit of time checking out what the whole problem is around here and it was painful... I guess at the end of the day the issue seems to come down to the encoding (since I wasn't able to get an exact answer with code links to provide). So probably for most of the above, its something like... an issue because the parsing magic is done on pipe and the encoding has been set to binary string and its not a buffer when it comes to your logic.

For sure its just the above referenced line that converts to string, so if its removed and it doesn't break whatever that old issue was then it would be ok. Otherwise you could try setting the encoding, there is some check that happens for if it is an actual buffer so it would probably still work both online and in offline...

So summary try locally setting the encoding... In this gist the write function is called and given the encoding. In the examples from above the magical middleware you are all using is doing it in the pipe so try maybe setting req.setEncoding('binary') before passing it to the parse... which is a random guess since i didn't test it.

arnas commented 5 years ago

Well, it seems that this issue only appears for images, my colleague has tested and for example, uploading .txt works fine. It's a bit strange but from my finding, only images are affected.

@lteacher I have checked the pr #224 and at least form the first look seems that toString() is unnecessary as to prevent hapis.js from parsing payload you can simply set payload to parse to false. And I am assuming if hapi parse is set to false function toString() should do nothing as payload already is a string.

lteacher commented 5 years ago

@arnas sorry I dont get exactly what you mean with the last sentence.

Problem History

Fixing in this repo

So to resolve the issue based on what you said above then the best thing is to not do any toString and remove this logic but you might need to fix that issue from #224 some other way, maybe with that parse option or whatever

Fixing original author issues (assuming the fix is not done here in rep)

Also for original author if using wrapping packages with this kind of thing like the file data etc you need to check closely everywhere on the packages you are using like this doc over here. As I recall (but might have changed over last 2 years) you can only get binary content to AWS lambda functions in base64 anyway (but of course even the base64 string is destroyed if it is encoded incorrectly to utf8).

@arnas I assume that you have success already on AWS and that you use something like serverless-http.

arnas commented 5 years ago

@lteacher I have update the wording of my previous comment last sentence.

For fixing repo I believe that would be the best way. I am hoping that removing toString while hapi payload parse is to false will be enough. I will look into it if I have time.

@lteacher I am using serverless-http, but I havent deployed it to aws as I amasuming it will work without much problem as atleast from documentation aws api gateway supports that https://aws.amazon.com/about-aws/whats-new/2016/11/binary-data-now-supported-by-api-gateway/ .

lteacher commented 5 years ago

@arnas oh well should be np if using that serverless-http

Actually I was thinking of this again for some reason and I had another look and noticed that the parse option is actually used already over here. However, I actually didn't notice before when i added that detectEncoding but that payload is just needed as a string for the JSON.parse in there. There is some createLambdaProxyContext which uses the payload but someone haxed something in to not parse if its not a string there (using rawPayload).

Theres a velocity template method too seems to assume the payload is json so dunno about that since sometimes its not already, otherwise could be able to just even move that stringifying stuff into the relevant scope where its needed which is like if it needs to be parsed per last reference?

arnas commented 5 years ago

@lteacher I have a bit of debugging and from the first look it seems that this issues is not only from serverless-offline side. Basically I have created plain handler.js which uses serverless-http to transfer requests to express and multer to deal with multipart data. https://gist.github.com/arnas/d8fed4b78ff2940a3b390e754090cea9
Without changing any dependencies I have confirmed that txt files were encoded and decoded correctly, but there was an issue with png files. They have grown about 30 percentage bigger. (I think somebody mentioned that).
Also, I have confirmed that if I encode buffer and decode it afterward via Buffer(binaryString, 'binary') everything is fine. But as we are giving serverless-http string it uses Buffer.from(binaryString, 'utf-8') and corrupts file unless it was a text file :/ https://github.com/dougmoscrop/serverless-http/blob/c70957b47ac66e363c7179a6e0a4cd8c6d78a2ee/lib/request.js#L15.

tl;dr the bug happens because serverless-http wrongly encodes request.

Suggestions for solution

I could create a pr for a first point if this approch is acceptable with lib maintainers.

Side note rawPayload is always equal to https://github.com/dherault/serverless-offline/blob/master/src/index.js#L490

lteacher commented 5 years ago

Hey @arnas I think that you are missing some setup on AWS side. If you want to use binary content you need to do some different setup per these docs: https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-payload-encodings.html

Thats why I posted this thing from serverless-http

I knew about that Buffer.from(binaryString, 'utf-8') when i last looked and posted my response but I didnt post it here because its not relevant if the AWS setup is correct. That is because on AWS side if you are using binary content it should be coming base64 encoded and there is a flag its called isBase64Encoded. Actually I looked in the serverless-http code lol and I saw this flag so I assumed they used the flag but.... actually now I see they detect it or something so instead you have to provide some option? to set binary: true

If you look here in this gist you can see the actual property that your aws lambda-proxy binary content enabled handler will receive on the event. That is not a magical value I added for the gist, sadly that is an AWS thing that gets attached to the event.

I actually don't use serverless-http so dunno about that option. I didn't think it would be nice to include express or koa etc inside the this stuff. I have a separate package I created to make it nice for adding parsing etc, then I use the exact gist I posted above for parsing before the file handler.

In summary I think you can change your AWS setup to correctly handle binary content, then you are going to start recieving base64 encoded blobs in serverless-http. Then you need to be sure that serverless-http is expecting it to be isBase64Encoded. After that it should already work on AWS. Then to get serverless-offline working is where the actual issues are left because it doesn't do the base64 encoding magic that the AWS stuff is doing so thats all this entire issue so far.

hqnarrate commented 4 years ago

I'm running into this issue using serverless-http + serverless-offline and multer-s3 for handing multipart/form-data when uploading images.

@lteacher do you have a recommendation for getting serverless-offline to correctly handle this? My service works when deployed on AWS through api-gateway. For local development it is not.. Referring to your last comment above: "Then to get serverless-offline working is where the actual issues are left because it doesn't do the base64 encoding magic that the AWS stuff is doing so thats all this entire issue so far."

lteacher commented 4 years ago

@hqnarrate Sorry im just not up to date on this issue any more, there seems to be a PR #784 that was working to address this somewhat through the config options.

Looking back over the comments it does seem like there is a way to refactor the offending line out but I just don't have any time to investigate that at the moment and I don't use any of the packages mentioned in these issues except serverless-offline so I can't really test any resolution.

Maybe @cmuto09 can give an update on where the PR referenced above is at, it looks like it needs rebasing at minimum.

dnalborczyk commented 4 years ago

@hqnarrate @lteacher a bit of a problem is that @cmuto09 PR depends on the https://www.npmjs.com/package/serverless-apigw-binary plugin. but it seems serverless supports this now as well.

dnalborczyk commented 4 years ago

some references for serverless support:

https://serverless.com/framework/docs/providers/aws/events/apigateway/#binary-media-types https://serverless.com/blog/framework-release-v142/ https://github.com/serverless/serverless/pull/6063

additional issues: https://github.com/serverless/serverless/issues/2797 https://forum.serverless.com/t/returning-binary-data-jpg-from-lambda-via-api-gateway/796 https://github.com/dougmoscrop/serverless-http/issues/88

hqnarrate commented 4 years ago

@dnalborczyk - i was not clear. My service is working fine when deployed to AWS environment by setting the apiGateway.binaryMediaTypes correctly. It is when I'm running on offline mode with 'serverless offline start', my uploads would become corrupted.

dnalborczyk commented 4 years ago

@hqnarrate sorry, I think I haven't been clear. 😄 I knew what you meant. The links above are pointers for me (or anyone else getting to it) for the implementation.

pvsvamsi commented 4 years ago

@dnalborczyk Any update or workarounds on this issue?

abinhho commented 4 years ago

My solution: https://stackoverflow.com/a/61003498/9585130

Just add this to serverless.yml.

provider:
  apiGateway:
    binaryMediaTypes:
      - '*/*'

And I'm uing "aws-serverless-express-binary": "^1.0.1"

lassesteffen commented 4 years ago

Does anybody know a way around this in offline mode? I am trying to upload files / pdf and this does not work for me

abinhho commented 4 years ago

@pvsvamsi @lassesteffen Try aws-serverless-express-binary instead aws-serverless-express

lassesteffen commented 4 years ago

I am using apollo-server-lambda not express

andriesss commented 4 years ago

Try upgrading to the latest versions of serverless and serverless-offline. I believe this was fixed, but I could not find anything about it in the changelog

ajouve commented 4 years ago

Same issue there

@abinhho tried your solution but not working, Binary Media Types seems to be for download not upload. Not working in my case

My solution was to convert my file to base64 before upload, not really nice but only way I found to make it works

abinhho commented 4 years ago

@ajouve This upload function still running in my before project. So I sure it's worked. Maybe something wrong on your code. If you can share your code, I think someone can help.

kennyhyun commented 4 years ago

I had a similar issue with using serverless-offline + aws-serverless-express + multer

It said Error: Unexpected end of multipart data

image

whereas serverless-http has no problem.

I made a simple example, upload-to-serverless-offline And I will try to figure out anything need to be fixed.


I could confirm that aws-serverless-express-binary is working as @abinhho mentioned here

AkshayRawatt commented 3 years ago

did anyone found the solution on this issue as uploading a form-data with pdf attached, on using multer , getting buffer size increased where as when i run same on node work's properly i.e without serverless.

federicobarera commented 3 years ago

I can confirm the following setup does not work in serverless offline but works once deployed to AWS. Related to multipart uploads

"apollo-server": "^3.1.2",
"apollo-server-express": "^3.1.2",
"apollo-server-lambda": "^3.1.2",
"graphql-upload": "^12.0.0",
"serverless-offline": "^8.0.0",
provider:
  name: aws
  runtime: nodejs14.x
  apiGateway:
    binaryMediaTypes:
      - '*/*'
plugins:
  - serverless-offline
  - serverless-webpack
deniapps commented 3 years ago
"apollo-server-express": "^3.1.2",
"apollo-server-lambda": "^3.1.2",

How do you use "apollo-server-express": "^3.1.2" and "apollo-server-lambda": "^3.1.2" together?

I saw your comments on this thread: https://github.com/jaydenseric/graphql-upload/issues/155#issuecomment-899444757 I thought you used 'express' as @vaunus suggested:

const handler = apolloServer.createHandler({
  expressAppFromMiddleware(middleware) {
    const app = express()
    app.use(graphqlUploadExpress())
    app.use(middleware)
    return app
  }
})
federicobarera commented 3 years ago

Please disregard apollo-server-express . We are hosting apollo server on both self-hosted express server and as a lambda. apollo-server-lambda is built on top of apollo-server-express so lots of modules are compatible.

For the sake of this issue apollo-server-express is irrelevant. I am hosting as mentioned from vaunus in the other topic. Due to this issue with serverless-offline and file uploads we are using the self hosted version for local dev

UmerMIB commented 2 years ago

I can confirm the following setup does not work in serverless offline but works once deployed to AWS. Related to multipart uploads

"apollo-server": "^3.1.2",
"apollo-server-express": "^3.1.2",
"apollo-server-lambda": "^3.1.2",
"graphql-upload": "^12.0.0",
"serverless-offline": "^8.0.0",
provider:
  name: aws
  runtime: nodejs14.x
  apiGateway:
    binaryMediaTypes:
      - '*/*'
plugins:
  - serverless-offline
  - serverless-webpack

Thank you so much bro it works for me, I was stuck in this issue for couple of WEEKS

kirankjolly commented 2 years ago

I can confirm the following setup does not work in serverless offline but works once deployed to AWS. Related to multipart uploads

"apollo-server": "^3.1.2",
"apollo-server-express": "^3.1.2",
"apollo-server-lambda": "^3.1.2",
"graphql-upload": "^12.0.0",
"serverless-offline": "^8.0.0",
provider:
  name: aws
  runtime: nodejs14.x
  apiGateway:
    binaryMediaTypes:
      - '*/*'
plugins:
  - serverless-offline
  - serverless-webpack

Are you sure there will no way to work in serverless offline. Might be I am also having same issue https://stackoverflow.com/questions/69810751/image-file-is-not-viewing-uploaded-by-s3

mateo2181 commented 2 years ago

@UmerMIB can you share your handler to run Graphql with AWS Lambda? I'm using apollo-server-lambda and it's working but I couldn't run the mutation to save images in S3 correctly. I have that config with apiGateway in serverless.yml too.

UmerMIB commented 2 years ago

@UmerMIB can you share your handler to run Graphql with AWS Lambda? I'm using apollo-server-lambda and it's working but I couldn't run the mutation to save images in S3 correctly. I have that config with apiGateway in serverless.yml too.

import { initApi } from './bootstrap';
import config from './config';
import { graphqlUploadExpress } from 'graphql-upload';
import Express, { Application } from 'express';
const { ApolloServer } = require('apollo-server-lambda');

let server;
const createHandler = async () => {
    console.log('Serverless', 'Initializing...');
    const schema = await initApi();

    // Create Lambda compatible GraphQL Server
    console.log('Serverless', 'Creating Server...');
    server = new ApolloServer({
        schema,
        playground: {
            endpoint: `/${config.env}/graphql`,
        },
    });

    console.log('Serverless', 'server', server);

    return server.createHandler({
        expressAppFromMiddleware(middleware) {
            const app: Application = Express();
            app.use(graphqlUploadExpress());
            app.use(middleware);

            return app;
        },
        expressGetMiddlewareOptions: {
            cors: {
                origin: '*',
                credentials: false,
            },
        },
    });
};

exports.handler = async (event, context, callback) => {
    const lambda = await createHandler();
    // Warmup event will keep lambda and db connection live
    if (event.source === 'serverless-plugin-warmup') {
        console.log('---------- Warmup Event ----------')
        return {}
    }
    return lambda(event, context, () => {});
};

The schema

export const initApi = async () => {
    try {
        // Use {force: true} to reset DB
        //TODO: must be shift in deployement step. Sync with all api call
        // await sequelize.sync();
        // console.log("Sync completed.");

        // Testing connection
        await sequelize.authenticate();
        console.log('Connection has been established successfully.');
    } catch (error) {
        console.error('Unable to connect to the database:', error);
    }

    // Builds the GraphQL Schema
    if (!schema) {
        console.log("Schema doesn't exist, generating it");
        schema = await buildSchema({
            resolvers: [
                XXX
            ],
            emitSchemaFile: false,
            validate: false,
            scalarsMap: [{ type: Date, scalar: GraphQLISODateTimeOrString }],
        });
        console.log(schema);
    }

    return schema;
};

@mateo2181 is it enough code to share?

mateo2181 commented 2 years ago

Thanks so much @UmerMIB! Yes, actually my handler is quite similar to that one. I think the problem is with serverless. I'm testing with Firecamp to be able to send images, I send two variables (file and petId) but when I check the logs in CloudWatch I see: body: 'LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS1hYjY1N2Y4MzE5ODFjYjU0D......'

I have my serverless with the params that @kirankjolly comments above.

mateo2181 commented 2 years ago

Screenshot from 2021-11-17 07-59-47

This is part of the request I see in CloudWatch.

I always get a timeout. (that I set in serverless.yml in 30 sec).

alex88 commented 2 years ago

Is there a workaround on this?

federicobarera commented 2 years ago

For whomever is still messing around with this, small advice, is not worth it. You will shortly realize that lambdas have 6MB upload limits, so you better start to look into alternatives. This dude put together a good article: https://theburningmonk.com/2020/04/hit-the-6mb-lambda-payload-limit-heres-what-you-can-do/

alex88 commented 2 years ago

Why is it not worth it, my customer has to upload an excel that I have to process, so instead of having a simple function receive a few hundred kB files, I have to deal with s3, send him a presigned url or hook into s3... Also considering that's something that works just fine on AWS, it just doesn't work locally

g0dzillaa commented 1 year ago

any update on this?

This is still a valid issue.

kennyhyun commented 1 year ago

BTW, I was patching node_modules/serverless-offline/dist/events/http/HttpServer.js

for my own purpose

-      request.payload = request.payload && request.payload.toString(encoding);
+      if (request.payload && encoding !== 'binary') {
+        request.payload = request.payload.toString(encoding);
+      }
obaqueiro commented 10 months ago

@kennyhyun patch also worked for me. Thanks!

alvincrisuy commented 10 months ago

@kennyhyun this works for me as well!

C4n4rd0 commented 8 months ago

BTW, I was patching node_modules/serverless-offline/dist/events/http/HttpServer.js

for my own purpose

-      request.payload = request.payload && request.payload.toString(encoding);
+      if (request.payload && encoding !== 'binary') {
+        request.payload = request.payload.toString(encoding);
+      }

Can you create a PR for this to be merged?

C4n4rd0 commented 8 months ago

In fact, I think the best is to add a section in the serverless-offline configuration to let the user self define how the payload should be encoded based on the content type.