cchan9528 / ServerlessImageProcessing

0 stars 0 forks source link

[Refactor] Presigned S3 Access Should be HTTP(S), not WS #4

Open cchan9528 opened 3 years ago

cchan9528 commented 3 years ago

Context

We should redesign so that websocket is connected only when all images are uploaded and we're waiting for a result. That way this websocket is shortlived and we don't rely on it unless needed in the frontend. Here's a great example / insight that also supports this idea: https://aws.amazon.com/blogs/compute/from-poll-to-push-transform-apis-using-amazon-api-gateway-rest-apis-and-websockets/

Output

HTTP(S) routes/handlers in serverless.ts in addition to websocket for waiting for response

Misc

Don't really see need for SNS since each topic would only have 1 subscriber and would be relatively short-lived. CRUD might be extra unnecessary overhead/complications since API Gateway already supports WebSockets

cchan9528 commented 3 years ago

Use this from the AWS docs for Lambda proxy integration type for API Gateway : https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-lambda-proxy-integrations.html#api-gateway-simple-proxy-for-lambda-output-format

Basically, we need to format the response ourselves. Why? Docs say this allows for little interference from API Gateway. However, this requires us to manually (re)construct the response.

This is also what people mean when they say separate this from the actual handler code as much as possible

cchan9528 commented 3 years ago

This makes sense because "proxy" essentially means "middle person" : API Gateway is in the "middle" of client and Lambda

cchan9528 commented 3 years ago

This is an example of a utility that formats responses how Lambda needs them to be formatted: https://github.com/serverless/examples/blob/master/aws-node-rest-api-typescript/app/utils/message.ts

Also, according to https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-lambda-proxy-integrations.html#api-gateway-simple-proxy-for-lambda-output-format , only body and statusCode are needed - the other fields can be omitted:

...Otherwise, you can set [isBase64Encoded] to false or leave it unspecified. 

...

To return a response in a Lambda function in Node.js, you can use commands such as the following:

    To return a successful result, call callback(null, {"statusCode": 200, "body": "results"}).

    To throw an exception, call callback(new Error('internal server error')).

    For a client-side error (if, for example, a required parameter is missing), you can call callback(null, {"statusCode": 400, "body": "Missing parameters of ..."}) to return the error without throwing an exception.

In a Lambda async function in Node.js, the equivalent syntax would be:

    To return a successful result, call return {"statusCode": 200, "body": "results"}.

    To throw an exception, call throw new Error("internal server error").

    For a client-side error (if, for example, a required parameter is missing), you can call return {"statusCode": 400, "body": "Missing parameters of ..."} to return the error without throwing an exception.
cchan9528 commented 3 years ago

Previously used connectionId for S3 key, but i think that's only available for ws

Switched to awsRequestId which is for (HTTP(S)) lambda requests

cchan9528 commented 3 years ago

This is pretty much code complete, HOWEVER...

We should consider changing it back 🥲 . Or at least make it an option to get a presigend url on $connect too for websockets directly. Direct connect seems less roundabout? Because the S3 key doesn't match the ws connectionId so we need extra logic to resolve correct connection to update... and we don't have constraints at the moment that would make this a bad design decision/tradeoff as of now.

cchan9528 commented 3 years ago

https://www.serverless.com/framework/docs/providers/aws/events/websocket/

functions:
  connectionHandler:
    handler: handler.connectionHandler
    events:
      - websocket:
          route: $connect
      - websocket:
          route: $disconnect
  defaultHandler:
    handler: handler.defaultHandler
    events:
      - websocket: $default #simple event definition without extra route property
  customFooHandler:
    handler: handler.fooHandler
    events:
      - websocket:
          route: foo # will trigger if $request.body.action === "foo"

This (above) is a summary of how to configure the API Gateway / Lambdas

cchan9528 commented 3 years ago

Our constraint is actually the fact that no user data should be stored

Thus, we need to know the connectionId before we make the PresignedPost. Why? Because there's no simple/easy way to reassign/re-map the Lambda HTTP awsRequestId to a connectionId without a DB. We need to re-map in order to receive the event updates.

However, if we use WS from the beginning, we already create a bucket/key with the connectionId of the user. But what happens if the connection breaks intermittently...?

cchan9528 commented 3 years ago

Actually we can map it without using a DB

''' User-defined object metadata

When uploading an object, you can also assign metadata to the object. ''' https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html

If we can add the connectionId to this, then we can extract that Id in the lambda on uplod event. Then we use it to send an update to the client.

cchan9528 commented 3 years ago

We might need a database or cache in order to handle reconnect Might not be a way S3 metadata is immutable

You can set object metadata in Amazon S3 at the time you upload the object. Object metadata is a set of name-value pairs. After you upload the object, you cannot modify object metadata. The only way to modify object metadata is to make a copy of the object and set the metadata. 

Therefore we can't gracefully update the metadata without recopying the images Because at that point, might as well reupload the image...

Maybe backlog this as tech debt?

We could long poll, but then we won't be able to update the user about when the image processing steps are completed - not just when the image uploads are complete.