ducktors / turborepo-remote-cache

Open source implementation of the Turborepo custom remote cache server.
https://ducktors.github.io/turborepo-remote-cache/
MIT License

Lambda/API Gateway integration not working #161

Open · Rafael17 opened this issue 1 year ago

Rafael17 commented 1 year ago

🐛 Bug Report

After configuring the server using the Lambda/API Gateway setup, I'm running into issues where all the caches are a MISS. Similar to: https://github.com/ducktors/turborepo-remote-cache/issues/28

When there is a cache miss, the server properly adds the caches to S3. However, on a subsequent turbo run with no changes to the source code, the server responds with a 200 on the GET /artifacts/:id call, saying that we have a cache hit. But then there is no POST /artifacts/events, so the client thinks it was a cache miss and makes a subsequent PUT /artifacts/:id, replacing the already existing cache.

Running this locally works without problems.

Here is what my OpenAPI config for the REST API Gateway looks like:

paths:
  '/v8/{proxy+}':
    x-amazon-apigateway-any-method:
      parameters:
        - name: proxy
          in: path
          required: true
          schema:
            type: string
      responses: {}
      x-amazon-apigateway-integration:
        # ref: https://docs.aws.amazon.com/apigateway/api-reference/resource/integration/
        uri: --REDACTED--
        httpMethod: POST
        passthroughBehavior: when_no_match
        type: aws_proxy
        credentials: --REDACTED--
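
One thing that may be worth ruling out (an assumption on my part; the logs above don't confirm it): a REST API Gateway only passes binary bodies through intact when the API declares matching binaryMediaTypes, whereas an HTTP API Gateway handles binary payloads automatically, which could explain why only the REST variant misbehaves. In the same OpenAPI document that would be a hypothetical top-level addition like:

```yaml
# Hypothetical addition to the OpenAPI document above.
# Without it, a REST API Gateway treats payloads as UTF-8 text and can
# corrupt the application/octet-stream artifact bodies on the way through.
x-amazon-apigateway-binary-media-types:
  - 'application/octet-stream'
  - '*/*'
```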
Rafael17 commented 1 year ago

Here is a little more info on what I'm seeing on the API Gateway logs request/responses in sequential order:

Known MISS after code changes:

# first run
npm run lint

1 - GET - Response - 404, Artifact not found

2a - POST - Request - /v8/artifacts/events?slug=team_myteam

Method request body before transformations: [
{
    "duration": 0,
    "event": "MISS",
    "hash": "6f6bb0a759b0a7fe",
    "sessionId": "9a841c25-c0db-44df-a701-573262bc0207",
    "source": "REMOTE"
}
]

2b - POST - Response - 200 {}

3a - PUT - Request [Binary Data]

3b - PUT - Response - 200

{"urls":["team_myteam/6f6bb0a759b0a7fe"]}

Should be a cache HIT, but the client reports a MISS

# 2nd run
npm run lint

1 - GET - Response - 200 (it found the cache!!!)

{
    "statusCode": 200,
    "body": "KLUv/QBYzQYAksskHXBFdbRZlsI2sREC3JiBV6+axS2ECFpk+Sr934sMxixNcY4i/ddcdDWYf83pokUMcM5hQP9XNIKoBlOQSA7JW81efTSrYzDYtdmr11qLN5VqhCJsdgmK0K2QgjPaM0U4dxOyj0oumnwwH5vZandzkTj0r7mTFkXoCeYBCxNIyQEqj5I/c4hI/tri+UB0p8AUGw0WHCAQg6otN5H5gTMOHwy7gNeTOI1NBmCygBEH7ZIkQ6QImM13DtqnEwYIDsuACtU40HZtnjpAchjY6oAIduMF1YRfBg==",
    "headers": {
        "content-type": "application/octet-stream",
        "date": "Fri, 07 Apr 2023 23:19:43 GMT",
        "connection": "keep-alive",
        "transfer-encoding": "chunked"
    },
    "isBase64Encoded": true
}

No POST request reaches the API Gateway

2a - PUT - Request [Binary Data]

2b - PUT - Response - 200

{"urls":["team_myteam/6f6bb0a759b0a7fe"]}

So the POST /v8/artifacts/events?slug=team_myteam request appears to be missing, causing a cache MISS on the client

Rafael17 commented 1 year ago

Basically I'm looking for help getting the Lambda server working with the AWS REST API Gateway; the setup in the docs only works with the AWS HTTP API Gateway

Elliot-Alexander commented 1 year ago

I'm seeing a similar issue with my AWS deployment. I can see the caches in S3 and can confirm that the requests for the cache objects return 200s, yet it still results in a cache miss :( I'm seeing this issue when running in GitHub Actions.

Elliot-Alexander commented 1 year ago

As a follow-up, I can provide CDK files or CloudFormation templates if that would assist with triage.

WANZARGEN commented 1 year ago

There is a body size limit for Lambda... So I used an EC2 instance instead, and it works perfectly. 😄

Rafael17 commented 1 year ago

Thanks for the response @WANZARGEN

The limit is not the issue in this case, since I'm testing lint and the output is a tiny file. The issue has to do with the Lambda server and the AWS REST API Gateway. Did you happen to front your EC2 instance with a REST API Gateway? I'm assuming not.

Elliot-Alexander commented 1 year ago

Does using the cache locally work for you @Rafael17? I only see this issue when running turbo in GitHub Actions. I'm wondering if it's an issue with the turbo CLI when running in CI rather than with the cache server itself

Rafael17 commented 1 year ago

@Elliot-Alexander For me everything works correctly when using the AWS HTTP API Gateway. The problem only appears when I use the AWS REST API Gateway

fox1t commented 1 year ago

Can you reiterate the issue you are facing, so we can add this information to the documentation?

EloB commented 9 months ago

I think Lambda can only handle payloads up to 6 MB: https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html

duveborg commented 8 months ago

Yeah, did anyone get around the 6 MB max payload limit?

EloB commented 8 months ago

@duveborg Hello mate ;) Long time no see. I think this repo needs to implement presigned URLs to work around that limitation. https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html

EloB commented 8 months ago

@duveborg I'm in the same boat as you.

My workaround right now is to deploy a small Docker container on EC2 and run the server below until any Lambda turborepo implementation can handle the above: https://github.com/salamaashoush/turbo-remote-cache-rs

EloB commented 8 months ago

I'm not sure if this helps anyone who wants to implement this, but I think TURBO_PREFLIGHT or --preflight would be useful for doing so.

https://github.com/vercel/turbo/issues/956 https://github.com/vercel/turbo/pull/1052

EloB commented 8 months ago

I've created another repo that solves just S3 and Lambda, using presigned URLs to allow bigger files: https://github.com/EloB/turborepo-remote-cache-lambda

git clone git@github.com:EloB/turborepo-remote-cache-lambda.git
cd turborepo-remote-cache-lambda/
npm --prefix=function install
sam build
zip -j app.zip .aws-sam/build/TurboRepoRemoteCacheFunction/*
# Then manually create the Lambda function as explained in that repo.

Take a look at this file; the actual Lambda function is only 130 lines of code: https://github.com/EloB/turborepo-remote-cache-lambda/blob/main/function/app.ts

I haven't had time to write documentation yet, but it's working for me.

You need to include teamId in the JWT token. The token can be created at https://jwt.io/: enter some team id and a secret. Don't forget to create .turbo/config.json in your turborepo so that remote caching is enabled.

.turbo/config.json

{
  "teamid": "team_yourteamhere",
  "apiurl": "https://YOUR-LAMBDA-URL.lambda-url.eu-central-1.on.aws"
}

You also need to set the environment variables JWT_SECRET (must be the same as the secret from jwt.io if you created it there) and S3_BUCKET for the Lambda.
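
If you would rather mint that token locally than paste a secret into jwt.io, an HS256 JWT is just two base64url-encoded JSON segments plus an HMAC. A minimal sketch using only Node's built-in crypto (the teamId claim name and the example secret are assumptions based on the description above, not verified against the lambda's code):

```typescript
import { createHmac } from "node:crypto";

// Base64url without padding, as required by the JWT spec (RFC 7519).
function b64url(input: string): string {
  return Buffer.from(input).toString("base64url");
}

// Sign an HS256 JWT carrying the teamId claim the lambda is assumed to check.
export function signJwt(payload: object, secret: string): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify(payload));
  const signature = createHmac("sha256", secret)
    .update(`${header}.${body}`)
    .digest("base64url");
  return `${header}.${body}.${signature}`;
}

// Example: a token for the team used in .turbo/config.json.
const token = signJwt({ teamId: "team_yourteamhere" }, "your-jwt-secret");
console.log(token);
```

The resulting token goes into the client as the remote cache token (e.g. TURBO_TOKEN), and the same secret goes into the lambda's JWT_SECRET environment variable.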

It's also important to run turbo run yourtask --preflight, because it makes turbo send an OPTIONS request first, and the response to that request can change the location used for download/upload; that is what makes S3 presigned URLs work. It's not documented in their OpenAPI yet, but I learned it from their issues. https://turbo.build/repo/docs/core-concepts/remote-caching#self-hosting https://turbo.build/api/remote-cache-spec https://github.com/vercel/turbo/issues/956 (read more about preflight).
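
To make the preflight mechanism concrete, here is a minimal sketch of a handler that answers turbo's OPTIONS preflight with a Location header pointing at a presigned URL. The event/result shapes, path pattern, bucket name, and the presign helper are my own placeholders, not code from either repo:

```typescript
// Sketch of a preflight-aware handler for /v8/artifacts/:hash.
// When turbo runs with --preflight it sends an OPTIONS request first;
// responding with a Location header lets the client transfer the artifact
// directly against that URL (e.g. an S3 presigned URL) instead of
// streaming the body through Lambda's 6 MB payload limit.

interface ApiEvent {
  httpMethod: string;
  path: string; // e.g. "/v8/artifacts/6f6bb0a759b0a7fe"
}

interface ApiResult {
  statusCode: number;
  headers?: Record<string, string>;
  body: string;
}

// Stand-in for a real presigner; in practice this would call something
// like getSignedUrl from @aws-sdk/s3-request-presigner.
function presign(hash: string): string {
  return `https://example-bucket.s3.amazonaws.com/${hash}?X-Amz-Signature=placeholder`;
}

export function handler(event: ApiEvent): ApiResult {
  const match = event.path.match(/^\/v8\/artifacts\/([^/]+)$/);
  if (event.httpMethod === "OPTIONS" && match) {
    // Redirect the actual GET/PUT transfer to the presigned URL.
    return {
      statusCode: 200,
      headers: { Location: presign(match[1]) },
      body: "",
    };
  }
  return { statusCode: 404, body: "Artifact not found" };
}
```

The point of the sketch is only that the OPTIONS response, not the GET/PUT body, carries the redirect; the artifact bytes then bypass Lambda entirely.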