elastic / elastic-serverless-forwarder

Create pipeline to push zip file with dependencies to an S3 bucket #683

Closed · constanca-m closed this 2 months ago

constanca-m commented 3 months ago

Description

This issue comes from a comment thread on a PR to use Terraform to install ESF.

The current approach in the Terraform files builds the dependencies as part of the Terraform run, via the terraform-aws-modules/lambda/aws module (see the comments below).

The desired approach: have all dependencies in a zip file and push this file to an S3 bucket.

Steps

Step 1

Create a new Buildkite pipeline in this directory.

Each version release (or each commit?) triggers the creation of a new zip file with all the dependencies. This zip file needs to be pushed to an S3 bucket that will be used by customers. The S3 bucket needs to be read-only.
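Read-only access could be enforced with a bucket policy along these lines (a minimal sketch; the esf_bucket resource name comes from the snippet in Step 2, everything else is illustrative):

resource "aws_s3_bucket_policy" "esf_read_only" {
  bucket = aws_s3_bucket.esf_bucket.id

  # allow anyone to download the bundle, nothing else
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "PublicReadOnly"
        Effect    = "Allow"
        Principal = "*"
        Action    = "s3:GetObject"
        Resource  = "${aws_s3_bucket.esf_bucket.arn}/*"
      }
    ]
  })
}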

The zip file will follow the structure described in the AWS documentation for Python deployment packages with dependencies.

Reference: https://docs.aws.amazon.com/lambda/latest/dg/python-package.html#python-package-create-dependencies
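In that layout, the installed dependencies sit at the root of the archive next to the handler code. A sketch of what the ESF bundle could look like (file and package names are illustrative):

esf-dependencies.zip
├── main_aws.py        # Lambda handler code
├── handlers/          # rest of the ESF source
├── elasticsearch/     # installed dependency
├── boto3/             # installed dependency (see the discussion below on bundled runtimes)
└── ...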

Step 2

Refactor the Terraform files so the Lambda function reads the pre-built zip from the S3 bucket:

resource "aws_lambda_function" "esf" {
  # ... other configuration ...

  s3_bucket = aws_s3_bucket.esf_bucket.id
  s3_key    = aws_s3_object.esf_zip_bundle.key

  # ... other configuration ...
}
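Note that in the final design, where Elastic hosts the bundle in a read-only bucket, the customer-facing configuration would likely reference the bucket and key directly rather than resources created in the same configuration (a hedged sketch; bucket and key names are placeholders):

resource "aws_lambda_function" "esf" {
  # ... other configuration ...

  s3_bucket = "esf-artifacts"              # placeholder: Elastic-owned bucket
  s3_key    = "esf-dependencies-1.0.0.zip" # placeholder: versioned bundle key
}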

_Originally posted by @girodav in https://github.com/elastic/terraform-elastic-esf/pull/1#discussion_r1516188875_

Tasks

  1. Add new workflow to automate release: https://github.com/elastic/elastic-serverless-forwarder/pull/685
  2. Trigger new workflow to push dependencies to S3 bucket in case of new release: https://github.com/elastic/elastic-serverless-forwarder/pull/689
  3. Add new dependencies: https://github.com/elastic/elastic-serverless-forwarder/pull/692
constanca-m commented 3 months ago

Hey @girodav and @axw, can I have your thoughts on this to make sure everything is correct?

girodav commented 3 months ago

Hey Constança, thanks for opening this issue. Some comments below.

Create a new Buildkite pipeline in this directory.

I don't think there is any need to create a Buildkite pipeline, since ESF does not need to be released as part of the Elastic stack. Feel free to keep using GitHub Actions as we already do, unless you find some benefit in moving to Buildkite.

Each version release (or commit?) triggers the creation of a new zip file with all the dependencies. This zip file needs to be pushed to an S3 bucket that will be used by customers.

We currently track releases with git tags, so the workflow could be triggered by the creation of a new git tag. We also track the version in version.py, which is currently updated manually. There is already a related issue about how to automate updates to this file and how to handle version bumps in general: https://github.com/elastic/elastic-serverless-forwarder/issues/540. I'd consider it a preliminary task for this issue.

I would also make sure that the solution is extensible enough to be able to add automated deployment to SAR as well, in a future release.

Remove the module for lambda. Insert a new resource aws_lambda_function that reads from the S3 bucket with the zip file:

This is more of an option: the current AWS Lambda Terraform module terraform-aws-modules/lambda/aws can still be used if it simplifies the implementation. It just needs to be modified to use pre-built packages stored on S3:

https://registry.terraform.io/modules/terraform-aws-modules/lambda/aws/latest#lambda-function-with-existing-package-prebuilt-stored-in-s3-bucket
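For reference, that module usage looks roughly like this (a hedged sketch based on the registry example above; handler, runtime, bucket, and key values are placeholders):

module "esf_lambda" {
  source = "terraform-aws-modules/lambda/aws"

  function_name = "esf"
  handler       = "main_aws.lambda_handler" # placeholder
  runtime       = "python3.9"               # placeholder

  # skip the local build and point at the pre-built bundle in S3
  create_package = false
  s3_existing_package = {
    bucket = "esf-artifacts"        # placeholder
    key    = "esf-dependencies.zip" # placeholder
  }
}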

Where should the S3 bucket be placed? Under which account? In any specific region?

You can use the same account where we store SAR artifacts.

Do all packages used in import statements in the handlers files need to be in the dependency zip?

Technically no, the AWS Lambda Python runtime already includes some of them (e.g. boto3). However, we should stick to what is in requirements.txt to be sure we use the same versions everywhere. You should include only the dependencies used at runtime (i.e. only requirements.txt). This part also depends on whether https://github.com/elastic/elastic-serverless-forwarder/issues/204 is going to be prioritized or not.
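Following the AWS documentation linked in the issue description, building the bundle from requirements.txt could look like this (a sketch; archive and handler file names are illustrative):

# install runtime dependencies into a local directory
pip install --target ./package -r requirements.txt

# zip the dependencies, then add the handler code on top
cd package
zip -r ../esf-dependencies.zip .
cd ..
zip esf-dependencies.zip main_aws.py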

constanca-m commented 3 months ago

Thank you @girodav for such a detailed answer. I am working on setting up a workflow on GitHub Actions like you mentioned. It seems a bit tricky to test, so I will do it in a private repository first, and then I will open a PR and link it to this issue as well as to https://github.com/elastic/elastic-serverless-forwarder/issues/540.

It won't take care of SAR for now, but it seems easy to adapt the workflow by setting the right trigger:

# run the workflow when version.py changes on main, i.e. on a version bump
on:
  push:
    branches:
      - 'main'
    paths:
      - 'version.py'
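An alternative, following the git tag suggestion above, would be to trigger on release tags instead (the tag pattern below is hypothetical and would need to match ESF's actual tag naming):

on:
  push:
    tags:
      - 'v*'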