api3dao / signed-api

A monorepo for managing signed data
MIT License

Research options to deploy the API and data-pusher containers to AWS #41

Closed bbenligiray closed 10 months ago

bbenligiray commented 11 months ago

Both containers are deployed with a config file (and possibly a secrets.env file) like https://docs.api3.org/reference/airnode/latest/docker/client-image.html

Requirements:

stefiix92 commented 11 months ago

Terraform vs CloudFormation vs CDK

Terraform

cons

CloudFormation

cons

CDK

cons

stefiix92 commented 11 months ago

For on-premise infrastructure, we can write a simple docker-compose file which will read the .env file and configure the docker image based on it.
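
A minimal sketch of such a compose file (the service name, env file, and mount paths here are assumptions for illustration; only the image tag comes from this thread):

```yaml
# Hypothetical docker-compose sketch; paths and service name are assumptions.
services:
  pusher:
    image: api3/pusher:0.1.0-rc2
    env_file:
      - ./secrets.env           # secrets read from a local .env-style file
    volumes:
      - ./config:/app/config:ro # config file provided read-only from the host
    restart: unless-stopped
```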

andreogle commented 11 months ago

There are two parts to consider: our own deployment and external user deployments. For our deployment, I think the consensus was that we can be AWS specific (and maybe setup a fallback on some other managed/unmanaged non-AWS Docker service). External users should be able to run anywhere - we provide the public Docker image, they provide the config files.

Providing the config files then raises some questions about how we want to do that. We could use Docker volumes, but that seems to limit us if we go with AWS ECS (Fargate). Fargate (to me) seems preferable since it means limited infrastructure maintenance on our end. From what I've seen, if we want volumes with Fargate, we would need to set up an EFS volume, which seems to require an EC2 instance.

We also need to decide if we want build-time configuration or runtime configuration. Build-time only needs the file available at build time, but requires extra steps on the user's part to build their own images. Runtime needs the file to be always available, but is more flexible.

My Proposal

We go with runtime configuration files by adding a CONFIG_SOURCE env variable. When the app boots, it fetches the relevant config file from that source. Initial options are:

This provides us and external users with flexible options for deciding how to deploy. It's then critical to ensure our config files are always available. We could even expand CONFIG_SOURCE to host in multiple places for redundancy if we wanted, e.g. CONFIG_SOURCES.
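
The runtime-configuration idea can be sketched as a boot-time step. Everything here (CONFIG_SOURCE, CONFIG_PATH, CONFIG_URL, and the "local"/"url" options) is an assumption for illustration, not the repo's actual interface:

```shell
#!/bin/sh
# Hypothetical sketch: resolve the config file at boot based on CONFIG_SOURCE.

resolve_config() {
  case "$CONFIG_SOURCE" in
    local)
      # Config already present (e.g. a mounted volume or baked into the image).
      [ -f "$CONFIG_PATH" ] && echo "using local config at $CONFIG_PATH"
      ;;
    url)
      # Fetch from a publicly reachable URL (e.g. an S3 object).
      wget -q -O "$CONFIG_PATH" "$CONFIG_URL" && echo "fetched config from $CONFIG_URL"
      ;;
    *)
      echo "unknown CONFIG_SOURCE: $CONFIG_SOURCE" >&2
      return 1
      ;;
  esac
}

# Demo: simulate the "local" option with a stub config file.
CONFIG_SOURCE=local
CONFIG_PATH=/tmp/pusher-demo.json
echo '{}' > "$CONFIG_PATH"
resolve_config   # prints: using local config at /tmp/pusher-demo.json
```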

More detail in my thread here: https://api3workspace.slack.com/archives/C05S589E7B4/p1695730031236799

Then in terms of our own infrastructure setup, I don't really have a preference between Terraform or CloudFormation. I think we previously settled on CloudFormation simply because we already decided on AWS. I'm not familiar enough with CDK to really comment here.

To the requirements:

Noob-friendly, ideally through a GUI

✅ Everything should be doable through a GUI

Minimizes the risk of the user messing things up (using the wrong config file, breaking the deployment by deleting S3 content, exposing the secrets file, etc.)

❌ The containers break if the config files are not available. The only way around this I can see is to bake the config in at build time. Maybe worth noting that running containers will only break on restart.

❓ The user could accidentally configure a hosted config file to be public on S3 but I think the risk of users misconfiguring something is always present to some extent.

Doesn't require the user to have anything installed locally other than Docker (Edit: though ideally not even that because we don't want to maintain a deployer container)

✅ They don't need to install anything (maybe not even Docker)

Sorry for the long post. Please let me know if there is any other option I've missed or got wrong; I'm not an infrastructure guy.

stefiix92 commented 11 months ago

@andreogle

we would need to setup an EFS volume, which seems to require an EC2 instance.

EBS requires an EC2 instance. If we use EFS, the filesystem will be mounted and the config file can be downloaded to the mounted drive from CONFIG_SOURCE.
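
For reference, Fargate (platform version 1.4.0 and later) can mount EFS directly in the task definition, with no EC2 instance needed at runtime. A rough CloudFormation fragment, with a placeholder filesystem ID and mount path:

```json
{
    "Volumes": [
        {
            "Name": "config",
            "EFSVolumeConfiguration": {
                "FilesystemId": "fs-0123456789abcdef0",
                "TransitEncryption": "ENABLED"
            }
        }
    ],
    "ContainerDefinitions": [
        {
            "Name": "my_app",
            "MountPoints": [
                { "SourceVolume": "config", "ContainerPath": "/app/config" }
            ]
        }
    ]
}
```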

Everything should be doable through a GUI

Terraform plan/apply requires CLI. You can do it through some SaaS, but it's out of scope of this task IMHO

andreogle commented 11 months ago

EBS requires EC2 instance. If we use EFS, then the FS will be mounted and the config file can be downloaded to a mounted drive from CONFIG_SOURCE.

Doesn't EFS also require setting up an EC2 instance?

stefiix92 commented 11 months ago

just for the file interactions.

stefiix92 commented 11 months ago

My outcome:

Considering all the options, I'd go with the Terraform public module approach. It will be developed by us, primarily used by us, and if somebody else wants to use it, they can reference the module and fill in the correct values. Terraform can be either installed on a computer or run through a Docker image. We already use Terraform Cloud (TFC), so we will stick with it for internal deployments.

In Terraform, we can support AWS and potentially self-hosted Docker.
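
As a sketch of what referencing such a public module might look like for an external user (the module source path and variable names are hypothetical, since the module does not exist yet; the image tag and config URL are taken from this thread):

```hcl
# Hypothetical module reference; source path and variable names are assumptions.
module "signed_api" {
  source = "github.com/api3dao/signed-api//terraform"

  aws_region  = "us-east-1"
  image_tag   = "0.1.0-rc2"
  config_url  = "https://config-url.com/pusher.json"
  secrets_env = file("${path.module}/secrets.env")
}
```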

bbenligiray commented 11 months ago

Requires https://github.com/api3dao/signed-api/issues/74

metobom commented 10 months ago

I prepared a prototype procedure for the pusher deployments using AWS CloudFormation. The container entrypoint runs:

echo -e $SECRETS_ENV >> ./config/secrets.env # write contents of the secrets.env that is passed in the Environment field of the CloudFormation template to the required path.
wget -O - https://config-url.com/pusher.json >> ./config/pusher.json # Write contents of the publicly available config to the required path
node --enable-source-maps dist/index.js # run the pusher

Here is the CloudFormation template:

{
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "A CloudFormation template for deploying  my_app.",
    "Resources": {
        "CloudWatchLogsGroup": {
            "Description": "The service to log the outputs of the app.",
            "Type": "AWS::Logs::LogGroup",
            "Properties": {
                "LogGroupName": "myAppLogs",
                "RetentionInDays": 7
            }
        },
        "MyAppDefinition": {
            "Type": "AWS::ECS::TaskDefinition",
            "Description": "App's definiton",
            "Properties": {
                "NetworkMode": "awsvpc",
                "Cpu": 256,
                "Memory": 512,
                "ExecutionRoleArn": {
                    "Ref": "ECSTaskRole"
                },
                "RequiresCompatibilities": ["FARGATE"], 
                "ContainerDefinitions": [
                    {
                        "Name": "my_app",
                        "Image": "api3/pusher:0.1.0-rc2",
                        "Environment": [
                            {
                                "Name": "SECRETS_ENV",
                                "Value": "WALLET_MNEMONIC=<YOUR_MNEMONIC>\\nCRYPTOCOMPARE_API_KEY=<YOUR_CRYPTOCOMPARE_API_KEY>"
                            }
                        ],
                        "EntryPoint": ["/bin/sh", "-c", "echo -e $SECRETS_ENV >> ./config/secrets.env && wget -O - <YOUR_CONFIG_URL> >> ./config/pusher.json && node --enable-source-maps dist/index.js"],
                        "LogConfiguration": {
                            "LogDriver": "awslogs",
                            "Options": {
                                "awslogs-group": {
                                    "Ref": "CloudWatchLogsGroup"
                                },
                                "awslogs-region": {
                                    "Ref": "AWS::Region"
                                },
                                "awslogs-stream-prefix": "my_app"
                            }
                        }
                    }
                ]
            }
        },
        "MyAppCluster": {
            "Type" : "AWS::ECS::Cluster",
            "Description": "ECS Cluster to run services.",
            "Properties" : {
                "ClusterName" : "my_app_cluster"
            }
        },
        "MyAppService": {
            "Type": "AWS::ECS::Service",
            "Description": "Service to run the defined app.",
            "Properties": {
                "Cluster": {
                    "Ref": "MyAppCluster"
                },
                "ServiceName": "my_app_service",
                "DesiredCount": 1,
                "LaunchType": "FARGATE",
                "TaskDefinition": {
                    "Ref": "MyAppDefinition"
                },
                "NetworkConfiguration": {
                    "AwsvpcConfiguration": {
                        "AssignPublicIp": "ENABLED",
                        "Subnets": [
                            { "Ref": "MySubnet" }
                        ]
                    }
                },
                "DeploymentConfiguration": {
                    "MinimumHealthyPercent": 100,
                    "MaximumPercent": 200
                }
            }
        },

        "MyVPC": {
            "Type": "AWS::EC2::VPC",
            "Properties": {
              "CidrBlock": "10.0.0.0/16",
              "EnableDnsSupport": true,
              "EnableDnsHostnames": true
            }
        },
        "MyInternetGateway": {
            "Type": "AWS::EC2::InternetGateway"
        },
        "MyVPCGatewayAttachment": {
            "Type": "AWS::EC2::VPCGatewayAttachment",
            "Properties": {
                "VpcId": { "Ref": "MyVPC" },
                "InternetGatewayId": { "Ref": "MyInternetGateway" }
            }
        },
        "MyPublicRouteTable": {
            "Type": "AWS::EC2::RouteTable",
            "Properties": {
                "VpcId": { "Ref": "MyVPC" }
            }
        },
        "MyPublicRoute": {
            "Type": "AWS::EC2::Route",
            "DependsOn": "MyVPCGatewayAttachment",
            "Properties": {
                "RouteTableId": { "Ref": "MyPublicRouteTable" },
                "DestinationCidrBlock": "0.0.0.0/0",
                "GatewayId": { "Ref": "MyInternetGateway" }
            }
        },
        "MySubnet": {
            "Type": "AWS::EC2::Subnet",
            "Properties": {
                "CidrBlock": "10.0.0.0/24",
                "VpcId": { "Ref": "MyVPC" },
                "MapPublicIpOnLaunch": true
            }
        },
        "MyPublicSubnet1RouteTableAssociation": {
            "Type": "AWS::EC2::SubnetRouteTableAssociation",
            "Properties": {
                "RouteTableId": { "Ref": "MyPublicRouteTable" },
                "SubnetId": { "Ref": "MySubnet" }
            }
        },

        "ECSTaskRole": {
            "Type": "AWS::IAM::Role",
            "Description": "Role for running ECS tasks and creating logs.",
            "Properties": {
                "AssumeRolePolicyDocument": {
                    "Statement": {
                        "Effect": "Allow",
                        "Principal": {
                            "Service": [
                                "ecs-tasks.amazonaws.com"
                            ]
                        },
                        "Action": [
                            "sts:AssumeRole"
                        ]
                    }
                },
                "Policies": [
                    {
                        "PolicyName": "MyAppAmazonECSTaskExecutionRolePolicy",
                        "PolicyDocument": {
                            "Statement": {
                                "Effect": "Allow",
                                "Action": [
                                    "logs:CreateLogGroup",
                                    "logs:CreateLogStream",
                                    "logs:DescribeLogStreams",
                                    "logs:PutLogEvents"
                                ],
                                "Resource": "*"
                            }
                        }
                    }
                ]
            }
        }
    }
}

Please review each field under the Resources, especially network-related ones.

Test Deployment

Steps to deploy a pusher and push signed data to https://pool.nodary.io:

  1. Fill in <YOUR_CONFIG_URL> in the CloudFormation template (the config URL).
  2. Fill in <YOUR_CRYPTOCOMPARE_API_KEY> in the CloudFormation template. It is a free API key; you can get one by signing up.
  3. Fill in <YOUR_MNEMONIC> in the CloudFormation template.
  4. Go to the CloudFormation section in the AWS dashboard, click Create Stack, and upload your template in Step 1.
  5. Give a name to your stack in Step 2.
  6. Don't change anything in Step 3.
  7. Tick the checkbox with the text I acknowledge that AWS CloudFormation might create IAM resources in Step 4 and submit.
  8. Wait for AWS to deploy everything and check the CloudWatch log group named myAppLogs to see what's up. After a successful deployment, you should see your signed data at https://pool.nodary.io/ within 1 or 2 minutes.

Frontend for the deployments

The plan is to create a repo similar to the old operations repo that is used by a frontend. @hiletmis is currently working on it. In the frontend, API providers will be able to see and compare their deployments. They will select their deployment (created by us), fill in the secrets, and receive the populated CloudFormation template. Then they just go to the AWS CloudFormation dashboard, upload the template, and deploy.

BTT21000 commented 10 months ago

The CloudFormation template should eventually be located in this repo: https://github.com/api3dao/api-integrations

Siegrift commented 10 months ago

Thanks for the deployment instructions. So far I've tested the happy path and it works. I have to say I quite like the flow. I expected I would have to click through way more stuff on AWS, but it was quite simple (when you just skip step 3). I also like the options to update/remove the stack. I expect AWS handles removing the old stack fine...

A few notes I made along the way:

I want to talk with @stefiix92 about all the resources, but he is ill now, so probably next week.

stefiix92 commented 10 months ago

Environments vs Secrets

The Environment block isn't encrypted, and anyone with read access to the ECS cluster can read the secret values (mnemonic, API keys, ...).

Instead, it's recommended to use the Secrets block, where the environment variable is injected into the container from Secrets Manager or Parameter Store. First you create a new secret, then put the reference in the Task Definition as an ARN value. This way, users who have read access to the ECS cluster can't read the secret value directly.
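
Sketched against the prototype template, the Secrets approach might look roughly like this (the secret resource and its name are illustrative; Ref on an AWS::SecretsManager::Secret returns its ARN, which is what the Secrets block's ValueFrom expects):

```json
{
    "WalletMnemonicSecret": {
        "Type": "AWS::SecretsManager::Secret",
        "Properties": {
            "Name": "my_app/wallet_mnemonic",
            "SecretString": "<YOUR_MNEMONIC>"
        }
    },
    "MyAppDefinition": {
        "Type": "AWS::ECS::TaskDefinition",
        "Properties": {
            "ContainerDefinitions": [
                {
                    "Name": "my_app",
                    "Secrets": [
                        {
                            "Name": "WALLET_MNEMONIC",
                            "ValueFrom": { "Ref": "WalletMnemonicSecret" }
                        }
                    ]
                }
            ]
        }
    }
}
```

Note that the task execution role would also need secretsmanager:GetSecretValue permission on that secret.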

Also, I'd rather create a script that writes ./config/secrets.env than store it all in one env variable.

Networking

The template is missing a Load Balancer resource. ECS will assign a public IP to the pusher container (via "AssignPublicIp": "ENABLED"), but this will change with every new container (e.g. on restarts).

After a call with Emo, I see there is no need for a Load Balancer. However, we found out that the stack currently uses IPv4, which will be paid starting February 2024. To overcome this, we should enable IPv6-only addresses and lower the costs. https://aws.amazon.com/blogs/networking-and-content-delivery/introducing-ipv6-only-subnets-and-ec2-instances/
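
For illustration, an IPv6-only subnet could be added to the existing template roughly like this (a hedged sketch: the VPC first needs an Amazon-provided IPv6 CIDR via AWS::EC2::VPCCidrBlock, and the subnet carves a /64 out of it with Fn::Cidr):

```json
{
    "MyVPCIpv6Cidr": {
        "Type": "AWS::EC2::VPCCidrBlock",
        "Properties": {
            "VpcId": { "Ref": "MyVPC" },
            "AmazonProvidedIpv6CidrBlock": true
        }
    },
    "MyIpv6Subnet": {
        "Type": "AWS::EC2::Subnet",
        "DependsOn": "MyVPCIpv6Cidr",
        "Properties": {
            "VpcId": { "Ref": "MyVPC" },
            "Ipv6Native": true,
            "AssignIpv6AddressOnCreation": true,
            "Ipv6CidrBlock": {
                "Fn::Select": [0, {
                    "Fn::Cidr": [
                        { "Fn::Select": [0, { "Fn::GetAtt": ["MyVPC", "Ipv6CidrBlocks"] }] },
                        1,
                        64
                    ]
                }]
            }
        }
    }
}
```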

Application upgrades

I'm unsure how CF will handle changes in the MyAppDefinition Task Definition resource. It will probably create a new revision and update MyAppService to it directly. It would be better if CF could keep the old task definitions, allowing users to roll back in case something breaks.

Infrastructure upgrades

It would be good to test the infra updates/rollbacks.

Other minor things (I understand these weren't covered in the PoC)

@Siegrift if you want we can discuss the topics more.

stefiix92 commented 10 months ago

Logging Options

I've looked at the logging possibilities for ECS. Generally, you can forward the logs to the AWS Log Group and format them based on the logger type. However, some advanced logging configurations might be useful for us.

  1. Use the firelens log driver to use fluentbit (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/firelens-using-fluentbit.html)
  2. Configure the fluentbit logger with the advanced properties and stream the logs to the desired destination (https://medium.com/@kvendingoldo/aws-ecs-and-loki-integration-aae324456d7f)

e.g. we can use Loki, which is a trendy logging solution developed by Grafana.

However, this setup also has one drawback. As the logs are streamed directly to the fluentbit target, a user can't watch the logs directly from the ECS Task. If this is also a requirement, we can develop a Lambda function (deployed with CF along with the ECS service) that forwards the logs from the AWS Log Group to the desired destination (e.g. Loki).

bdrhn9 commented 10 months ago

> Re: Logging Options (quoting @stefiix92's full comment above)

Yeah, thanks. There was an issue for this, and we chose Loki as the centralized logging instance and fluentbit as its broker. You can see the details at https://github.com/api3dao/tasks/issues/257.

By adding an extra [OUTPUT] section like the following, you can watch the logs via the CloudWatch log group created for the fluentbit sidecar container. Because the logging options (in CloudFormation, e.g. the TaskDefinition) only allow one [OUTPUT] for the fluentbit container, you either need to rebuild the fluentbit image for this purpose, or store the fluentbit config somewhere and start the client with the command -c /somewhere/fluent-bit.conf

[OUTPUT]
    Name   stdout
    Match  *
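
For example, a custom fluent-bit config combining a Loki destination with the stdout output might look like this (the Loki host and labels are placeholders; the loki output plugin ships with fluent-bit):

```ini
[OUTPUT]
    Name   loki
    Match  *
    Host   loki.example.com
    Port   3100
    Labels job=signed-api

[OUTPUT]
    Name   stdout
    Match  *
```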

Siegrift commented 10 months ago

We agreed to close the issue as the research phase is completed. The feedback from @stefiix92 will be implemented in separate issues yet to be created by @metobom.