Closed — leothomas closed this issue 1 year ago
Leo, I thank you for giving localstack a solid effort and writing it up!
Based on your 1, 2, 2.1, and 3, some things immediately came to mind for me. Primarily, some of the things you listed as cons for 1, I see as benefits:
- More work to setup and maintain, as it requires knowledge of the behaviour of the external resources and the tests have to be updated if any of the interactions with the external resources are modified.
Knowledge of the behavior of the external services - we should really have this, to a great degree, anyway. And by behavior I mean inputs, outputs, and side effects (exceptions); we don't need deep knowledge of how it works behind the scenes. And what better way to prove we have that knowledge, and at the same time make it easier for other devs working on the project to learn the interface, than by representing its behavior in unit tests.
I think it feels like more work, especially in the early stages of a project, but over the life of a project the total work and cost (time, friction) is much less. I've rarely been able to overcome the friction of unit testing early in a project; I think it's something that's best adopted as a group best practice.
- Test coverage is not as extensive, as these test internal logic units of the API rather than interactions between backend components. With a backend as complex as APT's, bugs most often arise in the interaction between components (eg: an S3 request might be well formulated and therefore pass the test for retrieving a file when run against a mocked S3 instance, while the actual file itself would be missing in the real API for some other reason.)
These potential side effects, such as a ResourceNotFound exception from boto, should be mocked wherever they can occur in the codebase, such that the application code's behavior in response to interactions between backend components is fairly well tested.
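To make that concrete, here is a minimal, standard-library-only sketch of testing both the happy path and the exception path of a hypothetical S3-backed handler. The handler and the stand-in `ClientError` are illustrative, not actual APT code; in a real test suite the exception would be `botocore.exceptions.ClientError` (or a moto-mocked client would raise it for you):

```python
# Hypothetical sketch: unit-testing application behaviour when S3 raises a
# ResourceNotFound-style error, using only the standard library.
from unittest.mock import MagicMock

class ClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError."""

def get_file_or_404(s3_client, bucket, key):
    # Application logic under test: translate a missing object
    # into a 404-style response instead of crashing.
    try:
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"]
        return {"status": 200, "body": body}
    except ClientError:
        return {"status": 404, "body": None}

# Happy path: the mocked client returns a well-formed response.
ok_client = MagicMock()
ok_client.get_object.return_value = {"Body": b"contents"}
assert get_file_or_404(ok_client, "bucket", "key")["status"] == 200

# Side-effect path: the mocked client raises, as real S3 would for a
# missing object, and the handler degrades gracefully.
err_client = MagicMock()
err_client.get_object.side_effect = ClientError("NoSuchKey")
assert get_file_or_404(err_client, "bucket", "key")["status"] == 404
```

This is the sense in which mocked side effects exercise the interactions between components: the test pins down what the application does when the dependency misbehaves, without needing the dependency itself.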
Some other pros of unit tests:
As for 2 and 3:
I see the benefit of some barebones integration tests for 3, basically to stop a deployment if something unexpectedly breaks. This is a really marginal benefit though, and I think it should be the domain of a QA team/dev, not of a backend/frontend dev.
As for 2... I'm split on localstack in general. On one hand, I'm impressed, and it's great to have a whole dev env running locally. On the other hand, the disparity between the dev env and prod introduces more error surface. I would be inclined toward using actual AWS resources for dev envs, and unit tests that mock those resources. Provided that AWS resources for the dev env aren't cost prohibitive.
Thanks for all the thoughts on this subject @jo-tham! I think you've made some really good points in favor of implementing "classic" unit tests in addition to AWS based integration tests.
> Provided that AWS resources for the dev env aren't cost prohibitive.
I don't think that they are cost prohibitive, but I think a purely AWS-based dev env is prohibitive in a couple of other ways:
After working through some testing options for several days, here are some thoughts on testing strategies going forward:
Background:
The obvious tradeoff in testing is that the closer an environment replicates the production environment, the less likely the tests are to let errors through; however, the closer the replication, the more time-consuming/expensive the tests are to set up and run.
Examples in increasing order of fidelity to production environment (and therefore in increasing quality of coverage):
1. Tests run against functions from the code base locally, with all external resources mocked.
Pros:
Cons:
2. Tests run against a dockerized instance of the API with other resources mocked as needed, running locally (eg: localstack for cognito + s3, a postgres docker instance, an elasticsearch docker instance)
These are closer to integration tests as they allow for the testing of the integration between components (eg: does the API correctly handle the updated data model returned by the database)
Pros:
Cons:
2.1. Mock all of the AWS resources using localstack.
Localstack is a paid service that provides mocked AWS services locally. In order to avoid re-inventing the wheel, we stand a better chance of having accurately mocked services which remain up to date with AWS updates if we entrust that to a team paid to do just that. However, the reality is that localstack is a product still very much under development, with several key features missing/workarounds necessary, which we will explore further below.
3. Tests run against resources deployed to an AWS environment (proper integration tests):
Pros:
Cons:
Localstack:
I decided to investigate localstack for the reasons listed in sections 2 and 2.1: localstack promises locally mocked resources that can be set up using the application's deployment code (no need to configure them separately), where the interactions between services are mocked to a high degree of fidelity and frequently updated. This would also serve to run the API locally for development purposes.
Current setup:
For reference, unit tests are currently written using moto to mock the external AWS services that the API relies on. However, with the addition of several new services, namely Cognito, the tests have not been updated to mock Cognito resources and are therefore failing. This was actually blocked for a while because moto's mocked Cognito class was missing a configuration option required for the APT backend.
In order to run the backend locally, the resources are each mocked on an ad hoc basis:
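A reconstruction of what that ad-hoc compose setup might look like, pieced together from the description that follows; image versions, ports, and options are assumptions, not the real APT configuration:

```yaml
# Hypothetical reconstruction of the current ad-hoc local setup.
version: "3.8"
services:
  db:
    image: postgres:13              # plain postgres image, not RDS
    ports:
      - "5432:5432"
  elasticsearch:
    image: elasticsearch:7.10.1     # plain elasticsearch image, not AWS ES
    ports:
      - "9200:9200"
  api:
    build: .
    # uvicorn serves the FastAPI app directly: no Mangum, no API Gateway
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000
    ports:
      - "8000:8000"
```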
Notice how the database and elasticsearch services (`db` and `elasticsearch`, respectively) are both just containers running a postgres or an elasticsearch image, which aren't necessarily guaranteed to replicate the behaviour of those services in AWS. Similarly, the API itself is instantiated using `uvicorn`, instead of wrapping the FastAPI app with Mangum to create a lambda handler, which means that any misconfigurations of Mangum would not be caught when running the API locally. Additionally, the API is directly accessible at `localhost:8000`, meaning that none of the API Gateway configuration is represented in the local API either. The list of inconsistencies between the local and the AWS instances of the backend (and therefore the probability of uncaught errors) continues.

Proposed setup:
Pre-requisites:
- `npm install -g aws-cdk@v1.x` (APT is currently deployed using CDK v1, but it should soon be migrated to CDK v2)
- localstack's edge endpoint, through which all mocked services are routed: `http://localhost:4566`
Setup:
Localstack can easily be instantiated with docker-compose: define a `docker-compose.yml`, and then start it up with `docker-compose up`.
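A minimal sketch of such a compose file; the image tag, enabled services, and mount paths are assumptions based on localstack's documented defaults, not the actual APT file:

```yaml
# Illustrative docker-compose.yml for running localstack locally.
version: "3.8"
services:
  localstack:
    image: localstack/localstack:latest
    ports:
      - "4566:4566"                 # edge endpoint all services route through
    environment:
      - SERVICES=s3,cognito-idp,lambda,ecr,apigateway
      - LAMBDA_EXECUTOR=docker
      - DOCKER_HOST=unix:///var/run/docker.sock
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      # startup scripts localstack runs once it is ready (see below)
      - ./statup_script:/docker-entrypoint-initaws.d
```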
Once localstack is up and running, the CDK stack can be deployed to it, leaving an APT backend up and running, ready to be bootstrapped (run database migrations using `sqitch` and load test data).

Deploying the CDK stack, running the DB migrations, and loading test data can easily be automated by mounting a volume containing a bash script with the bootstrapping logic into `docker-entrypoint-initaws.d`, eg: `statup_script/bootstrap.sh`.

The API endpoint should then be accessible through localstack's edge endpoint at `http://localhost:4566`.
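Such a bootstrap script might look like the following sketch; the stack deployment flags, database URI, and data paths are assumptions, not the actual APT script:

```shell
#!/bin/bash
# Illustrative statup_script/bootstrap.sh -- executed by localstack from
# /docker-entrypoint-initaws.d once the mocked services are ready.
set -euo pipefail

# Deploy the CDK stack against localstack (cdklocal, from the aws-cdk-local
# package, wraps the cdk CLI and targets the edge endpoint)
cdklocal bootstrap
cdklocal deploy "*" --require-approval never

# Run the database migrations and load test data
sqitch deploy db:pg://apt:apt@db:5432/apt
psql postgresql://apt:apt@db:5432/apt -f /data/test_data.sql
```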
As you can see, the promise of localstack is to provide a local testing environment that more accurately (although not 100%) reflects the behaviour and interactions of AWS resources, with much less work to configure than our current local strategy (one docker container for all the services vs. one for each).
In actuality:
In actuality, while working through this setup, I ran into a number of issues. First of all, CDKLocal creates an ECR repository named `aws-cdk/assets`, whereas the cloudformation Lambda function is defined using an image hosted under an ECR repository named `assets`.

Secondly, CDKLocal does not actually build and upload the lambda container image (it simply points the Lambda function definition to a non-existent image in ECR). This means it's necessary to manually build and upload the lambda function image to the locally running ECR repository, and then update the local lambda function configuration to point to the recently uploaded docker image.
The bootstrapping logic ends up looking a bit more like:
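A hedged reconstruction of that extended bootstrapping logic; the registry address, repository, and function names are assumptions based on localstack defaults, and `<api-lambda-name>` is a placeholder:

```shell
#!/bin/bash
# Sketch of the extended bootstrap with the manual ECR workaround.
set -euo pipefail

cdklocal bootstrap
cdklocal deploy "*" --require-approval never

# cdklocal does not publish image assets, so build and push the Lambda
# image to localstack's ECR by hand (note the aws-cdk/assets naming quirk)
awslocal ecr create-repository --repository-name aws-cdk/assets || true
docker build -t localhost:4566/aws-cdk/assets:latest .
docker push localhost:4566/aws-cdk/assets:latest

# Re-point the locally deployed Lambda at the image we just pushed
awslocal lambda update-function-code \
  --function-name <api-lambda-name> \
  --image-uri localhost:4566/aws-cdk/assets:latest
```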
Lastly, due to localstack's multi-region support, the lambda function has no "knowledge" of the region it's deployed to (even though the CDK stack configuration specifies a region), which means that the boto3 clients in the API code have to be instantiated with an explicit region. This is unacceptable in my opinion, as the tests should always conform to the application code and not the other way around.
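If we did accept the workaround, one way to contain the damage would be to centralise region-pinned client creation in a single factory, so only one place in the codebase knows about localstack's missing-region quirk. This is an illustrative sketch, not APT code; the boto3 call is stubbed out so the example stays standard-library only:

```python
# Illustrative factory that resolves the region explicitly instead of
# relying on boto3's usual region discovery.
import os

def make_client(service, region=None):
    """Return client parameters with the region resolved up front."""
    resolved = region or os.environ.get("AWS_REGION", "us-east-1")
    # Real code would `return boto3.client(service, region_name=resolved)`;
    # returning the arguments keeps the sketch runnable anywhere.
    return {"service_name": service, "region_name": resolved}

s3 = make_client("s3")
assert s3["region_name"] == os.environ.get("AWS_REGION", "us-east-1")
```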
Conclusion:
While the Localstack team is very friendly and helpful, the number of issues that are answered with "try pulling the `localstack:latest` image and see if the problem persists" (eg: 1, 2, 3) makes me think that localstack is still very much under active development and, frustratingly, not yet mature enough to rely on for running the API locally for development and testing purposes, as promising as it seems.

Testing strategy going forward:
I think the best option would be to adopt the "classic" testing strategy: locally running unit tests that test the API code with mocked external resources, which can be run quickly and often (eg: on each commit using commit hooks), and full integration tests run against resources deployed in AWS only when merging to develop or master.
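The commit-hook half of this could be as simple as the following sketch, assuming the fast, fully mocked unit tests live under `tests/unit` and run with pytest (both assumptions):

```shell
#!/bin/sh
# Illustrative .git/hooks/pre-commit -- block the commit if the fast,
# fully mocked unit tests fail.
pytest tests/unit -q || exit 1
```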
While this requires more work to configure (especially mocking the external resources), it allows for the adoption of a two-layered TDD: developers define the unit tests for their proposed features, and clients/partners at impact can provide behavioural tests for the integration tests. See this comment for further thoughts on TDD and involvement from partners.
The promise of TDD is that the upfront cost of configuring and maintaining the tests is always less than the time saved developing new features and the costs saved by not introducing bugs or incompatibilities.