NASA-IMPACT / nasa-apt

Code and issues relevant to the NASA APT project
Apache License 2.0

Localstack testing (unit/integration) #498

Closed leothomas closed 1 year ago

leothomas commented 2 years ago

After working through some testing options for several days, here are some thoughts on testing strategies going forward:

Background:

The obvious tradeoff in testing is that the more closely an environment replicates the production environment, the less likely the tests are to let errors through; however, the more closely it replicates production, the more time-consuming and expensive the tests are to set up and run.

Examples, in increasing order of fidelity to the production environment (and therefore increasing quality of coverage):

1. Tests run against functions from the code base locally, with all external resources mocked.

2. Tests run against a dockerized instance of the API running locally, with other resources mocked as needed (e.g.: localstack for Cognito + S3, a postgres docker instance, an elasticsearch docker instance)

These are closer to integration tests as they allow for the testing of the integration between components (eg: does the API correctly handle the updated data model returned by the database)

Pros:

2.1. Mock all of the AWS resources using localstack.

Localstack is a paid service that provides mocked AWS services locally. Rather than re-inventing the wheel, we stand a better chance of having accurately mocked services that stay up to date with AWS changes if we entrust that work to a team paid to do just that. In reality, however, localstack is still very much a product under development, with several key features missing or requiring workarounds, which we will explore further below.

3. Tests run against resources deployed to an AWS environment (proper integration tests):

Pros:

Localstack:

I decided to investigate localstack for the reasons listed in sections 2 and 2.1: localstack promises locally mocked resources that can be set up using the application's deployment code (no need to configure them separately), where the interactions between services are mocked to a high degree of fidelity and frequently updated. This setup would also serve to run the API locally for development purposes.

Current setup:

For reference, unit tests are currently written using moto to mock the external AWS services that the API relies on. However, with the addition of several new services, namely Cognito, the tests have not been updated to mock the Cognito resources and are therefore failing. This was actually blocked for a while because moto's mocked Cognito class was missing a configuration option required by the APT backend.
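For illustration, the existing moto-based tests follow roughly this pattern (a minimal sketch assuming moto's pre-5.x mock_s3 decorator; the bucket name matches the local S3_BUCKET setting below, but the test itself is illustrative, not actual APT test code):

import boto3
import pytest
from moto import mock_s3

BUCKET = "nasa-apt-dev-files"  # same bucket name as the local S3_BUCKET setting


@pytest.fixture
def s3_client():
    # mock_s3 intercepts all boto3 S3 calls, so no real AWS resources are touched
    with mock_s3():
        client = boto3.client("s3", region_name="us-east-1")
        client.create_bucket(Bucket=BUCKET)
        yield client


def test_file_round_trip(s3_client):
    # Illustrative only: exercises a put/get round trip against the mocked bucket
    s3_client.put_object(Bucket=BUCKET, Key="test.pdf", Body=b"dummy")
    assert s3_client.get_object(Bucket=BUCKET, Key="test.pdf")["Body"].read() == b"dummy"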

In order to run the backend locally, the resources are each mocked on an ad hoc basis:

version: '3.8'
services:
  db:
    image: postgres:12.7
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: nasadb
      POSTGRES_USER: masteruser
      POSTGRES_PASSWORD: password
    command: postgres -c log_statement=all

  bootstrapper:
    # Sets up the necessary AWS resources in Localstack and 
    # loads fixture data in the database
    build:
      context: ./
      dockerfile: ./fixture_data/bootstrapper.Dockerfile
    command: "sh fixture_data/setup.sh"
    volumes:
      - ./db:/db
      - ./db:/var/lib/postgresql/data
      - ./fixture_data/:/db/fixture_data
    environment:
      AWS_ACCESS_KEY_ID: stub
      AWS_SECRET_ACCESS_KEY: stub
      AWS_DEFAULT_REGION: us-east-1
      S3_BUCKET: nasa-apt-dev-files
      POSTGRES_DB: nasadb
      POSTGRES_USER: masteruser
      POSTGRES_PASSWORD: password
      POSTGRES_ADMIN_CREDENTIALS_ARN: mocked_credentials_arn
      USER_POOL_NAME: dev-users
      APP_CLIENT_NAME: dev-client
    depends_on:
      - db-ready
      - localstack-ready

  elastic:
    # Elastic db for local development only.
    # For staging/production, add elasticache instance
    image: docker.elastic.co/elasticsearch/elasticsearch:7.8.1
    environment:
      - discovery.type=single-node
      - http.port=9200
      - http.cors.allow-origin=*
      - http.cors.enabled=true
      - http.cors.allow-headers=X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization
      - http.cors.allow-credentials=true
    ports:
      - "9200:9200"

  localstack:
    # localstack for local development only. AWS S3 used for staging/production
    image: localstack/localstack:0.12.13
    environment:
      LOCALSTACK_API_KEY: ${LOCALSTACK_API_KEY}
      SERVICES: s3,secretsmanager,cognito,ses
      DEBUG: 1 # Increases localstack logging output
      AWS_ACCESS_KEY_ID: stub
      AWS_SECRET_ACCESS_KEY: stub
      AWS_DEFAULT_REGION: us-east-1
      EXTRA_CORS_ALLOWED_ORIGINS: http://localhost:9000
    ports:
      - "4566:4566"

  api:
    build:
      context: ./
      dockerfile: app/Dockerfile
    image: nasa-apt/dev/app
    command: >
      sh -c "
        python wait_for_localstack_ready.py &&
        uvicorn app.main:app --host 0.0.0.0 --port 80 --reload
      "
    ports:
      - "8000:80"
    volumes:
      - ./app:/app/app
      - ./fixture_data/wait_for_localstack_ready.py:/app/wait_for_localstack_ready.py
    environment:
      # the boto3 library needs these AWS_* env vars, even though we are using localstack.
      AWS_ACCESS_KEY_ID: stub
      AWS_SECRET_ACCESS_KEY: stub
      AWS_DEFAULT_REGION: us-east-1
      PROJECT_NAME: nasa-apt-api-local
      API_VERSION_STRING: /v2
      AWS_RESOURCES_ENDPOINT: http://localstack:4566
      S3_BUCKET: nasa-apt-dev-files
      POSTGRES_ADMIN_CREDENTIALS_ARN: mocked_credentials_arn
      # This URL omits the "http://" in order to remain consistent
      # with the value of the URL returned by the ElasticSearch Domain 
      # CDK component
      ELASTICSEARCH_URL: elastic:9200
      APT_FRONTEND_URL: $APT_FRONTEND_URL
      USER_POOL_NAME: dev-users
      APP_CLIENT_NAME: dev-client
      NOTIFICATIONS_FROM: no-reply@ds.io
    depends_on:
      - bootstrapper
...

Notice how the database and elasticsearch services (db and elastic, respectively) are both just containers running a postgres or an elasticsearch image, which aren't guaranteed to replicate the behaviour of those services in AWS. Similarly, the API itself is served with uvicorn instead of wrapping the FastAPI app with Mangum to create a Lambda handler, which means that any misconfiguration of Mangum would not be caught when running the API locally. Additionally, the API is directly accessible at localhost:8000, meaning that none of the API Gateway configuration is represented locally either. The list of inconsistencies between the local and the AWS instances of the backend (and therefore the probability of uncaught errors) goes on.
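To make the Mangum point concrete, the difference looks roughly like this (a sketch; the module layout is assumed, not copied from the APT codebase):

from fastapi import FastAPI
from mangum import Mangum

app = FastAPI()

# In AWS, the ASGI app is wrapped by Mangum to produce the Lambda handler that
# API Gateway invokes, so Mangum's event translation (and any misconfiguration
# of it) is part of every request.
handler = Mangum(app)

# Locally (see the compose file above), uvicorn serves `app` directly:
#   uvicorn app.main:app --host 0.0.0.0 --port 80 --reload
# so neither Mangum nor API Gateway ever enters the request path.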

Proposed setup:

Pre-requisites:

Setup:

Localstack can be easily instantiated with docker-compose:

docker-compose.yml:

localstack:
    image: localstack/localstack:latest
    environment:
      - LOCALSTACK_API_KEY=${LOCALSTACK_API_KEY}
      - DEBUG=1
      - LAMBDA_EXECUTOR=docker
      - LAMBDA_REMOTE_DOCKER=0
      - LAMBDA_DOCKER_FLAGS=-e AWS_DEFAULT_REGION=us-east-1 -e AWS_RESOURCES_ENDPOINT=http://localstack:4566
      - HOST_TMP_FOLDER=${TMPDIR:-/tmp}/localstack
      - DOCKER_SOCK=unix:///var/run/docker.sock
    ports:
      - "53:53" # only required for Pro (DNS)
      - "53:53/udp" # only required for Pro (DNS)
      - "443:443" # only required for Pro (LocalStack HTTPS Edge Proxy)
      - "4510-4559:4510-4559" # external service port range
      - "4566:4566" # LocalStack Edge Proxy
    volumes:
      - "${TMPDIR:-/tmp}/localstack:/tmp/localstack"
      - "/var/run/docker.sock:/var/run/docker.sock"

And then started up using:

docker compose up --build localstack

Once localstack is up and running, the CDK stack can be deployed to it. You should be able to run:

cdklocal deploy nasa-apt-api-lambda-{STAGE}

to have an APT backend up and running, ready to be bootstrapped (run database migrations using sqitch and load test data).

Deploying the CDK stack, running the DB migrations, and loading test data can be automated by mounting a volume containing a bash script with the bootstrapping logic into docker-entrypoint-initaws.d:

    volumes:
        - "./startup_scripts/:/docker-entrypoint-initaws.d/startup_scripts/"

e.g. startup_scripts/bootstrap.sh:

# install libs
npm install -g aws-cdk@v1.x aws-cdk-local 
pip install awscli-local

# app code gets mounted at /tmp
cd /tmp 
# install deployment libs
pip install ".[dev,deploy]"

# bootstrap CDK localstack
cdklocal bootstrap --require-approval never
# deploy stack to local
cdklocal deploy nasa-apt-api-lambda-staging --require-approval never

# run database migrations
cd db
./sqitch deploy db:pg://masteruser:password@localhost:4512/nasadb
cd ..

# load fixture data (if needed)
./fixture_data/bootstrap_localstack.sh

# print API ID
awslocal apigatewayv2 get-apis --query 'Items[0].ApiId' 

The API endpoint should be accessible at:

http://localhost:4566/restapis/{API_ID}/local/_user_request_/v2/atbds
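For a quick smoke check, the endpoint can be hit once the API ID is known (illustrative only; assumes the requests library is installed and that API_ID has been exported from the command above):

import os
import requests

# API_ID is the value printed by the `awslocal apigatewayv2 get-apis` call above
api_id = os.environ["API_ID"]
url = f"http://localhost:4566/restapis/{api_id}/local/_user_request_/v2/atbds"
print(requests.get(url).status_code)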

As you can see, the promise of localstack is to provide a local testing environment that more accurately (although not 100%) reflects the behaviour and interactions of AWS resources, with much less configuration work than our current local strategy (one docker container for all of the services vs. one for each).

In actuality:

In actuality, while working through this setup, I ran into a number of issues. The first is that cdklocal creates an ECR repository named aws-cdk/assets, whereas the CloudFormation Lambda function is defined using an image hosted under an ECR repository named assets.

Secondly, cdklocal does not actually build and upload the lambda container image (it simply points the Lambda function definition to a non-existent image in ECR). This means it's necessary to manually build and upload the lambda function image to the locally running ECR repository and then update the local Lambda function configuration to point to the newly uploaded docker image.

The bootstrapping logic ends up looking a bit more like:

# install libs
npm install -g aws-cdk@v1.x aws-cdk-local 
pip install awscli-local

# app code gets mounted at /tmp
cd /tmp 
# install deployment libs
pip install ".[dev,deploy]"

# bootstrap CDK localstack
cdklocal bootstrap --require-approval never

# create the `assets` repository (because CDKLocal will only create a repo 
# named `aws-cdk/assets`)
awslocal ecr create-repository --repository-name assets 

# manually build the image (with the `assets` ECR repo as the image tag)
docker build -t $(awslocal ecr describe-repositories --repository-names assets --query "repositories[0].repositoryUri" --output text):latest . -f app/Dockerfile

# log in to ECR in order to get access to upload the recently built docker image
awslocal ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin $(awslocal ecr describe-repositories --repository-names assets --query "repositories[0].repositoryUri" --output text)

# push image
docker push $(awslocal ecr describe-repositories --repository-names assets --query "repositories[0].repositoryUri" --output text):latest

# update lambda function configuration to point to the recently built + uploaded lambda container image in ECR.
awslocal lambda update-function-code --function-name $(awslocal lambda list-functions --query "Functions[0].FunctionName" --output text) --image-uri $(awslocal ecr describe-repositories --repository-names assets --query "repositories[0].repositoryUri" --output text):latest

# deploy stack to local
cdklocal deploy nasa-apt-api-lambda-staging --require-approval never

# run database migrations
cd db
./sqitch deploy db:pg://masteruser:password@localhost:4512/nasadb
cd ..

# load fixture data (if needed)
./fixture_data/bootstrap_localstack.sh

# print API ID
awslocal apigatewayv2 get-apis --query 'Items[0].ApiId' 

Lastly, due to localstack's multi-region support, the Lambda function has no "knowledge" of the region it's deployed to (even though the CDK stack configuration specifies one), which means that the boto3 clients in the API code have to be instantiated with an explicit region. This is unacceptable in my opinion, as the tests should always conform to the application code and not the other way around.
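Concretely, the concession looks something like this (a sketch of the workaround, not a recommendation; the endpoint variable mirrors the AWS_RESOURCES_ENDPOINT setting used in the compose file above):

import os

import boto3

# In a real Lambda, the execution environment provides the region, so this is enough:
s3 = boto3.client("s3")

# Under the localstack lambda executor the region was not resolved, so the client
# had to be pinned to a region (and pointed at the localstack edge endpoint):
s3_local = boto3.client(
    "s3",
    region_name="us-east-1",
    endpoint_url=os.environ.get("AWS_RESOURCES_ENDPOINT", "http://localstack:4566"),
)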

Conclusion:

While the Localstack team is very friendly and helpful, the number of issues that are answered with "try pulling the localstack:latest image and see if the problem persists" (e.g.: 1, 2, 3) makes me think that localstack is still very much under active development and, frustratingly, not yet mature enough to rely on for running the API locally for development and testing purposes, as promising as it seems.

Testing strategy going forward:

I think the best option would be to adopt the "classic" testing strategy: locally run unit tests that exercise the API code with mocked external resources, which can be run quickly and often (e.g. on each commit using commit hooks), with the full integration tests run against resources deployed in AWS only when merging to develop or master.
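One way to keep the two layers separable is a marker-based split, so the fast unit suite runs on every commit while the integration suite only runs in CI against deployed resources (a sketch; the marker name and environment variable are illustrative):

# conftest.py
import os

import pytest


def pytest_collection_modifyitems(config, items):
    # Run tests marked @pytest.mark.integration only when explicitly enabled,
    # e.g. in the CI job that targets resources deployed in AWS.
    if os.environ.get("RUN_INTEGRATION_TESTS") == "1":
        return
    skip = pytest.mark.skip(reason="integration tests run only against deployed AWS resources")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip)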

While this requires more work to configure (especially mocking the external resources), it allows for the adoption of a two-layered TDD: developers define the unit tests for their proposed features, and clients/partners at IMPACT can provide behavioural tests for the integration tests. See this comment for further thoughts on TDD and involvement from partners.

The promise of TDD is that the upfront cost of configuring and maintaining the tests is always less than the time saved developing new features and the cost avoided by not introducing bugs or incompatibilities.

jo-tham commented 2 years ago

Leo, I thank you for giving localstack a solid effort and writing it up!

nice

Based on your 1, 2, 2.1, and 3, some things immediately came to mind for me. Primarily, what you listed as cons for 1, I see as benefits:

  • More work to set up and maintain, as it requires knowledge of the behaviour of the external resources, and the tests have to be updated if any of the interactions with the external resources are modified.

Knowledge of the behavior of the external services: we should really have this, to a great degree, anyway. And by behavior I mean inputs, outputs, and side effects (exceptions); we don't need deep knowledge of how it works behind the scenes. And what better way to prove we have that knowledge, and at the same time make it easier for other devs working on the project to learn the interface, than by representing its behavior in unit tests?

I think it feels like more work, especially in the early stages of a project, but over the life of a project the total work and cost (time, friction) is much less. I've rarely been able to overcome the friction of unit testing early in a project; I think it's something that's beneficial as a group best practice.

  • Test coverage is not as extensive, as these test internal logic units of the API rather than interactions between backend components. With a backend as complex as APT's, bugs most often arise in the interaction between components (e.g.: an S3 request might be well formulated and therefore pass the test for retrieving a file when run against a mocked S3 instance, while the actual file itself could be missing in the real API for some other reason).

These potential side effects, such as a ResourceNotFound exception from boto, should be mocked wherever they can occur in the codebase, such that the application code's behavior in response to interactions between backend components is fairly well tested.
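For example, asserting on the application's behaviour when boto raises might look like this (a sketch; the fetch_pdf helper is hypothetical, not APT code):

from typing import Optional
from unittest import mock

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3", region_name="us-east-1")


def fetch_pdf(key: str) -> Optional[bytes]:
    # Hypothetical helper: return the file body, or None if the object is missing
    try:
        return s3.get_object(Bucket="nasa-apt-dev-files", Key=key)["Body"].read()
    except ClientError as exc:
        if exc.response["Error"]["Code"] == "NoSuchKey":
            return None
        raise


def test_missing_file_returns_none():
    # The mocked side effect stands in for the "file missing for some other reason" case
    error = ClientError({"Error": {"Code": "NoSuchKey", "Message": ""}}, "GetObject")
    with mock.patch.object(s3, "get_object", side_effect=error):
        assert fetch_pdf("does-not-exist.pdf") is None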

Some other pros of unit tests:

As for 2, 3

I see the benefit of some barebones integration tests for 3, basically to stop a deployment if something unexpectedly breaks. This is a fairly marginal benefit though, and I think it should be the domain of a QA team/dev, not of a backend/frontend dev.

As for 2... I'm split on localstack in general. On one hand, I'm impressed, and it's great to have a whole dev env running locally. On the other hand, the disparity it introduces between the dev env and prod adds more error surface. I would be inclined toward using actual AWS resources for dev envs, plus unit tests that mock those resources. Provided that AWS resources for a dev env aren't cost prohibitive.

leothomas commented 2 years ago

Thanks for all the thoughts on this subject @jo-tham! I think you've made some really good points in favor of implementing "classic" unit tests in addition to AWS based integration tests.

Provided that AWS resources for dev env isn't cost prohibitive.

I don't think that they are cost prohibitive, but I think a purely AWS based dev env is prohibitive in a couple other ways: