azavea / pfb-network-connectivity

PFB Bicycle Network Connectivity

Update RDS SSL/TLS certs #956

Closed JN-Hernandez closed 3 months ago

JN-Hernandez commented 3 months ago

Overview

An email notification was received regarding expiring RDS certs:

You are receiving this message because your AWS Account has one or more Amazon RDS, or Amazon Aurora database instances in the US-EAST-1 Region using a SSL/TLS Certificate that is expiring on August 22, 2024.

As such, we will need to update the RDS SSL/TLS certs prior to Aug 22, 2024.

Is your feature request related to a problem? Please describe.

Preliminary investigation shows that both the staging and production instances use the rds-ca-2019 certificate, which expires later this month (Aug 22, 2024):

$ aws rds describe-db-instances --region us-east-1 | grep DBInstanceIdentifier
            "DBInstanceIdentifier": "dbproduction",
            "ReadReplicaDBInstanceIdentifiers": [],
            "DBInstanceIdentifier": "dbstaging",
            "ReadReplicaDBInstanceIdentifiers": [],

$ aws rds describe-db-instances --db-instance-identifier dbstaging | grep CACertificateIdentifier
            "CACertificateIdentifier": "rds-ca-2019",
$ aws rds describe-db-instances --db-instance-identifier dbproduction | grep CACertificateIdentifier
            "CACertificateIdentifier": "rds-ca-2019",

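The per-instance `grep` checks above can be collapsed into one call with the AWS CLI's `--query` option (a JMESPath expression applied client-side). This is a minimal sketch assuming a configured AWS CLI; `list_rds_ca_certs` and `find_expiring_instances` are hypothetical helper names, not scripts from this repository:

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repo).

# Print "<DBInstanceIdentifier><TAB><CACertificateIdentifier>" for every instance.
list_rds_ca_certs() {
  local region="${1:-us-east-1}"
  aws rds describe-db-instances \
    --region "$region" \
    --query 'DBInstances[*].[DBInstanceIdentifier,CACertificateIdentifier]' \
    --output text
}

# Print only the identifiers of instances still on the expiring rds-ca-2019 CA.
find_expiring_instances() {
  list_rds_ca_certs "$@" | awk '$2 == "rds-ca-2019" { print $1 }'
}
```

Running `find_expiring_instances us-east-1` before the change should list the instances that still need updating; once every instance has moved to rds-ca-rsa2048-g1 it should print nothing. `--output text` yields one tab-separated row per instance, which is easier to filter than the JSON that `grep` was matching.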
Describe the solution you'd like

We need to update the cert to rds-ca-rsa2048-g1, which will not expire for 40 years.
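The expiry of the replacement CA can be read from the CLI before committing to it. A sketch assuming a configured AWS CLI; `rds_ca_valid_till` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ValidTill timestamp of an RDS CA certificate.
rds_ca_valid_till() {
  local ca="${1:-rds-ca-rsa2048-g1}" region="${2:-us-east-1}"
  aws rds describe-certificates \
    --region "$region" \
    --certificate-identifier "$ca" \
    --query 'Certificates[0].ValidTill' \
    --output text
}
```

`ValidTill` is the expiry of the CA certificate itself; the individual server certificates issued under it may be rotated on a shorter schedule.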

Additional Context

JN-Hernandez commented 3 months ago

Work Plan

Summary

Both RDS instances use the rds-ca-2019 certificate, which expires on Aug 22, 2024 (see the AWS notice quoted in the Overview), so the certs must be updated before then.

Preliminary Investigation

Query to find clients currently connected over SSL (excluding the internal rdsadmin user), followed by the psql \l+ command to list the databases:

SELECT datname, usename, ssl, client_addr 
  FROM pg_stat_ssl INNER JOIN pg_stat_activity ON pg_stat_ssl.pid = pg_stat_activity.pid
  WHERE ssl is true and usename<>'rdsadmin';
\l+
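Recording a per-client baseline of non-SSL connections before the change makes it possible to compare afterward. A minimal sketch, assuming `psql` is available on the bastion and `$1` is the connection string from 1Password; `non_ssl_clients` is a hypothetical helper that prints each client address with open non-SSL connections and how many it has:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<client_addr> <count>" for every client that has
# connections NOT using SSL. rdsadmin and sessions without a client address
# (e.g. background workers) are excluded. $1 is a libpq connection string.
non_ssl_clients() {
  psql "$1" -At -F ' ' -c "
    SELECT a.client_addr, s.ssl, count(*)
      FROM pg_stat_ssl s
      JOIN pg_stat_activity a ON s.pid = a.pid
     WHERE a.usename <> 'rdsadmin'
       AND a.client_addr IS NOT NULL
     GROUP BY a.client_addr, s.ssl
     ORDER BY a.client_addr;" |
    awk '$2 == "f" { print $1, $3 }'
}
```

For example, save its output to `before.txt` ahead of the change and to `after.txt` afterward, then `diff` the two files. The flags `-A` (unaligned), `-t` (tuples only), and `-F ' '` make the output easy to parse, and psql prints booleans as `t`/`f`.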

Steps

Pre-Implementation

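Implementation

For each instance, check the current CA, switch it to rds-ca-rsa2048-g1 (applied immediately, so expect downtime), and then check the status until it returns to available: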

aws rds describe-db-instances --db-instance-identifier <db_identifier> | grep CACertificateIdentifier
aws rds modify-db-instance \
          --db-instance-identifier <db_identifier> \
          --ca-certificate-identifier rds-ca-rsa2048-g1 \
          --apply-immediately
aws rds describe-db-instances --db-instance-identifier <db_identifier> | grep DBInstanceStatus
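The three implementation commands can be combined into one helper that applies the change and then polls until the instance is back to `available` with the new CA in place. This is a sketch under assumptions: `rotate_rds_ca`, `POLL_SECONDS`, and `MAX_ATTEMPTS` are hypothetical names, and the defaults (30-second interval, 60 attempts) are arbitrary:

```bash
#!/usr/bin/env bash
# Hypothetical helper: switch an instance to a new CA, then poll until it is
# "available" again AND reports that CA. Assumes a configured AWS CLI.
rotate_rds_ca() {
  local db="$1" ca="${2:-rds-ca-rsa2048-g1}"
  local interval="${POLL_SECONDS:-30}" attempts="${MAX_ATTEMPTS:-60}"
  local row status current i=0

  aws rds modify-db-instance \
    --db-instance-identifier "$db" \
    --ca-certificate-identifier "$ca" \
    --apply-immediately > /dev/null || return 1

  while [ "$i" -lt "$attempts" ]; do
    # One describe call returns both fields on one tab-separated line.
    row=$(aws rds describe-db-instances \
      --db-instance-identifier "$db" \
      --query 'DBInstances[0].[DBInstanceStatus,CACertificateIdentifier]' \
      --output text)
    status=$(printf '%s\n' "$row" | awk '{ print $1 }')
    current=$(printf '%s\n' "$row" | awk '{ print $2 }')
    if [ "$status" = "available" ] && [ "$current" = "$ca" ]; then
      echo "$db: $current ($status)"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "$db: timed out waiting for $ca (last status: $status)" >&2
  return 1
}
```

Checking both the status and the CA identifier, rather than only the status, guards against the instance still reporting `available` in the moment before the modification starts. Usage, for example: `rotate_rds_ca dbstaging`.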

Post-Implementation

Set rds-ca-rsa2048-g1 as the default CA for new DB instances created in us-east-1:

aws rds modify-certificates \
          --certificate-identifier rds-ca-rsa2048-g1 \
          --region us-east-1
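To confirm the post-implementation step took effect, the account's default CA for new instances can be read back. A sketch assuming a CLI version recent enough to return the `DefaultCertificateForNewLaunches` field from `describe-certificates`; `rds_default_ca` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the CA that RDS will assign to newly created instances.
rds_default_ca() {
  aws rds describe-certificates \
    --region "${1:-us-east-1}" \
    --query 'DefaultCertificateForNewLaunches' \
    --output text
}
```

After `modify-certificates` has run, this should print `rds-ca-rsa2048-g1`.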

Criteria for Success

Connect to the bastion host

# Connect to the VPN
# Download the appropriate PEM file from 1Password
chmod 400 <path/to/file.pem>
ssh -i <path/to/file.pem> ec2-user@<bastion_IP>

Connect to the database from the bastion host

Note: The psql connection string for the database is in the Notes section of the 1Password entry. You will be prompted for the password.

Once connected, list the client connections and confirm that the application connections show ssl = t:

SELECT datname, usename, ssl, client_addr 
  FROM pg_stat_ssl INNER JOIN pg_stat_activity ON pg_stat_ssl.pid = pg_stat_activity.pid
  WHERE usename<>'rdsadmin';
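The `ssl` column shows that a session is encrypted, but not that the client trusts the new CA. One way to test that from the bastion is to connect with `sslmode=verify-full` against the RDS CA bundle. A sketch; `rds_verify_full_uri` is a hypothetical helper, and the bundle is assumed to have been downloaded from `https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem` into the current directory:

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a libpq URI that makes psql verify the server
# certificate (and host name) against a local copy of the RDS CA bundle.
rds_verify_full_uri() {
  local host="$1" db="$2" user="$3" bundle="${4:-global-bundle.pem}"
  printf 'postgresql://%s@%s:5432/%s?sslmode=verify-full&sslrootcert=%s\n' \
    "$user" "$host" "$db" "$bundle"
}
```

For example, pass `"$(rds_verify_full_uri <rds_endpoint> pfbproductionmain pfb)"` to psql and run `SELECT ssl, version, cipher FROM pg_stat_ssl WHERE pid = pg_backend_pid();`. Because `verify-full` also checks the host name, `<rds_endpoint>` must be the instance's DNS endpoint rather than its IP address.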

Risk

High Risk: this change will require database downtime, and any connections that use SSL will need to be updated.

  1. Any connections to the database that use SSL and are not updated will no longer be able to connect.
  2. Failure to implement this change will result in certificate expiration on Aug 22, 2024, resulting in SSL connection failures.

Rollback

aws rds modify-db-instance \
          --db-instance-identifier <db_identifier> \
          --ca-certificate-identifier rds-ca-2019 \
          --apply-immediately
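Because rds-ca-2019 expires on Aug 22, 2024, this rollback only makes sense before that date. A sketch of a guard around the rollback command; `rollback_rds_ca` and the `TODAY` override are hypothetical, and the comparison relies on ISO `YYYY-MM-DD` dates sorting lexicographically:

```bash
#!/usr/bin/env bash
# Hypothetical guard around the rollback: rds-ca-2019 expires on 2024-08-22,
# so rolling back on or after that date would reinstate an expired CA.
rollback_rds_ca() {
  local db="$1"
  local today="${TODAY:-$(date -u +%Y-%m-%d)}"   # TODAY can be overridden for testing
  if [[ ! "$today" < "2024-08-22" ]]; then
    echo "refusing rollback: rds-ca-2019 expires on 2024-08-22" >&2
    return 1
  fi
  aws rds modify-db-instance \
    --db-instance-identifier "$db" \
    --ca-certificate-identifier rds-ca-2019 \
    --apply-immediately
}
```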

Additional Details

JN-Hernandez commented 3 months ago

Paired with @KlaasH to implement this change. We found that (1) we were unable to run the infra script and (2) after implementation, some connections were not using SSL. We opted to manually add the external IP to the sgBastion security groups for the duration of this work and removed it afterward. See the following sections for details:

Unable to run the Infra Script

In attempting to run the infra script, we encountered the following error:

$ ./scripts/infra plan

Attempting to deploy application version [bf8025e]...
-----------------------------------------------------

~/projects/pfb-network-connectivity/deployment/terraform ~/projects/pfb-network-connectivity
download: s3://staging-pfb-config-us-east-1/terraform/terraform.tfvars to ./staging-pfb-config-us-east-1.tfvars
Updating AWS Batch job definitions
~/projects/pfb-network-connectivity/deployment/aws-batch ~/projects/pfb-network-connectivity/deployment/terraform ~/projects/pfb-network-connectivity
[+] Building 2.1s (6/10)
 => [internal] load build definition from Dockerfile
[...]
=> => transferring context: 512B                                                                                                      0.0s
 => ERROR [2/6] RUN apt-get update && apt-get install -y --no-install-recommends     python3-pip     build-essential     && rm -rf /v  1.0s
------
 > [2/6] RUN apt-get update && apt-get install -y --no-install-recommends     python3-pip     build-essential     && rm -rf /var/lib/apt/lists/*:
#0 0.534 Get:1 http://deb.debian.org/debian bookworm InRelease [151 kB]
#0 0.789 Err:1 http://deb.debian.org/debian bookworm InRelease
#0 0.789   At least one invalid signature was encountered.
[...]
#0 0.975 E: The repository 'http://deb.debian.org/debian bookworm-updates InRelease' is not signed.
#0 0.975 W: GPG error: http://deb.debian.org/debian-security bookworm-security InRelease: At least one invalid signature was encountered.
#0 0.975 E: The repository 'http://deb.debian.org/debian-security bookworm-security InRelease' is not signed.
------
failed to solve: executor failed running [/bin/sh -c apt-get update && apt-get install -y --no-install-recommends     python3-pip     build-essential     && rm -rf /var/lib/apt/lists/*]: exit code: 100

Further investigation is needed. Our current suspicion is that the Debian Bookworm base image is too old and has transitioned to an archived state.

Connections coming through not using SSL

After implementation, two client IPs, 10.0.3.9 and 10.0.1.202, are still making connections without SSL:

pfbproductionmain=> SELECT datname, usename, ssl, client_addr, query
  FROM pg_stat_ssl INNER JOIN pg_stat_activity ON pg_stat_ssl.pid = pg_stat_activity.pid
  WHERE usename<>'rdsadmin';
      datname      | usename | ssl | client_addr | query
---------------------------------------------------------
 pfbproductionmain | pfb     | f   | 10.0.3.9    | SELECT ST_AsBinary("geom") AS geom,"ft_seg_str" FROM (SELECT * FROM pfb_analysis_neighborhoodwaysresults WHERE "job_id" = '93178546-2af2-4a58-8039-10cc91e674a7') as m WHERE "geom" && ST_MakeEnvelope(-87.63828277587891,41.87544075639651,-87.63622283935547,41.8769745620659,4326)
[...]
 pfbproductionmain | pfb     | f   | 10.0.1.202  | SELECT ST_AsBinary("geom") AS geom,"ft_seg_str" FROM (SELECT * FROM pfb_analysis_neighborhoodwaysresults WHERE "job_id" = '93178546-2af2-4a58-8039-10cc91e674a7') as m WHERE "geom" && ST_MakeEnvelope(-87.63690948486328,41.87748582244232,-87.63484954833984,41.87901957903126,4326)

We suspect these connections come from the Lambda function created by the Tiler. The question we have is "Were these connections ever using SSL?" rather than "Did this implementation break SSL connections?"
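One way to test the Lambda suspicion is to look up which network interface owns each private IP; ENIs created for VPC-attached Lambda functions report an `InterfaceType` of `lambda`. A sketch assuming a configured AWS CLI; `whose_ip` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<eni-id> <interface-type> <description>"
# (tab-separated) for each network interface holding the given private IP.
whose_ip() {
  local ip="$1" region="${2:-us-east-1}"
  aws ec2 describe-network-interfaces \
    --region "$region" \
    --filters "Name=addresses.private-ip-address,Values=$ip" \
    --query 'NetworkInterfaces[*].[NetworkInterfaceId,InterfaceType,Description]' \
    --output text
}
```

Running `whose_ip 10.0.3.9` and `whose_ip 10.0.1.202` should show whether both addresses belong to Lambda ENIs. It does not by itself answer whether those connections used SSL before the change.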