cds-snc / notification-planning-core

Project planning for GC Notify Core Team

Move Production DNS to Notify AWS Accounts #36

Open ben851 opened 1 year ago

ben851 commented 1 year ago

Description

As a developer/operator of Notify, I want to be able to control the DNS flow for GC Notify in my own infrastructure as code, and not have a dependency on an external team.

WHY

Currently the Production DNS is owned by CDS SRE, and we are required to submit PRs to their code base in order to modify these references. In Staging, this is done via click-ops and Slack requests. It is important to codify both production and staging DNS entries to ensure consistency. It is also important to remove dependencies on external teams so that development and operation cycles stay fast.

WHAT

We can open a PR with CDS SRE to have the *.notification. delegated to our own AWS accounts, at which point we can codify our DNS entries in our own terraform repositories.

VALUE

By having full control over our DNS entries, we will streamline the release and change management process. We will also be able to quickly spin up new environments and automatically create DNS entries for them under the sandbox DNS zone.

Acceptance Criteria

QA Steps

jimleroyer commented 1 year ago

Hey team! Please add your planning poker estimate with Zenhub @sastels @ben851

ben851 commented 1 year ago

This is a pretty quick change, but we will need to be careful w/ Production.

ben851 commented 1 year ago

Attaching this to the BCP epic as I need to get this done in order to automate the validation of ACM certificates.

ben851 commented 1 year ago

Will follow up with SRE today

jimleroyer commented 1 year ago

Ticket was closed; we need to open a form with Principal Publisher. Ben is in the process of submitting the new form.

jimleroyer commented 1 year ago

Need to re-raise the ticket as it was closed while Ben was on vacation. We know the next steps, though, and will reach out to the appropriate contacts.

ben851 commented 1 year ago

Ben will do the pre-requisite work before opening the ticket again.

jimleroyer commented 1 year ago

To do a Terraform release this morning with Ben to move DNS setup to production.

jimleroyer commented 1 year ago

We encountered an issue with the permissions. Ben to resolve as the release is currently blocked.

jimleroyer commented 1 year ago

We encountered issues while running the plan against production. The new DNS provider wasn't behaving as expected and we had to do further refactoring to accommodate the differences between our environments.

ben851 commented 1 year ago

DNS zone is now deploying to production. It needs to be tested and still needs integration with SSC.

ben851 commented 2 months ago

We had a heck of a time getting SSC and ESDC together to deploy this.

Instead, we are looking at giving Terraform access to the notification.canada.ca Route 53 zone already owned by CDS.

ben851 commented 2 months ago

Opened an issue with SRE to get access to the notification.canada.ca Route 53 zone: https://github.com/cds-snc/dns/issues/395

ben851 commented 2 months ago

After speaking with Pat some more, I'm going to kill two birds with one stone and move the terraform plan/apply workflows to OIDC authentication.

ben851 commented 2 months ago

Migrated Terraform to OIDC yesterday. Will reach out to Pat to get the new permission scheme.

ben851 commented 2 months ago

Had to revert the OIDC change because it was causing problems with QuickSight. Investigating.

ben851 commented 2 months ago

Will be debugging today

ben851 commented 2 months ago

Refactored OIDC into the new multi-job workflows. Reproduced the bug with QuickSight and added an additional permission for pull-requests on the GitHub workflow, which resolved it. Need 2 PRs approved:

Staging fix: https://github.com/cds-snc/notification-terraform/pull/1421

Production: https://github.com/cds-snc/notification-terraform/pull/1419

ben851 commented 2 months ago

Staging and Production running on OIDC again.

Sylvia is working on this ticket today to grant us access to the prod DNS account, at which point I will be doing diffs and imports to migrate our stuff over.

jimleroyer commented 2 months ago

Ben will work on a Terraform release to fix the OIDC changes in prod, which did not work. Afterward, we will be waiting on Sylvia to unblock us.

ben851 commented 2 months ago

OIDC fixed in prod, had to open an issue with SRE to get increased permissions on the notification-terraform-plan role in prod.

ben851 commented 2 months ago

PR for new role here: https://github.com/cds-snc/dns/pull/397

I commented that we also need at least read access for the notify-core team.

ben851 commented 1 month ago

I got access on Friday - I will work on this today.

ben851 commented 1 month ago

Did a comparison between "real" prod and the "fake" prod that DNS records are set to. There were a few discrepancies, so I merged in some missing entries for:

doc.notification.canada.ca
document.notification.canada.ca
api.document.notification.canada.ca (maybe this is why doc-download-api didn't work in dev)
documentation.notification.canada.ca
www.notification.canada.ca

There was also a mismatch on the weighted api.notification.canada.ca record, which in real life points to itself instead of the api-gateway lambda endpoint. That PR will be merged today.

Once both are merged, I will re-compare between real and fake, and if all is good, change the provider in TF to point to the "real" DNS and start doing imports.
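For anyone repeating the real-vs-fake comparison, here is a minimal sketch of how it could be scripted. It is not the tooling that was actually used. The `Record` type, the `diff_zones` helper, and the record targets are all hypothetical placeholders; in practice the two sets would be loaded from exports of each zone.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One DNS record, reduced to the fields compared here (hypothetical shape)."""
    name: str
    type: str
    target: str


def diff_zones(reference: set[Record], candidate: set[Record]) -> dict[str, list[Record]]:
    """Report records present on one side but not the other."""
    return {
        "missing_in_candidate": sorted(reference - candidate),
        "extra_in_candidate": sorted(candidate - reference),
    }


# Placeholder data only: the targets below are made up for illustration.
real = {
    Record("www.notification.canada.ca", "CNAME", "target-a.example"),
    Record("doc.notification.canada.ca", "CNAME", "target-b.example"),
}
fake = {
    Record("www.notification.canada.ca", "CNAME", "target-a.example"),
}

result = diff_zones(real, fake)
for record in result["missing_in_candidate"]:
    print(f"missing: {record.name} {record.type} -> {record.target}")
```

A record whose target differs between the two zones appears in both lists, which is how a mismatch like the weighted api record would surface.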

ben851 commented 1 month ago

Finished checks and did an import on prod Friday. I then made a backup of the resulting import states and restored the old states. Need to merge this PR to staging, create a prod release, and then restore the new states: https://github.com/cds-snc/notification-terraform/pull/1447
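As a rough illustration of the import step, the sketch below builds `terraform import` commands for Route 53 records. It assumes the import ID format documented for the AWS provider's `aws_route53_record` resource (zone ID, record name, and record type joined by underscores, with the set identifier appended for weighted records); verify this against the provider version in use. The zone ID, resource addresses, and set identifier are hypothetical and not taken from the actual repository.

```python
from typing import Optional


def route53_import_id(zone_id: str, name: str, record_type: str,
                      set_identifier: Optional[str] = None) -> str:
    """Join the parts of an aws_route53_record import ID with underscores.

    Assumed format: ZONEID_name_type, plus _setidentifier for weighted records.
    """
    parts = [zone_id, name, record_type]
    if set_identifier:
        parts.append(set_identifier)
    return "_".join(parts)


def import_command(address: str, import_id: str) -> str:
    """Render the CLI command; nothing is executed here."""
    return f"terraform import {address} {import_id}"


# Hypothetical zone ID and resource addresses, for illustration only.
zone = "ZEXAMPLE123"
commands = [
    import_command(
        "aws_route53_record.www",
        route53_import_id(zone, "www.notification.canada.ca", "CNAME"),
    ),
    import_command(
        "aws_route53_record.api_weighted",
        route53_import_id(zone, "api.notification.canada.ca", "A", "primary"),
    ),
]
for command in commands:
    print(command)
```

Printing the commands instead of running them leaves room to back up the state files first, as described above.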

ben851 commented 1 month ago

Merged to staging, production release ready.

https://github.com/cds-snc/notification-terraform/pull/1449

P0NDER0SA commented 1 month ago

Pond will review this PR

P0NDER0SA commented 1 month ago

Just approved this one

ben851 commented 1 month ago

Implemented in prod. Need to get SRE to remove their references to our DNS entries from their repository.

ben851 commented 1 month ago

Issue opened with SRE: https://github.com/cds-snc/dns/issues/408