Open ben851 opened 1 year ago
Downgrading the complexity of this to a 3, as I did the research and this is a super quick fix. We need to add the following annotation to all services in notification-manifests/base:
annotations:
  service.beta.kubernetes.io/aws-load-balancer-internal: "true"
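For context, here is a minimal sketch of where that annotation would sit in a Service manifest. Only the annotation key and value come from this issue; the service name, selector, and ports are hypothetical placeholders, and the real manifests in notification-manifests/base may be laid out differently.

```yaml
# Illustrative sketch only: name, selector, and ports are placeholders,
# not the actual values used in notification-manifests.
apiVersion: v1
kind: Service
metadata:
  name: example-service            # hypothetical name
  annotations:
    # Ask the AWS cloud provider for an internal load balancer
    # instead of an internet-facing one.
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: example-app               # hypothetical selector
  ports:
    - port: 80                     # hypothetical ports
      targetPort: 8080
```

The value is quoted because Kubernetes annotation values must be strings; an unquoted true would be parsed as a boolean and rejected.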
We should check afterward if the issue described in this Slack thread still applies: https://gcdigital.slack.com/archives/CNWA63606/p1677015136566929
This may require 5 minutes of downtime. If we can't avoid this, we will have to coordinate with the app team on how best to approach it.
I've added a potential solution for avoiding downtime. For discussion tomorrow.
PR with the documentation for the no-downtime approach: https://github.com/cds-snc/notification-manifests/pull/1947
First step: cds-snc/notification-terraform#883
Had issues; will need to break the TF PR up and do it in stages.
First stage is being tested in scratch.
Deployed secondary services in production, nothing exploded. PR opened to point the target group to secondary. We can resume that work in the core group work session today.
Prod and staging are now running on internal load balancers. The secondary external load balancers are still up in case we need to revert quickly. We will remove them early next week.
Ben to remove secondary services today.
Secondary services removed in staging and prod, smoke tests passing in both.
Need to create a PR in notification-terraform to remove the secondary target groups.
Secondary target groups removed in staging.
@sastels Please try to hit the service URLs in staging.
Confirmed that the IPs for admin, api, documentation, and dd-api now time out when trying to connect, e.g.:
curl internal-afd896d82c7a54e99848a8305b51ef60-568829761.ca-central-1.elb.amazonaws.com
Description
As a developer/operator of GC Notify, I would like all traffic to the system to come in through a central access point so that we are able to effectively administer and monitor the traffic, as well as protect against security threats.
WHY are we building?
The K8s services for Notify products are currently exposing themselves directly to the internet via K8s load balancers. These should be changed to internal load balancer services, so that they are not available outside of the cluster.
WHAT are we building?
Investigate how we are going to do this re: ingress controllers or internal load balancers
Verify that all functionality is intact in scratch account
Create an ADR to document the design decision
Implement solution
VALUE created by our solution
Increased security and reliability of GC Notify.
Acceptance Criteria
QA Steps