cds-snc / notification-planning-core

Project planning for GC Notify Core Team

Modify Kubernetes services to not use external load balancers for services #140

Open ben851 opened 1 year ago

ben851 commented 1 year ago

Description

As a developer/operator of GC Notify, I would like all traffic to the system to come in through a central access point so that we are able to effectively administer and monitor the traffic, as well as protect against security threats.

WHY are we building?

The K8s services for Notify products currently expose themselves directly to the internet via K8s load balancers. These should be changed to internal load balancer services so that they are not reachable from the public internet.

WHAT are we building?

Investigate how we are going to do this: ingress controllers or internal load balancers

Verify that all functionality is intact in the scratch account

Create an ADR to document the design decision

Implement solution

VALUE created by our solution

Increased security and reliability of GC Notify.

Acceptance Criteria

QA Steps

ben851 commented 1 year ago

Downgrading the complexity of this to a 3, as I did the research and this is a super quick fix. We need to add the following annotation to all services in notification-manifests/base:

annotations:
  service.beta.kubernetes.io/aws-load-balancer-internal: "true"
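
For illustration, a minimal sketch of where that annotation would sit in a Service manifest. The service name, selector, and ports are hypothetical placeholders, not taken from notification-manifests; only the annotation itself comes from the comment above.

```yaml
# Hypothetical Service manifest; name, selector, and ports are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: example-service
  annotations:
    # Asks the AWS cloud provider for an internal (non-internet-facing) load balancer.
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: example-service
  ports:
    - port: 80
      targetPort: 8080
```

Changing this on a live Service may cause the load balancer to be replaced rather than updated in place, which is presumably where the downtime concern raised later in this thread comes from.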

jimleroyer commented 11 months ago

We should check afterward if the issue described in this Slack thread still applies: https://gcdigital.slack.com/archives/CNWA63606/p1677015136566929

sastels commented 10 months ago

This may require 5 minutes of downtime. If we can't avoid this, we will have to coordinate with the app team about how best to approach it.

ben851 commented 10 months ago

I've added a potential solution for avoiding downtime. For discussion tomorrow.

ben851 commented 10 months ago

PR with the documentation for the no-downtime approach: https://github.com/cds-snc/notification-manifests/pull/1947

sastels commented 10 months ago

First step: cds-snc/notification-terraform#883

sastels commented 10 months ago

Had issues; will need to break the TF PR up and do it in stages.

sastels commented 10 months ago

First stage is being tested in scratch.

ben851 commented 10 months ago
jimleroyer commented 10 months ago

Deployed secondary services in production, nothing exploded. PR opened to point the target group to secondary. We can resume that work in the core group work session today.

ben851 commented 10 months ago

Prod and staging are now running on internal load balancers. The secondary external load balancers are still up in case we need to revert quickly. We will remove them early next week.

ben851 commented 10 months ago

Ben to remove secondary services today.

ben851 commented 9 months ago

Secondary services removed in staging and prod, smoke tests passing in both.

Need to create a PR in notification-terraform to remove the secondary target groups.

ben851 commented 9 months ago

Secondary target groups removed in staging.

@sastels Please try to hit the service URLs in staging.

sastels commented 9 months ago

Confirmed that the IPs for admin, api, documentation, and dd-api now time out when trying to connect, e.g.

curl internal-afd896d82c7a54e99848a8305b51ef60-568829761.ca-central-1.elb.amazonaws.com