cds-snc / notification-planning-core

Project planning for GC Notify Core Team
0 stars 0 forks source link

Split SMS tasks into their own pods on staging #185

Open jimleroyer opened 10 months ago

jimleroyer commented 10 months ago

Description

As a Notify developer, I need SMS delivery tasks to be run by a dedicated group of pods so that I can treat them differently than email delivery tasks.

WHY are we building?

The sending rates set by AWS are different for email and sms, and hence we need to be able to tune them separately. We have decided that the best way to do this right now is to have the email and sms delivery tasks run on different groups of pods.

WHAT are we building?

VALUE created by our solution

We will be able to scale up both email and sms sending rates.

Acceptance Criteria

QA Steps

ben851 commented 10 months ago

How to split out the celery config in k8s:

  1. Create a new run_celery script with the appropriate queue config for sms only
  2. Remove sms config from old run_celery script
  3. In notifications-manifests/base:
    • Make copies of celery-deployment, celery-service, and celery-hpa naming them something like celery-deployment-sms etc.
    • Move these copies to the env/staging folder
    • Modify the "name" in metadata to reflect that they are for sms only in each new file
    • On line 127 of the new celery-deployment-sms.yaml, modify the script name to be the new sms only script
    • Add references to these files in the staging kustomization.yaml under the resources section
sastels commented 10 months ago

PR https://github.com/cds-snc/notification-manifests/pull/1980

sastels commented 9 months ago

Didn't get to merging and testing this Monday, will merge and test first thing Tuesday.

sastels commented 9 months ago

merged to staging!

sent api single / bulk with all priorities for sms and email.

Verified using this CloudWatch query) that as of 10:51 today,

sastels commented 9 months ago

Using locust sent 3300 SMS to the internal test # at 20 sms / second.

Private Zenhub Image

sastels commented 9 months ago

PR to add these charts to the SMS dashboard: https://github.com/cds-snc/notification-terraform/pull/917

sastels commented 9 months ago

not merging the dashboard PR until after we have the sms deployment on prod.

sastels commented 9 months ago

ready for QA!

sastels commented 9 months ago

Jimmy to QA

sastels commented 9 months ago

A better query to identify SMS still being sent by the old celery pods:

fields @timestamp, kubernetes.container_name, log, @logStream
| filter @message like /Start sending SMS for notification id/
| filter kubernetes.container_name != "celery-sms-send"
| sort @timestamp desc
| limit 200
sastels commented 9 months ago

So this query does find one sms sent on the old pods on Sept 20: https://staging.notification.cdssandbox.xyz/services/d6aa2c68-a2d9-4437-ab19-3ae8eb202553/notification/77630fd8-61c8-45ac-aaaa-ca5a343b8c8b This was a 2FA code sent by Notify. (for logging in to Notify) It's possible (well, likely) that there's some code that's using the old queues that I missed when I moved sms to their own queues.

sastels commented 9 months ago

ok!

sms 77630fd8-61c8-45ac-aaaa-ca5a343b8c8b sent to the notify-internal-tasks queue for delivery

So! we likely should send these sms via the new sms send pods as well. Although it's not terribly likely that we'll have a flood of people logging into Notify at the same second.

As discussed on Slack we will leave it as it is for now, and consider some sort of fix in a future version of the scaling work.

jimleroyer commented 9 months ago

Steve, Ben and I spoke offline about the SMS notification that goes through the older pod to deliver SMS. We agreed it's okay for now as this is solely meant for GCNotify 2FA. These do not represent a risk for overflooding our SMS rate limit with AWS, so having these SMS going through the scaled up nodes would be fine.

We will create a card to move the GCNotify 2FA SMS into their own dedicated queue, using the newer SMS dedicated pods.

QA test is approved, because the intended behavior of expected SMS uses the newer SMS dedicated pods. The 2FA of GCNotify is not a risk nor a blocker for scaling up.