Open jimleroyer opened 1 month ago
With the current limit in staging of 20 celery-sms-send pods, we can get up to 1250 SMS sends / min to the internal test number 6135550123. Note that these sends stop BEFORE making the boto call to Pinpoint.
Testing pushing up SMS sending on dev: https://docs.google.com/spreadsheets/d/1pQ9SFQZF9wFzX7I0z_Cjxt2S2URqpzedIHryO82Xl6g/edit?gid=959863917#gid=959863917
Didn't work on this Thursday; will do more testing on dev today.
Preliminary: changing the rate limit from 1/s to 5/s didn't make a difference, but raising it to 10/s sped up the throughput per pod :shrug:
Gathering more data, in particular to make sure everything else stays constant (number of pods, size of test)...
Things are scaling up now. Tested rate limits of 1/s, 5/s, 10/s, and 100/s.
Simulate the AWS network call latency for these tests: https://github.com/cds-snc/notification-api/pull/2290
Going to rerun some of the 10/s tests on dev with the new network latency sleep().
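For anyone following along, here is a minimal sketch of what simulating the network latency with sleep() could look like. It is not the code from the PR above: `fake_send_sms` and `SIMULATED_AWS_LATENCY_S` are hypothetical names, and the 0.1 s value is an assumed placeholder, not a measured AWS latency.

```python
import time

# Hypothetical value: assumed round-trip time of the real AWS call, in seconds.
SIMULATED_AWS_LATENCY_S = 0.1


def fake_send_sms(to: str, body: str, latency_s: float = SIMULATED_AWS_LATENCY_S) -> dict:
    """Stand in for the real provider call: wait, then report a simulated send."""
    # Block the worker for roughly as long as a real network call would take,
    # so per-pod throughput in the test looks closer to production.
    time.sleep(latency_s)
    return {"to": to, "status": "simulated", "latency_s": latency_s}


if __name__ == "__main__":
    start = time.monotonic()
    result = fake_send_sms("6135550123", "load test")
    print(result["status"], f"{time.monotonic() - start:.2f}s")
```

Without the sleep, a stubbed send returns almost instantly and the per-pod numbers come out too optimistic.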
So we can get around 6000 sends / min using 15 scaling pods and 10/s task rate limit.
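A quick back-of-the-envelope check of the per-pod throughput implied by the numbers in this thread. It assumes every pod was active and the load was spread evenly, which may not hold while pods are still scaling. `per_pod_rate` is just an illustrative helper.

```python
def per_pod_rate(sends_per_min: float, pods: int) -> tuple[float, float]:
    """Return (sends per minute per pod, sends per second per pod)."""
    per_min = sends_per_min / pods
    return per_min, per_min / 60


staging = per_pod_rate(1250, 20)  # staging figure: 20 pods
dev = per_pod_rate(6000, 15)      # dev figure: 15 pods, 10/s task rate limit

print(f"staging: {staging[0]:.1f}/min per pod, {staging[1]:.2f}/s per pod")
print(f"dev:     {dev[0]:.1f}/min per pod, {dev[1]:.2f}/s per pod")
```

That works out to roughly 1.04 sends/s per pod for the staging figure and roughly 6.67 sends/s per pod for the dev figure, so on dev each pod is running below the 10/s task rate limit.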
Another thought: we could use 555-01* numbers to do real tests (i.e. do the complete send to boto)
Ran tests using default pool (one number) to 555 numbers to see what happens when we send more than 1 / sec
Going to get rid of these extra SNS numbers that aren't in the Pinpoint pools... Done!
So the SNS retries were just reporting "No quota left for account", so I think that was happening before any potential SNS throttling. Figured out where the SNS SMS monthly quota is and put in a request to raise it from the default $1 / month to $100 / month.
I think we should:
Later work can look at figuring out a better way to have a 6000 fragment / min send rate while sending SMS of different fragment sizes
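One possible direction for that later work, as a rough sketch only: a token bucket where each token is one fragment, so a 3-fragment SMS costs three times as much as a 1-fragment SMS. Nothing here exists in the codebase. `FragmentBucket` and its parameters are hypothetical, and a real version would need shared state across pods rather than an in-process counter.

```python
import time


class FragmentBucket:
    """Token bucket where one token equals one SMS fragment."""

    def __init__(self, fragments_per_min: float, capacity: float, clock=time.monotonic):
        self.rate = fragments_per_min / 60.0  # tokens added per second
        self.capacity = capacity              # largest burst allowed, in fragments
        self.tokens = capacity                # start with a full bucket
        self.clock = clock                    # injectable so tests can fake time
        self.last = clock()

    def try_send(self, fragments: int) -> bool:
        """Spend `fragments` tokens if enough are available, else refuse."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if fragments <= self.tokens:
            self.tokens -= fragments
            return True
        return False


if __name__ == "__main__":
    bucket = FragmentBucket(fragments_per_min=6000, capacity=100)
    print(bucket.try_send(3))  # a 3-fragment SMS spends 3 tokens
```

With this shape the limit is on fragments per minute, not messages per minute, so a batch of long messages slows down automatically while short ones go through at the full rate.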
Description
As a system op of GCNotify, I need to identify the current limits of the system so that I can get past them once they are on the map.
As a business owner of GCNotify, I need to know what the current blocker for scaling up SMS is so that I can actually scale up SMS.
WHY are we building?
To scale up SMS further, given that we now have a short code capable of sending 100 SMS/s.
WHAT are we building?
We want to push the limits to around 6,000 SMS/min (100 SMS/s) to match the short code speed. If there are no errors, then rejoice!
VALUE created by our solution
Identify which limits or technical issues are holding us back from matching the short code speed.
Acceptance Criteria
Given the SMS stress test, when an error occurs, then we identify it within a task card with potential follow-up actions.
QA Steps
Questions about AWS (figure out or ask them)