Open jimleroyer opened 3 months ago
Testing first with internal test number. So going through Notify but NOT doing the boto call to Pinpoint. This will give us an idea of how much we can send before we hit Notify bottlenecks (database / network / k8s / ?).
Tested by uploading large (20K–40K row) CSVs of SMS to 6135550123.
Data moved to this document
Summary: (note that the current state in production is 20 scalable pods)
| primary pods | scalable pods | total pods | internal send rate (SMS/min) | rate per pod (SMS/min) |
| --- | --- | --- | --- | --- |
| 3 | 20 | 23 | 1250 | 54 |
| 3 | 30 | 33 | 1800 | 55 |
| 3 | 40 | 43 | 2320 | 54 |
staging remains set at max 40 scalable pods.
Future work:
We're still rate limiting SMS to about one per second. Since we're seeing about 54 SMS per minute per pod, this rate limit appears to apply per pod and not per worker? We should increase this rate limit and see what results we get. For example, say we keep 30 pods and triple the rate limit to 3/s. This setting is in the .env files; currently for both staging and production we have
CELERY_DELIVER_SMS_RATE_LIMIT=1/s
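Back-of-the-envelope for the proposed experiment — a sketch assuming the limit is enforced per pod (one Celery worker per pod), which the ~54 SMS/min/pod numbers above suggest; the function name here is made up for illustration:

```python
# Theoretical ceiling if each pod dispatches rate_limit tasks per second.
# Assumption: the Celery rate limit is applied per pod (one worker per pod).
def max_send_rate_per_min(pods: int, rate_limit_per_sec: float) -> float:
    return pods * rate_limit_per_sec * 60

# Current production-like setup: 23 pods at 1/s -> 1380/min ceiling
# (we observed 1250/min, i.e. ~54/min/pod rather than the full 60).
print(max_send_rate_per_min(23, 1.0))  # 1380.0

# Proposed: keep 30 pods and triple the limit to 3/s.
print(max_send_rate_per_min(30, 3.0))  # 5400.0
```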
Also: I think we should throw sleep statements into celery 😱

sleep ( (# fragments - 1) / CELERY_DELIVER_SMS_RATE_LIMIT )

This would hopefully mean that the celery task takes about as long as the SMS would have taken if we sent its fragments one at a time. That way we wouldn't have to worry about large SMS causing AWS rate limit errors.
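The sleep idea above could look something like this — a minimal sketch, assuming the rate limit arrives as a Celery-style `N/s` string; the helper names are hypothetical:

```python
import time

def parse_rate_limit(limit: str) -> float:
    """Parse a Celery-style rate limit like '1/s' or '100/s' into tasks/sec.
    Only the N/s form used in our .env files is handled in this sketch."""
    count, _, unit = limit.partition("/")
    assert unit == "s", "only per-second limits supported here"
    return float(count)

def throttle_for_fragments(fragments: int, limit: str = "1/s") -> float:
    """Sleep so an N-fragment SMS takes roughly as long as N separate sends,
    keeping multi-fragment messages under the AWS per-second limit."""
    delay = max(fragments - 1, 0) / parse_rate_limit(limit)
    time.sleep(delay)
    return delay
```

For a single-fragment SMS this adds no delay; a 3-fragment SMS at `1/s` would sleep 2 seconds.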
Bumped CELERY_DELIVER_SMS_RATE_LIMIT to 2/s but it didn't make a difference! Confirmed `'CELERY_DELIVER_SMS_RATE_LIMIT': '2/s'` appears in the celery startup. These deliver_sms tasks are only taking about 0.1 seconds to run, so I'm not sure why the pods are each only running one per second.
Will continue to investigate and talk offline
Working on dev since it's back up and we can manually poke at it! CELERY_DELIVER_SMS_RATE_LIMIT can be manually set for the deployment as discussed here. Changed the dev `celery-sms-send-scalable` and `celery-sms-send-primary` deployments to have `CELERY_DELIVER_SMS_RATE_LIMIT` set to "100/s".
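For reference, one way to set the variable on those deployments without a full redeploy is `kubectl set env` — a sketch; the deployment names come from this thread, and the namespace is an assumption:

```shell
# Hypothetical invocation: add -n <namespace> if the deployments
# are not in the current kubectl context's default namespace.
kubectl set env deployment/celery-sms-send-scalable CELERY_DELIVER_SMS_RATE_LIMIT=100/s
kubectl set env deployment/celery-sms-send-primary CELERY_DELIVER_SMS_RATE_LIMIT=100/s
```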
Got dev "sending" around 8000 or so SMS per minute to our internal test number (i.e. at the end not handing it to AWS to send) by using 43 pods and setting the task rate limit to 100/s. 32 (out of 40K) are hung though, so we probably pushed the system a bit too hard. :this-is-fine-fire:
Steve is gonna get to this one today!
Going to switch to "Push the current SMS limits to trigger potential errors" for now - this other card focuses on getting to 6000 SMS / min, which is a good first step before pushing higher.
Bumped up pods in staging to do some tests. Using 27 sms-send-scalable pods, running the roller coaster test while occasionally uploading 40K SMS. Getting up to a 10K notifications / min send rate.
Some notes on testing strategies: https://docs.google.com/document/d/1Gr7r_1_6vIMCM2BDLJsplJWZM25si05fecgkxjGuPjA
Description
As an ops lead, I want to know the highest rate at which I can send SMS with GCNotify, so that I know how far we can push the system.
As a business owner, I need to know the current GCNotify SMS sending rate limit, so that I can adjust the daily and annual SMS limits.
WHY are we building?
We need to increase our SMS sending limit, and for that we need to know our current capacity following the introduction of Pinpoint as a sending mechanism and the acquisition of a short code.
WHAT are we building?
We are testing how high an SMS send rate we can achieve via AWS Pinpoint. Hence we might want to increase the number of Kubernetes pods and adjust our Karpenter/scale set configuration.
VALUE created by our solution
The ability for each service to send more SMS per day and per year.
Acceptance Criteria
QA Steps