Open ojongerius opened 6 years ago
@ojongerius I just set up UptimeRobot, which has SMS notifications without the need for Twilio. It polls all our services once a minute, and if any of them are down it will email us and can also send an SMS notification. It's easy to configure and I've already set it up for me and Stuart to get texts.
Here's our new status page: https://status.freecodecamp.org
What do you think of this service? Do you think it can be a replacement for PagerDuty, etc.? Will there still be significant benefit to configuring Cloudwatch and Twilio?
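For reference, the kind of check described above (poll each service, notify on failure) can be sketched in a few lines of Python. This is only an illustration and not how UptimeRobot works internally; the function names `check` and `poll_once` and the `notify` callback are hypothetical.

```python
import urllib.error
import urllib.request


def check(url, timeout=10.0, opener=urllib.request.urlopen):
    """Poll one URL once and return (is_up, detail)."""
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # DNS failures, refused connections and timeouts all end up here
        # (urllib.error.URLError is a subclass of OSError).
        return False, f"unreachable: {exc}"
    return 200 <= status < 400, f"HTTP {status}"


def poll_once(urls, notify, opener=urllib.request.urlopen):
    """Check every URL once and call notify(message) for each one that is down."""
    for url in urls:
        is_up, detail = check(url, opener=opener)
        if not is_up:
            notify(f"{url} is DOWN ({detail})")
```

Calling `poll_once(["https://status.freecodecamp.org/"], print)` from a scheduler once a minute would approximate the behaviour described. A check like this only proves that one request succeeded, not that the service is healthy for every type of request.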
@QuincyLarson I can think of scenarios where your casual polling will succeed but the service is impaired for other types of requests. Having said that, I've caught many issues with simple scheduled end-to-end tests that would have gone under the radar of specific metric monitors and unit tests.
I would not see it as a replacement, but a great addition 💯
re: https://status.freecodecamp.org is down for me at the moment?
▶ wget https://status.freecodecamp.org/
--2018-05-11 11:20:04-- https://status.freecodecamp.org/
Resolving status.freecodecamp.org (status.freecodecamp.org)... 69.162.67.140
Connecting to status.freecodecamp.org (status.freecodecamp.org)|69.162.67.140|:443... failed: Operation timed out.
Retrying.
--2018-05-11 11:21:21-- (try: 2) https://status.freecodecamp.org/
Connecting to status.freecodecamp.org (status.freecodecamp.org)|69.162.67.140|:443...
@ojongerius Yes - I agree that there are plenty of corner cases that justify us having a more robust solution.
Not sure why you weren't able to hit the status page, but it's up now:
FreeCodeCamp➜~» wget https://status.freecodecamp.org/ [17:46:26]
--2018-05-12 17:46:30-- https://status.freecodecamp.org/
Resolving status.freecodecamp.org... 69.162.67.141
Connecting to status.freecodecamp.org|69.162.67.141|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13053 (13K) [text/html]
Saving to: 'index.html'
index.html 100%[===============================================================================>] 12.75K --.-KB/s in 0.04s
2018-05-12 17:46:31 (320 KB/s) - 'index.html' saved [13053/13053]
Just noticed that SNS has supported sending SMS directly since 2016.
@ojongerius Awesome - so it doesn't require Twilio integration? We could use it for messaging when we have outages?
That's right. Unless AWS is down... So there still is a strong use case for external monitoring that includes alerting.
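As a rough sketch of what sending an outage text through SNS could look like: the helper below only builds the parameters for the SNS `Publish` call, which accepts a `PhoneNumber` and a `Message` for direct SMS. The helper name `outage_sms`, the message wording and the phone number in the usage note are made up for illustration.

```python
import re

# SNS expects phone numbers in E.164 format: "+", country code, then digits.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")


def outage_sms(phone_number, service, detail):
    """Build the keyword arguments for an SNS Publish call that sends an SMS."""
    if not E164.match(phone_number):
        raise ValueError(f"not an E.164 phone number: {phone_number!r}")
    return {
        "PhoneNumber": phone_number,
        "Message": f"OUTAGE: {service} - {detail}",
    }


# With boto3 installed and AWS credentials configured, this could be sent as:
#   import boto3
#   boto3.client("sns").publish(**outage_sms("+15555550100", "api", "HTTP 503"))
```

Because the message goes out through AWS itself, this path fails exactly when AWS does, which is the case for keeping an external monitor as well.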
@ojongerius Yes - but if AWS goes down there isn't a lot we can do anyway. It's gone down what - 4 or 5 times in 10 years?
Definition of done: critical alerts create a phone call to team members.
This could be done by having critical alarms fire to separate SNS topics that have a Twilio webhook as a subscriber.
I've seen people create Lambdas to connect to Twilio when alarms fire, but that kind of defeats the purpose: we want to know when Lambdas are on 🔥
Warning: this will be less sophisticated than services like PagerDuty, VictorOps, etc. Having a schedule and escalations is well out of scope for this issue.
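A minimal sketch of that wiring, assuming it is set up once with boto3 rather than by a Lambda at alert time. The topic ARN, webhook URL, alarm name and threshold are hypothetical placeholders; the functions only build the request parameters for SNS `Subscribe` and CloudWatch `PutMetricAlarm`.

```python
# Hypothetical placeholders: replace with real values.
CRITICAL_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:alerts-critical"
CALL_WEBHOOK_URL = "https://example.com/hooks/place-call"


def webhook_subscription(topic_arn, webhook_url):
    """Parameters for SNS Subscribe: deliver topic messages to an HTTPS endpoint."""
    return {"TopicArn": topic_arn, "Protocol": "https", "Endpoint": webhook_url}


def critical_alarm(name, namespace, metric, threshold, topic_arn):
    """Parameters for CloudWatch PutMetricAlarm, notifying the critical topic."""
    return {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": metric,
        "Statistic": "Sum",
        "Period": 60,  # seconds per evaluation window
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],  # only critical alarms point at this topic
    }


# With boto3 installed and AWS credentials configured, these could be applied as:
#   import boto3
#   boto3.client("sns").subscribe(
#       **webhook_subscription(CRITICAL_TOPIC_ARN, CALL_WEBHOOK_URL))
#   boto3.client("cloudwatch").put_metric_alarm(
#       **critical_alarm("lambda-errors", "AWS/Lambda", "Errors", 1, CRITICAL_TOPIC_ARN))
```

Keeping critical alarms on their own topic means non-critical alarms can use a different topic with email-only subscribers. Note that SNS requires an HTTPS endpoint to confirm the subscription before it delivers messages, so the webhook would need to handle that confirmation request.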
/cc @freeCodeCamp/open-api did I miss anything? Any concerns? Is this a blocker for our first release?