DerploidEntertainment / Website

Infrastructure as Code and GitHub Pages sources for the Derploid website
https://www.derploid.com/
MIT License

Create health checks for main website #43

Closed: Rabadash8820 closed this issue 2 years ago

Rabadash8820 commented 2 years ago

This is unrelated to #33, which was about the uneducated idea of "health checking DNS records" (which is not a thing). Here we're talking about adding actual health checks to the website at www.derploid.com, so that we can be notified if/when the site goes down. Additional health checks for the other "redirect" domains might still be good too, so that we know all of those redirect rules are working correctly.
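To make "the site is up" and "the redirect rules work" concrete, here is a minimal sketch of the pass/fail rules such checks could apply, using only the Python standard library. The accepted status codes, and the assumption that every redirect domain should point at `https://www.derploid.com/`, are illustrative guesses rather than anything decided in this issue; `evaluate` and `check` are hypothetical names. A managed health check would make the requests itself; this just spells out what "healthy" would mean for each kind of domain.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional

# Assumption: every "redirect" domain should send visitors to this URL.
CANONICAL_URL = "https://www.derploid.com/"
REDIRECT_CODES = {301, 302, 307, 308}


@dataclass
class CheckResult:
    url: str
    healthy: bool
    reason: str


def evaluate(url: str, status: int, location: Optional[str], expect_redirect: bool) -> CheckResult:
    """Pass/fail rules, kept free of networking so they can be tested offline."""
    if expect_redirect:
        if status not in REDIRECT_CODES:
            return CheckResult(url, False, f"expected a redirect, got HTTP {status}")
        # Treat "https://www.derploid.com" and "https://www.derploid.com/" as the same target.
        if (location or "").rstrip("/") != CANONICAL_URL.rstrip("/"):
            return CheckResult(url, False, f"redirects to {location!r}")
        return CheckResult(url, True, f"HTTP {status} -> {location}")
    if 200 <= status < 300:
        return CheckResult(url, True, f"HTTP {status}")
    return CheckResult(url, False, f"HTTP {status}")


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None stops urllib from following 3xx responses, so they
    # surface as HTTPError and the Location header can be inspected directly.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def check(url: str, expect_redirect: bool = False, timeout: float = 10.0) -> CheckResult:
    """Make one request to `url` (without following redirects) and apply `evaluate`."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return evaluate(url, resp.status, resp.headers.get("Location"), expect_redirect)
    except urllib.error.HTTPError as err:  # unfollowed 3xx, plus 4xx and 5xx
        return evaluate(url, err.code, err.headers.get("Location"), expect_redirect)
    except OSError as err:  # DNS, TLS, connection, and timeout failures
        return CheckResult(url, False, f"unreachable: {err}")
```

For example, `check("https://www.derploid.com/")` would report healthy while the site answers with a 2xx, and `check("https://derploid.com/", expect_redirect=True)` (treating the apex domain as a hypothetical redirect domain) would report healthy only if it answers with a redirect to the canonical URL. Separating `evaluate` from `check` keeps the rules testable without network access.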

Rabadash8820 commented 2 years ago

So yeah, after giving this some more thought, this is how it should all work:

FWIW, I also considered these alternatives to Route53 HealthChecks:

  1. CloudWatch Synthetics canaries: these are automated canaries that call the configured endpoint on a regular schedule. They're very similar to Route53 HealthChecks, but can validate multiple things per request, or even across multiple requests in a workflow (e.g., for automated UI testing). The additional power comes at a price: the first 100 canary runs per month are free, then it's $0.0012 per canary run thereafter. Using the Pricing docs example of an endpoint called every 5 minutes (about 8,640 runs in a 30-day month), this would come to roughly another $10 per month. We would probably check the "redirect" domains less frequently, say once per hour or once per day, but across all (sub)domain checks this could get pricey. Route53 HealthChecks run more often (every 30 seconds) for a lower price ($0.50 per health check per month, plus $1 per month for each "optional" feature).
  2. Defining Lambda Functions to call our endpoints regularly. This would give us the most power, and might even be the cheapest option, but would also require the most complexity and tedium to set up. Each Lambda Function would need to call an endpoint, check the HTTP status code, check the response body for particular strings, and send metrics to CloudWatch. This would also involve using CDK assets for the Lambda code, which I'm not really ready to learn about just for this use case, and measuring latency and the other metrics that Route53 health checks already provide out of the box would complicate things even further.
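The cost comparison above can be reproduced with a few lines of arithmetic. This is a rough sketch that uses only the unit prices quoted in this issue (not re-checked against current AWS pricing), assumes a 30-day month, and assumes the 100 free canary runs apply to the total across all canaries rather than to each one; the function names are hypothetical.

```python
from typing import Iterable

# Unit prices as quoted in this issue (USD); assumptions, not live AWS pricing.
CANARY_FREE_RUNS_PER_MONTH = 100
CANARY_PRICE_PER_RUN = 0.0012
ROUTE53_PRICE_PER_CHECK = 0.50
ROUTE53_PRICE_PER_OPTIONAL_FEATURE = 1.00

MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month


def runs_per_month(interval_minutes: int) -> int:
    """How many times a schedule with the given interval fires in one month."""
    return MINUTES_PER_MONTH // interval_minutes


def canary_monthly_cost(runs_per_canary: Iterable[int]) -> float:
    """Canary cost, applying the free runs once to the total across all canaries."""
    billable = max(0, sum(runs_per_canary) - CANARY_FREE_RUNS_PER_MONTH)
    return billable * CANARY_PRICE_PER_RUN


def route53_monthly_cost(health_checks: int, optional_features: int = 0) -> float:
    """Flat monthly price per health check, plus each optional feature enabled on it."""
    per_check = ROUTE53_PRICE_PER_CHECK + optional_features * ROUTE53_PRICE_PER_OPTIONAL_FEATURE
    return health_checks * per_check
```

With the single 5-minute canary from the Pricing docs example, `runs_per_month(5)` is 8,640 and `canary_monthly_cost([8640])` comes to about $10.25. Adding three hypothetical redirect domains checked hourly (`[8640, 720, 720, 720]`) raises that to about $12.84, whereas four Route53 health checks with one optional feature each would be `route53_monthly_cost(4, 1)`, i.e. $6.00.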