Closed Mr0grog closed 1 year ago
The production API server is now behind CloudFront, but I still need to add docs/instructions about that to this repo.
The staging API server and all UI servers are not behind CloudFront, and I think that's OK (the UI is practically a static site since all the editing capabilities are disabled, and doesn’t need the protection the API does — and this is mostly about protection, not about speedier page loading/caching).
OK, some more reflection after way too much time puzzling through Kubernetes stuff:
From a protection standpoint, I’m pretty sure what I really want here is AWS WAF (IP-based rate limits and firewall style rules for load balancers, API Gateway, CloudFront, etc.).
Ideally I just apply a WAF ACL (set of rules) to the load balancers and be done.
But WAF only supports ALB load balancers, and Kubernetes only currently supports classic load balancers.
The AWS Load Balancer Controller addon for Kubernetes lets you make ALB/NLB load balancers instead of classic ones.
Process is a lot more complex; we currently control the load balancer by adding annotations to the service, e.g.:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: production
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "<ARN FOR SSL CERT>"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: https
spec:
  type: LoadBalancer
  selector:
    app: api
  ports:
    - name: https
      port: 443
      targetPort: 3000
    - name: http
      port: 80
      targetPort: 3000
```
But now we also need a separate Ingress object alongside the Service, and you still have to create the WAF ACL separately; it isn’t managed in Kubernetes. More docs on this:
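For reference, a rough sketch of what that Ingress might look like with the AWS Load Balancer Controller (the annotation names come from the controller’s docs; the ARNs, names, and ports here are placeholders, not our actual config):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
  namespace: production
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/certificate-arn: "<ARN FOR SSL CERT>"
    # The WAF ACL still has to be created outside Kubernetes; only its
    # ARN gets referenced here.
    alb.ingress.kubernetes.io/wafv2-acl-arn: "<ARN FOR WAF WEB ACL>"
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 80
```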
Alternatively, we can just add the WAF ACL to a CloudFront distribution.
After setting up CloudFront yesterday, I’ve realized Rails has a lot of defaults that defeat it as a protective cache.
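The big one: out of the box, Rails serves dynamic responses with `Cache-Control: max-age=0, private, must-revalidate`, which tells CloudFront not to cache them at all. Each action has to opt in explicitly — a hypothetical sketch, not our actual controller code:

```ruby
# Sketch only: a Rails controller action that opts in to public
# caching so CloudFront can actually shield the origin.
class PagesController < ApplicationController
  def index
    # Sets "Cache-Control: public, max-age=300". Without this, Rails
    # sends "max-age=0, private, must-revalidate" and CloudFront
    # forwards every request straight to the origin.
    expires_in 5.minutes, public: true
    render json: Page.all
  end
end
```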
Anyway, I’m starting to feel like this entire issue was maybe not as necessary as I’d thought — the app doesn’t see heavy usage and isn’t much publicized, it’s not really critical, and the whole point of this work is putting the project into maintenance mode. It may not really need added, expensive protection.
In the course of digging through all this stuff, I did wind up adding WAF with IP rate limiting to the CloudFront distribution. I’m thinking maybe I’ll let it sit for a week and see what it winds up costing in practice and how it feels as armor vs. conceptual overhead, then decide whether to tear it all down or leave it.
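For the record, the WAF setup is roughly the following (a hypothetical sketch via the AWS CLI, not the exact commands I ran — the ACL name, metric names, and the 2,000-requests-per-5-minutes limit are illustrative; CloudFront-scoped ACLs have to be created in us-east-1):

```shell
aws wafv2 create-web-acl \
  --name api-rate-limit \
  --scope CLOUDFRONT \
  --region us-east-1 \
  --default-action Allow={} \
  --visibility-config SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName=api-rate-limit \
  --rules '[{
    "Name": "ip-rate-limit",
    "Priority": 0,
    "Action": {"Block": {}},
    "Statement": {
      "RateBasedStatement": {"Limit": 2000, "AggregateKeyType": "IP"}
    },
    "VisibilityConfig": {
      "SampledRequestsEnabled": true,
      "CloudWatchMetricsEnabled": true,
      "MetricName": "ip-rate-limit"
    }
  }]'
```

The resulting web ACL then gets associated with the CloudFront distribution in the distribution’s own settings.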
A week on, this has not cost us anything in CloudFront and is only $6/mo for WAF, so I’m inclined to leave things as-is. It at least provides some minimal protection and a place from which to adjust things when we can’t mess too much with the ELBs Kubernetes creates.
All that’s left to do here is document this.
I am working on stepping away from this project, but before doing so I want to make the UI and API public (for read-only access), so EDGI’s web governance team can share URLs more broadly, and not have to do a lot of work to set new people up for access.
As part of that, the API needs to be made more robust, and should be moved behind AWS CloudFront (maybe with origin protection?) or API Gateway (which has a built-in cache). I’ve never used API Gateway, so I need to do a little research to see what makes most sense here.