Closed mlissner closed 1 year ago
This information is so valuable, thanks.
About the medium-term issues:
Timeouts: Right now we've set a timeout of 2 seconds. Would it be enough?
Timeouts (slowloris): Yeah, it seems that we can't do much about this from requests. I was reading this article, which has some recommendations (some of them on the infrastructure side). One is to set an absolute timeout that closes the connection after a while, whether or not the client is still connected. It seems this might be possible to implement with sockets instead of requests. I'll check how difficult it is to implement, so we can evaluate whether it's worth it given the low risk.
Private IPs, reserved IPs, and domains that point to private IPs: yeah, that's a serious problem, and we need to prevent access to our private infrastructure. So yes! Here we could use "los dos": implement our own layer of validation to avoid sending requests to private/reserved IPs, and also use a proxy, which seems like a good option. I'll look at each of these options in detail and make a comparison.
About redirects, yeah, that's an easy one: we should not allow redirects.
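For the absolute timeout idea above, here is a minimal sketch of what a socket-level deadline could look like. The `timeout` in requests applies to the connect and to each individual read, not to the whole response, which is why a slow trickle can keep a connection open. The function name `recv_with_deadline` and the `max_bytes` cap are made up for illustration; this is not code from the project.

```python
import socket
import time


def recv_with_deadline(sock: socket.socket, total_seconds: float,
                       max_bytes: int = 65536) -> bytes:
    """Read until EOF or max_bytes, giving up once total_seconds have elapsed.

    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + total_seconds
    chunks = []
    received = 0
    while received < max_bytes:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("absolute deadline exceeded")
        # The per-read timeout shrinks as the deadline approaches, so a peer
        # that trickles one byte at a time cannot keep resetting the clock.
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(min(4096, max_bytes - received))
        except socket.timeout:
            raise TimeoutError("absolute deadline exceeded") from None
        if not chunk:  # peer closed the connection
            break
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)
```

The `max_bytes` cap is a second guard: even a fast peer can't make us buffer an unbounded response.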
Sounds good. A few replies:
I think when it's also updated right
Yeah, sounds good.
Would [2 seconds] be enough?
Yes.
I spent some time looking at this issue today. We do not allow redirects and we have handled basic timeouts, so the remaining issues are:
1. Server-Side Request Forgery (SSRF) - Can people use webhooks to probe our internal network?
2. DNS rebinding leading to SSRF - Can people use DNS or DNS rebinding to probe our internal network?
3. Slow loris - Can people give us trouble with really slow responses?
Number 3 isn't a huge issue at our scale. If we can fix that, great. But numbers 1 and 2 are essential.
That means that we need a way of monitoring the IP addresses that we connect to, and avoiding bad ones, even if there's a DNS rebinding attack.
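As a rough idea of what "avoiding bad ones" could mean in code, here is a small sketch using Python's standard `ipaddress` module. The helper name `is_safe_destination` is hypothetical, and the policy (only globally routable, non-multicast addresses are allowed) is an assumption about what we'd want, not something already decided in this thread.

```python
import ipaddress


def is_safe_destination(ip_text: str) -> bool:
    """Return True only for globally routable, non-multicast addresses.

    Hypothetical helper; the allow/deny policy is an assumption.
    """
    ip = ipaddress.ip_address(ip_text)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:10.0.0.1) so it is judged as IPv4.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    # is_global is False for private, loopback, link-local and reserved
    # ranges; multicast is excluded separately.
    return ip.is_global and not ip.is_multicast


for candidate in ("8.8.8.8", "10.0.0.5", "169.254.169.254", "::1"):
    print(candidate, is_safe_destination(candidate))
```

A check like this only helps if it runs against the address we actually connect to. Checking the hostname once and then letting the HTTP client resolve it again leaves the DNS rebinding hole open.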
To screen IP addresses, I did a bit of research into the options. I think there are three:
Using sockets/requests/urllib3.
Using juggernaut/webhook-sentry, a proxy implemented in Go that's designed for this.
Using stripe/smokescreen, a proxy implemented in Go that Stripe uses for this.
I think we could use sockets to prevent slow loris, and we might be able to use an HTTPAdapter or monkey patching to monitor the IP addresses that requests uses. If you do some simple searches, you'll find Stack Overflow posts about these things, but I wasn't able to find anything elegant or reliable, so I didn't really like this approach.
I was hoping this would be the simple way out, but I just don't see it.
webhook-sentry
I like this solution. This package promises to be a simple proxy written in Go that is aimed at fixing all of the problems we have here. We also would be able to put this proxy at a particular static IP address, which would help with our authentication problem (in that we don't have authentication yet). What I don't like about this is that @juggernaut doesn't seem to be doing a ton of work on the project — the last commit was over a year ago — and I wasn't able to find anywhere this tool was in use. Maybe Twilio? I opened an issue to see if Docker support would be welcome, but also to see if the maintainer is still interested in the project. If he is, that seems like a win.
stripe/smokescreen
This seems like a valid option too, but it's definitely not geared towards an org like ours taking it and using it. The docs are really thin, and I couldn't even tell if it blocked IP addresses by default. (If we need a list of IP ranges to block, webhook-sentry has that here.)
In any case, this might be a good solution if it isn't too hard to set up or if the author of webhook-sentry isn't interested in maintaining their system.
Ultimately, we should get this figured out. It's time.
Great! Yeah, webhook-sentry seems like a great tool for the job. Hopefully the maintainer is still into the project.
We're taking another run at this today and one of the requirements of a client is that our webhooks come from a specific IP address. To implement that, we need to do some careful architectural work and evaluate a few options. Let's go through them one by one...
This is pretty easy, and it's actually what we do for solr, minus the elastic IP. It'd work, but it's kind of lame because:
But it's pretty easy!
This article explains that you can use a NAT on top of a k8s cluster to route traffic through a particular static IP. It's an interesting solution, but it feels like the kind of thing that might break the cluster, and it feels heavy handed. We just want a static IP on a pod or group of pods!
If you use Fargate, you can have AWS host your container for you and supposedly you can attach a network load balancer to the container. If you do that, the load balancer can have a static IP.
What's not clear is how to handle networking between the Fargate container and our k8s cluster. Probably there's a way to do it though, and we'd wind up with something pretty OK.
I think if we put the Fargate container into its own VPC, we could add firewall rules that'd only allow connections from the k8s VPC, and we wouldn't need proxy auth or HTTPS on the proxy.
There's some documentation about doing this, but it seems really complicated. I'm not sure it's worth it, but I think the idea is to have EKS create pods via Fargate instead of via Docker. This feels like the worst of all worlds, probably, because it puts Fargate into our EKS stack, where we already have a lot of complication.
Alberto found this solution that appears to use a CNI plugin to accomplish this. I'm not sure I understand it, but it seems complicated, and I hesitate to use plugins like this for such a narrow case as assigning a static IP.
I think I'm leaning towards the Fargate solution, though it's going to involve complicated networking. I think it'll be scalable, highly available, and zero maintenance. Just have to figure it out and do some experimentation. If it fails, I'm not sure what our next trick would be.
This is pending https://github.com/juggernaut/webhook-sentry/pull/6. When it's resolved, we can move forward here.
If we want though, we could try to continue figuring out the infrastructure parts here using the image that Alberto created here: https://hub.docker.com/r/albertisfu/webhook-sentry/tags
Yeah, I was also wondering whether it would be good to add webhook-sentry to the CL docker-compose so the proxy is available in dev as it is in prod, or whether it's better not to set up the requests proxy in dev?
I agree. It should be a very lean docker container, and sooner or later we're sure to catch issues by including it in our compose file.
More work here today. I was able to get webhook-sentry running in my personal AWS account using ECS and Fargate. It wasn't too hard b/c AWS has a really good wizard for this. I think the following architecture will solve all our problems:
We're off to the races:
Hm, I've been informed that my Fargate solution doesn't work anyway, because it doesn't create a static egress IP. Although the network load balancer directs all the traffic into a specific IP, the outbound traffic can still come from other IPs. I'm meeting with somebody in an hour to attempt something more like this:
https://blog.damavis.com/en/adding-static-outbound-ips-in-amazon-eks/
I've got another PR fixing Prometheus in webhook-sentry (hopefully). That'll be important so we can set up good health checks: https://github.com/juggernaut/webhook-sentry/pull/11
OOOOK, I spent a lot of time meeting with an AWS/EKS expert and working on this over the past few weeks. This comment doesn't list all the things we tried (tweaks to subnets, secondary clusters, etc), but just notes the final solution.
At a high level, the way to provide a static IP to a node/pod is to give it a NAT Gateway with an elastic IP attached to it. That's remarkably hard to get right in an EKS cluster. Ultimately, the assembly is as follows:
At this point, everything on those subnets should send its traffic out through those Elastic IP addresses. But how do you get your Kubernetes cluster to send traffic through those subnets? You:
Finally, with that in place, you do the k8s part:
And you deploy:
X-WhSentry-TLS header set to true. 🎉🎉🎉🎉
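For reference, a hedged sketch of how a webhook request might be pointed at the proxy with the `X-WhSentry-TLS` header. Everything here except the header name is an assumption: the proxy address `http://webhook-sentry:9090`, the helper name `build_webhook_request`, and the convention of sending an `http://` URL to the proxy and letting it open TLS to the target. It only builds keyword arguments for `requests.post()`, so it doesn't need the proxy running.

```python
from urllib.parse import urlsplit

# Assumption: the proxy is reachable at this hostname and port.
PROXY_URL = "http://webhook-sentry:9090"


def build_webhook_request(url: str, proxy_url: str = PROXY_URL) -> dict:
    """Return keyword arguments for requests.post() that route url via the proxy.

    Hypothetical helper; the https-to-http rewrite is an assumed convention.
    """
    parts = urlsplit(url)
    headers = {}
    if parts.scheme == "https":
        # Assumed behaviour: talk plain HTTP to the proxy and ask it to
        # make the outbound connection to the target over TLS.
        headers["X-WhSentry-TLS"] = "true"
        url = parts._replace(scheme="http").geturl()
    return {
        "url": url,
        "headers": headers,
        "proxies": {"http": proxy_url},
        "timeout": 2,              # the 2 second timeout discussed earlier
        "allow_redirects": False,  # we do not follow redirects
    }


kwargs = build_webhook_request("https://example.com/hook?x=1")
print(kwargs["url"], kwargs["headers"])
```

With requests installed, the result would be used as `requests.post(json=payload, **kwargs)`.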
Alberto is tuning up https://github.com/freelawproject/courtlistener/pull/2423, then we'll be ready. The rest is done and in place.
This is deployed and working, thank goodness. Closing.
We got our first person trying to hack the webhook system:
This is why we have webhook-sentry!
In the short term...
We should make it so admins get an email when new webhooks are created. That'll be useful anyway and it's easy.
In the medium term...
I don't think this is a disaster per se, but this Hacker News comment made me realize how insecure webhooks are:
https://news.ycombinator.com/item?id=32518208
Among the issues they raise:
This one is easy. Requests supports it.
I tried researching this, but I don't think requests has a way of handling this. This is probably low risk though.
OK, yeah, we should prevent this.
Jeez, yeah, that's pretty nasty. We'll want to prevent that somehow, perhaps by running our own DNS cache to keep track of the IPs that are being used. I guess building a DNS cache isn't that hard in redis.
There are some tricks here about using a custom DNS with requests: https://stackoverflow.com/questions/22609385/python-requests-library-define-specific-dns
All we'd have to do is cache the IP for the TTL it's configured for, and then if it ever changes, make sure the new IP isn't private, etc.
A different approach might be to intercept each request before it goes out and make sure the IP is safe. Maybe there's a hook for that in requests somewhere.
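To make the cache idea a bit more concrete, here is a minimal sketch. It's a sketch under assumptions, not a design: a plain dict stands in for redis, the TTL is a fixed number because `socket.getaddrinfo` does not expose the record's real TTL, and the class name `ScreenedResolver` is made up.

```python
import ipaddress
import socket
import time


class ScreenedResolver:
    """Resolve a hostname, reject unsafe IPs, and pin the answer for ttl seconds.

    Hypothetical sketch: the dict stands in for redis and ttl is assumed.
    """

    def __init__(self, resolve=socket.getaddrinfo, ttl=60.0, clock=time.monotonic):
        self._resolve = resolve
        self._ttl = ttl
        self._clock = clock
        self._cache = {}  # hostname -> (ip, expires_at)

    def lookup(self, hostname: str, port: int = 443) -> str:
        cached = self._cache.get(hostname)
        if cached and cached[1] > self._clock():
            return cached[0]  # still pinned; ignore any new DNS answer
        infos = self._resolve(hostname, port, proto=socket.IPPROTO_TCP)
        ips = [info[4][0] for info in infos]
        if not ips:
            raise ValueError(f"{hostname} did not resolve")
        # Reject the hostname if ANY record is non-public, so an attacker
        # cannot mix one good address with one internal address.
        for ip in ips:
            if not ipaddress.ip_address(ip).is_global:
                raise ValueError(f"{hostname} resolves to blocked address {ip}")
        self._cache[hostname] = (ips[0], self._clock() + self._ttl)
        return ips[0]
```

The caller has to connect to the returned IP rather than hand the hostname back to the HTTP client. Otherwise the client resolves it a second time and the rebinding window is back.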
Simple. No redirects.
Again, no redirects.
I'm not sure how to block these kinds of requests. I suspect AWS has documentation somewhere.
Maybe use a proxy?
The other thing that comes up a fair amount is using a proxy. Some options seem to exist:
https://github.com/stripe/smokescreen
https://github.com/juggernaut/webhook-sentry
https://www.inet.no/dante/ (mentioned here: https://slite-tech-blog.ghost.io/anti-ssrf-solution/)
And of course, we have to ask, "Why not both?" ("¿Por qué no los dos?")