freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Complete security review and fixes for webhooks #2273

Closed · mlissner closed this issue 1 year ago

mlissner commented 2 years ago

In the short term...

We should make it so admins get an email when new webhooks are created. That'll be useful anyway and it's easy.

In the medium term...

I don't think this is a disaster per se, but this Hacker News comment made me realize how insecure webhooks are:

https://news.ycombinator.com/item?id=32518208

Among the issues they raise:

  • Timeouts: the user can set up a webhook receiver that takes very long to generate a response. Your service must be able to deal with that.

This one is easy. Requests supports it.
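For reference, a minimal sketch of what that looks like with requests (the URL, payload, and three-second values are placeholders, not what CL actually uses):

```python
import requests

webhook_url = "https://example.com/hooks/incoming"  # placeholder endpoint
payload = {"event": "test"}                         # placeholder body

try:
    # timeout=(connect, read) caps how long we wait to establish the connection
    # and how long any single read may block, so a receiver that never answers
    # can't tie up a worker indefinitely.
    response = requests.post(webhook_url, json=payload, timeout=(3, 3))
except requests.Timeout:
    pass  # record the failure and schedule a retry
```

Note that the read timeout applies to each socket read, not to the whole response, which is exactly the gap the next bullet is about.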

  • Timeouts (slowloris): the webhook target could be sending back one byte at a time, with one-second pauses in between. If you are using, say, the "requests" Python library for making HTTP requests, the "timeout" parameter will not help here.

I tried researching this, but I don't think requests has a way of handling this. This is probably low risk though.
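If we ever wanted a partial guard anyway, one option is to stream the response and enforce our own wall-clock deadline and body-size cap. A rough sketch, with placeholder names and numbers, which still doesn't cover headers dribbling in byte by byte:

```python
import time

import requests


def post_with_deadline(url, payload, total_seconds=5, max_body=4096):
    """Rough guard against a byte-at-a-time responder: stream the body and
    enforce a total deadline ourselves, since the per-read timeout never
    fires as long as bytes keep trickling in."""
    start = time.monotonic()
    with requests.post(
        url, json=payload, timeout=2, allow_redirects=False, stream=True
    ) as resp:
        received = 0
        # chunk_size=1 so the deadline is checked after every byte; webhook
        # responses should be tiny, so the overhead is irrelevant here.
        for chunk in resp.iter_content(chunk_size=1):
            received += len(chunk)
            if received > max_body:
                break  # we don't care about huge bodies; stop reading
            if time.monotonic() - start > total_seconds:
                raise TimeoutError("webhook receiver is dribbling its response")
        return resp.status_code
```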

  • Private IPs and reserved IPs: you probably don't want users defining webhooks to http://127.0.0.1: and probing your internal network. Remember about private IPv6 ranges too

OK, yeah, we should prevent this.
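A sketch of the kind of check this needs, using the standard library's ipaddress module (the helper name is ours, for illustration; it isn't an existing CL function):

```python
import ipaddress


def is_blocked(ip_str: str) -> bool:
    """True if an address shouldn't be a webhook target: private, loopback,
    link-local, or otherwise reserved, for both IPv4 and IPv6."""
    addr = ipaddress.ip_address(ip_str)
    return addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved


# Spot checks:
assert is_blocked("127.0.0.1")          # loopback
assert is_blocked("10.20.30.40")        # RFC 1918
assert is_blocked("169.254.169.254")    # link-local (the AWS metadata address)
assert is_blocked("fd00::1")            # IPv6 unique local
assert not is_blocked("8.8.8.8")        # an ordinary public address
```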

  • Domains that resolve to private IPs: attacker could set up foo.com which resolves to a private IP. It is not enough to just validate webhook URLs when users set them up.

Jeez, yeah, that's pretty nasty. We'll want to prevent that somehow, perhaps by running our own DNS cache to keep track of the IPs that are being used. I guess building a DNS cache isn't that hard in redis.

There are some tricks here about using a custom DNS with requests: https://stackoverflow.com/questions/22609385/python-requests-library-define-specific-dns

All we'd have to do is cache the IP for the TTL it's configured for, and then if it ever changes, make sure the new IP isn't private, etc.

A different approach might be to intercept each request before it goes out and make sure the IP is safe. Maybe there's a hook for that in requests somewhere.
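Here's a rough sketch of that interception idea: resolve the hostname once, reject anything private or reserved, and then connect to the validated IP directly so a DNS change between the check and the request can't swap in a different address. This is a simplification (it ignores IPv6 bracket syntax in URLs and doesn't handle HTTPS certificate/SNI validation against the original hostname), and the helper names are made up for illustration:

```python
import ipaddress
import socket
from urllib.parse import urlsplit, urlunsplit

import requests


def resolve_and_check(hostname):
    """Resolve a hostname once and refuse anything that maps to a private,
    loopback, link-local, or otherwise reserved address."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    ips = sorted({info[4][0] for info in infos})
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved:
            raise ValueError(f"{hostname} resolves to a blocked address: {ip}")
    return ips[0]  # one validated address to pin the connection to


def send_webhook(url, payload):
    """Deliver a webhook to a pre-validated IP so a later DNS answer can't
    point us somewhere private."""
    parts = urlsplit(url)
    safe_ip = resolve_and_check(parts.hostname)
    netloc = safe_ip if parts.port is None else f"{safe_ip}:{parts.port}"
    pinned_url = urlunsplit(parts._replace(netloc=netloc))
    return requests.post(
        pinned_url,
        json=payload,
        headers={"Host": parts.hostname},  # keep the original Host header
        timeout=2,
        allow_redirects=False,
    )
```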

  • HTTP redirects to private IPs. If your HTTP client library follows HTTP redirects, the attacker can set up a webhook endpoint that redirects to a private IP. Again, it is not enough to validate the user-supplied URL.

Simple. No redirects.

  • Excessive HTTP redirects. The attacker can set up a redirect loop - make sure this does not circumvent your timeout setting.

Again, no redirects.
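Concretely, requests follows redirects by default, so this just has to be explicit on every delivery (placeholder URL and payload):

```python
import requests

webhook_url = "https://example.com/hooks/incoming"  # placeholder endpoint
payload = {"event": "test"}                         # placeholder body

# With allow_redirects=False, a 3xx from the receiver is treated as the final
# response and never followed, which closes off both the redirect-to-private-IP
# and the redirect-loop tricks.
response = requests.post(
    webhook_url, json=payload, timeout=2, allow_redirects=False
)
```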

Don't forget about users defining AWS metadata addresses for a webhook. Returning IAM data to them can be... bad.

I'm not sure how to block these kinds of requests. I suspect AWS has documentation somewhere.

Maybe use a proxy?

The other thing that comes up a fair amount is using a proxy. Some options seem to exist:

  • https://github.com/stripe/smokescreen
  • https://github.com/juggernaut/webhook-sentry
  • https://www.inet.no/dante/ (mentioned here: https://slite-tech-blog.ghost.io/anti-ssrf-solution/)

And of course, we have to ask, "Why not both?"

albertisfu commented 2 years ago

This information is so valuable, thanks.

About the medium-term issues:

mlissner commented 2 years ago

Sounds good. A few replies:

I think when it's also updated right

Yeah, sounds good.

Would [2 seconds] be enough?

Yes.

mlissner commented 1 year ago

I spent some time looking at this issue today. We do not allow redirects and we have handled basic timeouts, so the remaining issues are:

  1. Server-Side Request Forgery (SSRF) - Can people use webhooks to probe our internal network?

  2. DNS rebinding leading to SSRF - Can people use DNS or DNS rebinding to probe our internal network?

  3. Slow loris - Can people give us trouble with really slow responses?

Number 3 isn't a huge issue at our scale. If we can fix that, great. But numbers 1 and 2 are essential.

That means that we need a way of monitoring the IP addresses that we connect to, and avoiding bad ones, even if there's a DNS rebinding attack.

To screen IP addresses, I did a bit of research into the options. I think there are three:

  1. Using sockets/requests/urllib3.

  2. Using juggernaut/webhook-sentry, a proxy implemented in Go that's designed for this.

  3. Using stripe/smokescreen, a proxy implemented in Go that Stripe uses for this.

Using sockets/requests/urllib3

I think we could use sockets to prevent slow loris, and we might be able to use an HTTPAdapter or monkey patching to monitor the IP addresses that requests uses. If you do some simple searches, you'll find Stackoverflow articles about these things, but I wasn't able to find anything elegant or reliable, and so I didn't really like this approach.

I was hoping this would be the simple way out, but I just don't see it.

Using webhook-sentry

I like this solution. This package promises to be a simple proxy written in Go that is aimed at fixing all of the problems we have here. We also would be able to put this proxy at a particular static IP address, which would help with our authentication problem (in that we don't have authentication yet). What I don't like about this is that @juggernaut doesn't seem to be doing a ton of work on the project — the last commit was over a year ago — and I wasn't able to find anywhere this tool was in use. Maybe Twilio? I opened an issue to see if Docker support would be welcome, but also to see if the maintainer is still interested in the project. If he is, that seems like a win.

Using stripe/smokescreen

This seems like a valid option too, but it's definitely not geared towards an org like ours taking it and using it. The docs are really thin, and I couldn't even tell if it blocked IP addresses by default. (If we need a list of IP ranges to block, webhook-sentry has that here.)

In any case, this might be a good solution if it isn't too hard to set up or if the author of webhook-sentry isn't interested in maintaining their system.
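Whichever proxy we pick, the application side should stay small: pointing requests at an egress proxy is just the proxies parameter. A sketch, where the proxy hostname and port are placeholders and the exact scheme depends on how the proxy ends up being configured:

```python
import requests

webhook_url = "https://example.com/hooks/incoming"  # placeholder endpoint
payload = {"event": "test"}                         # placeholder body

# Hypothetical proxy address; the real value would come from settings once the
# proxy is actually deployed.
PROXY_URL = "http://egress-proxy:9090"

response = requests.post(
    webhook_url,
    json=payload,
    timeout=2,
    allow_redirects=False,
    proxies={"http": PROXY_URL, "https": PROXY_URL},
)
```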


Ultimately, we should get this figured out. It's time.

albertisfu commented 1 year ago

Great! Yeah, webhook-sentry seems like a great tool to do the work; hopefully the maintainer is still interested in the project.

mlissner commented 1 year ago

We're taking another run at this today and one of the requirements of a client is that our webhooks come from a specific IP address. To implement that, we need to do some careful architectural work and evaluate a few options. Let's go through them one by one...

Just run an EC2 server with an elastic IP

This is pretty easy, and it's actually what we do for Solr, minus the elastic IP. It'd work, but it's kind of lame because:

But it's pretty easy!

Use a NAT and a Route

This article explains that you can use a NAT on top of a k8s cluster to route traffic through a particular static IP. It's an interesting solution, but it feels like the kind of thing that might break the cluster, and it feels heavy handed. We just want a static IP on a pod or group of pods!

Use AWS Fargate

If you use Fargate, you can have AWS host your container for you and supposedly you can attach a network load balancer to the container. If you do that, the load balancer can have a static IP.

What's not clear is how to handle networking between the Fargate container and our k8s cluster. Probably there's a way to do it though, and we'd wind up with something pretty OK.

I think if we put the Fargate container into its own VPC, we could add firewall rules that'd only allow connections from the k8s VPC, and we wouldn't need proxy auth or HTTPS on the proxy.

Use AWS Fargate with EKS?

There's some documentation about doing this, but it seems really complicated. I'm not sure it's worth it, but I think the idea is to have EKS creating pods via Fargate instead of via Docker. This feels like the worst of all worlds, probably, because it puts Fargate into our EKS stack, where we already have a lot of complication.

Use a CNI Plugin?

Alberto found this solution that appears to use a CNI plugin to accomplish this. I'm not sure I understand it, but it seems complicated, and I hesitate to use plugins like this for such a narrow case as assigning a static IP.


So?

I think I'm leaning towards the Fargate solution, though it's going to involve complicated networking. I think it'll be scalable, highly available, and zero maintenance. Just have to figure it out and do some experimentation. If it fails, I'm not sure what our next trick would be.

mlissner commented 1 year ago

This is pending https://github.com/juggernaut/webhook-sentry/pull/6. When it's resolved, we can move forward here.

If we want though, we could try to continue figuring out the infrastructure parts here using the image that Alberto created here: https://hub.docker.com/r/albertisfu/webhook-sentry/tags

albertisfu commented 1 year ago

Yeah, I was also wondering whether it would be good to add webhook-sentry to the CL docker-compose file so the proxy is available in dev just as it is in prod, or whether it's better to skip the requests proxy setup in dev?

mlissner commented 1 year ago

I agree. It should be a very lean docker container, and sooner or later we're sure to catch issues by including it in our compose file.

mlissner commented 1 year ago

More work here today. I was able to get webhook-sentry running in my personal AWS account using ECS and Fargate. It wasn't too hard b/c AWS has a really good wizard for this. I think the following architecture will solve all our problems:

  • Network load balancer
  • Application load balancer
  • ECS service

Left to determine:

mlissner commented 1 year ago

We're off to the races:

(screenshot)

mlissner commented 1 year ago

Hm, I've been informed that my Fargate solution doesn't work anyway, because it doesn't create a static *egress* IP. Although the network load balancer directs all the traffic into a specific IP, the outbound traffic can still come from other IPs. I'm meeting with somebody in an hour to attempt something more like this:

https://blog.damavis.com/en/adding-static-outbound-ips-in-amazon-eks/

mlissner commented 1 year ago

I've got another PR fixing Prometheus in webhook-sentry (hopefully). That'll be important so we can set up good health checks: https://github.com/juggernaut/webhook-sentry/pull/11

mlissner commented 1 year ago

OOOOK, I spent a lot of time meeting with an AWS/EKS expert and working on this over the past few weeks. This comment doesn't list all the things we tried (tweaks to subnets, secondary clusters, etc), but just notes the final solution.

At a high level, the way to provide a static IP to a node/pod is to give it a NAT Gateway with an elastic IP attached to it. That's remarkably hard to get right in an EKS cluster. Ultimately, the assembly is as follows:

At this point, everything on those subnets should send its traffic out through those Elastic IP addresses. But how do you get your Kubernetes cluster to send traffic through those subnets? You:

Finally, with that in place, you do the k8s part:

And you deploy:

🎉🎉🎉🎉


Alberto is tuning up https://github.com/freelawproject/courtlistener/pull/2423, then we'll be ready. The rest is done and in place.

mlissner commented 1 year ago

This is deployed and working, thank goodness. Closing.

mlissner commented 1 year ago

We got our first person trying to hack the webhook system:

(screenshot)

This is why we have webhook-sentry!