freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Complete security review and fixes for webhooks #2273

Closed · mlissner closed this issue 1 year ago

mlissner commented 2 years ago

In the short term...

We should make it so admins get an email when new webhooks are created. That'll be useful anyway and it's easy.

In the medium term...

I don't think this is a disaster per se, but this Hacker News comment made me realize how insecure webhooks are:

https://news.ycombinator.com/item?id=32518208

Among the issues they raise:

  • Timeouts: the user can set up a webhook receiver that takes very long to generate a response. Your service must be able to deal with that.

This one is easy. Requests supports it.
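For reference, a minimal sketch of what that looks like with requests (the URL, payload, and three-second values are placeholders, not what CL actually uses):

```python
import requests

webhook_url = "https://example.com/hooks/incoming"  # placeholder endpoint
payload = {"event": "test"}                         # placeholder body

try:
    # timeout=(connect, read) caps how long we wait to establish the connection
    # and how long any single read may block, so a receiver that never answers
    # can't tie up a worker indefinitely.
    response = requests.post(webhook_url, json=payload, timeout=(3, 3))
except requests.Timeout:
    pass  # record the failure and schedule a retry
```

Note that the read timeout applies to each socket read, not to the whole response, which is exactly the gap the next bullet is about.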

  • Timeouts (slowloris): the webhook target could be sending back one byte at a time, with one-second pauses in between. If you are using, say, the "requests" Python library for making HTTP requests, the "timeout" parameter will not help here.

I tried researching this, but I don't think requests has a way of handling this. This is probably low risk though.
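If we ever wanted a partial guard anyway, one option is to stream the response and enforce our own wall-clock deadline and body-size cap. A rough sketch, with placeholder names and numbers, which still doesn't cover headers dribbling in byte by byte:

```python
import time

import requests


def post_with_deadline(url, payload, total_seconds=5, max_body=4096):
    """Rough guard against a byte-at-a-time responder: stream the body and
    enforce a total deadline ourselves, since the per-read timeout never
    fires as long as bytes keep trickling in."""
    start = time.monotonic()
    with requests.post(
        url, json=payload, timeout=2, allow_redirects=False, stream=True
    ) as resp:
        received = 0
        # chunk_size=1 so the deadline is checked after every byte; webhook
        # responses should be tiny, so the overhead is irrelevant here.
        for chunk in resp.iter_content(chunk_size=1):
            received += len(chunk)
            if received > max_body:
                break  # we don't care about huge bodies; stop reading
            if time.monotonic() - start > total_seconds:
                raise TimeoutError("webhook receiver is dribbling its response")
        return resp.status_code
```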

  • Private IPs and reserved IPs: you probably don't want users defining webhooks to http://127.0.0.1: and probing your internal network. Remember about private IPv6 ranges too

OK, yeah, we should prevent this.
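A sketch of the kind of check this needs, using the standard library's ipaddress module (the helper name is ours, for illustration; it isn't an existing CL function):

```python
import ipaddress


def is_blocked(ip_str: str) -> bool:
    """True if an address shouldn't be a webhook target: private, loopback,
    link-local, or otherwise reserved, for both IPv4 and IPv6."""
    addr = ipaddress.ip_address(ip_str)
    return addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved


# Spot checks:
assert is_blocked("127.0.0.1")          # loopback
assert is_blocked("10.20.30.40")        # RFC 1918
assert is_blocked("169.254.169.254")    # link-local (the AWS metadata address)
assert is_blocked("fd00::1")            # IPv6 unique local
assert not is_blocked("8.8.8.8")        # an ordinary public address
```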

  • Domains that resolve to private IPs: attacker could set up foo.com which resolves to a private IP. It is not enough to just validate webhook URLs when users set them up.

Jeez, yeah, that's pretty nasty. We'll want to prevent that somehow, perhaps by running our own DNS cache to keep track of the IPs that are being used. I guess building a DNS cache isn't that hard in redis.

There are some tricks here about using a custom DNS with requests: https://stackoverflow.com/questions/22609385/python-requests-library-define-specific-dns

All we'd have to do is cache the IP for the TTL it's configured for, and then if it ever changes, make sure the new IP isn't private, etc.

A different approach might be to intercept each request before it goes out and make sure the IP is safe. Maybe there's a hook for that in requests somewhere.
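Here's a rough sketch of that interception idea: resolve the hostname once, reject anything private or reserved, and then connect to the validated IP directly so a DNS change between the check and the request can't swap in a different address. This is a simplification (it ignores IPv6 bracket syntax in URLs and doesn't handle HTTPS certificate/SNI validation against the original hostname), and the helper names are made up for illustration:

```python
import ipaddress
import socket
from urllib.parse import urlsplit, urlunsplit

import requests


def resolve_and_check(hostname):
    """Resolve a hostname once and refuse anything that maps to a private,
    loopback, link-local, or otherwise reserved address."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    ips = sorted({info[4][0] for info in infos})
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved:
            raise ValueError(f"{hostname} resolves to a blocked address: {ip}")
    return ips[0]  # one validated address to pin the connection to


def send_webhook(url, payload):
    """Deliver a webhook to a pre-validated IP so a later DNS answer can't
    point us somewhere private."""
    parts = urlsplit(url)
    safe_ip = resolve_and_check(parts.hostname)
    netloc = safe_ip if parts.port is None else f"{safe_ip}:{parts.port}"
    pinned_url = urlunsplit(parts._replace(netloc=netloc))
    return requests.post(
        pinned_url,
        json=payload,
        headers={"Host": parts.hostname},  # keep the original Host header
        timeout=2,
        allow_redirects=False,
    )
```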

  • HTTP redirects to private IPs. If your HTTP client library follows HTTP redirects, the attacker can set up a webhook endpoint that redirects to a private IP. Again, it is not enough to validate the user-supplied URL.

Simple. No redirects.

  • Excessive HTTP redirects. The attacker can set up a redirect loop - make sure this does not circumvent your timeout setting.

Again, no redirects.
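Concretely, requests follows redirects by default, so this just has to be explicit on every delivery (placeholder URL and payload):

```python
import requests

webhook_url = "https://example.com/hooks/incoming"  # placeholder endpoint
payload = {"event": "test"}                         # placeholder body

# With allow_redirects=False, a 3xx from the receiver is treated as the final
# response and never followed, which closes off both the redirect-to-private-IP
# and the redirect-loop tricks.
response = requests.post(
    webhook_url, json=payload, timeout=2, allow_redirects=False
)
```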

Don't forget about users defining AWS metadata addresses for a webhook. Returning IAM data to them can be... bad.

I'm not sure how to block these kinds of requests. I suspect AWS has documentation somewhere.

Maybe use a proxy?

The other thing that comes up a fair amount is using a proxy. Some options seem to exist:

  • https://github.com/stripe/smokescreen
  • https://github.com/juggernaut/webhook-sentry
  • https://www.inet.no/dante/ (mentioned here: https://slite-tech-blog.ghost.io/anti-ssrf-solution/)

And of course, we have to ask, "Why not both?"

albertisfu commented 2 years ago

This information is so valuable, thanks.

About the medium-term issues:

mlissner commented 2 years ago

Sounds good. A few replies:

I think when it's also updated right

Yeah, sounds good.

Would [2 seconds] be enough?

Yes.

mlissner commented 1 year ago

I spent some time looking at this issue today. We do not allow redirects and we have handled basic timeouts, so the remaining issues are:

  1. Server-Side Request Forgery (SSRF) - Can people use webhooks to probe our internal network?

  2. DNS rebinding leading to SSRF - Can people use DNS or DNS rebinding to probe our internal network?

  3. Slow loris - Can people give us trouble with really slow responses?

Number 3 isn't a huge issue at our scale. If we can fix that, great. But numbers 1 and 2 are essential.

That means that we need a way of monitoring the IP addresses that we connect to, and avoiding bad ones, even if there's a DNS rebinding attack.

To screen IP addresses, I did a bit of research into the options. I think there are three:

  1. Using sockets/requests/urllib3.

  2. Using juggernaut/webhook-sentry, a proxy implemented in Go that's designed for this.

  3. Using stripe/smokescreen, a proxy implemented in Go that Stripe uses for this.

Using sockets/requests/urllib3

I think we could use sockets to prevent slow loris, and we might be able to use an HTTPAdapter or monkey patching to monitor the IP addresses that requests uses. If you do some simple searches, you'll find Stackoverflow articles about these things, but I wasn't able to find anything elegant or reliable, and so I didn't really like this approach.

I was hoping this would be the simple way out, but I just don't see it.

Using webhook-sentry

I like this solution. This package promises to be a simple proxy written in Go that is aimed at fixing all of the problems we have here. We also would be able to put this proxy at a particular static IP address, which would help with our authentication problem (in that we don't have authentication yet). What I don't like about this is that @juggernaut doesn't seem to be doing a ton of work on the project — the last commit was over a year ago — and I wasn't able to find anywhere this tool was in use. Maybe Twilio? I opened an issue to see if Docker support would be welcome, but also to see if the maintainer is still interested in the project. If he is, that seems like a win.

Using stripe/smokescreen

This seems like a valid option too, but it's definitely not geared towards an org like ours taking it and using it. The docs are really thin, and I couldn't even tell if it blocked IP addresses by default. (If we need a list of IP ranges to block, webhook-sentry has that here.)

In any case, this might be a good solution if it isn't too hard to set up or if the author of webhook-sentry isn't interested in maintaining their system.
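Whichever proxy we pick, the application side should stay small: pointing requests at an egress proxy is just the proxies parameter. A sketch, where the proxy hostname and port are placeholders and the exact scheme depends on how the proxy ends up being configured:

```python
import requests

webhook_url = "https://example.com/hooks/incoming"  # placeholder endpoint
payload = {"event": "test"}                         # placeholder body

# Hypothetical proxy address; the real value would come from settings once the
# proxy is actually deployed.
PROXY_URL = "http://egress-proxy:9090"

response = requests.post(
    webhook_url,
    json=payload,
    timeout=2,
    allow_redirects=False,
    proxies={"http": PROXY_URL, "https": PROXY_URL},
)
```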


Ultimately, we should get this figured out. It's time.

albertisfu commented 1 year ago

Great! Yeah, webhook-sentry seems like a great tool to do the work; hopefully the maintainer is still interested in the project.

mlissner commented 1 year ago

We're taking another run at this today and one of the requirements of a client is that our webhooks come from a specific IP address. To implement that, we need to do some careful architectural work and evaluate a few options. Let's go through them one by one...

Just run an EC2 server with an elastic IP

This is pretty easy, and it's actually what we do for Solr, minus the elastic IP. It'd work, but it's kind of lame because:

But it's pretty easy!

Use a NAT and a Route

This article explains that you can use a NAT on top of a k8s cluster to route traffic through a particular static IP. It's an interesting solution, but it feels like the kind of thing that might break the cluster, and it feels heavy handed. We just want a static IP on a pod or group of pods!

Use AWS Fargate

If you use Fargate, you can have AWS host your container for you and supposedly you can attach a network load balancer to the container. If you do that, the load balancer can have a static IP.

What's not clear is how to handle networking between the Fargate container and our k8s cluster. Probably there's a way to do it though, and we'd wind up with something pretty OK.

I think if we put the Fargate container into its own VPC, we could add firewall rules that'd only allow connections from the k8s VPC, and we wouldn't need proxy auth or HTTPS on the proxy.

Use AWS Fargate with EKS?

There's some documentation about doing this, but it seems really complicated. I'm not sure it's worth it, but I think the idea is to have EKS creating pods via Fargate instead of via Docker. This feels like the worst of all worlds, probably, because it puts Fargate into our EKS stack, where we already have a lot of complication.

Use a CNI Plugin?

Alberto found this solution that appears to use a CNI plugin to accomplish this. I'm not sure I understand it, but it seems complicated, and I hesitate to use plugins like this for such a narrow case as assigning a static IP.


So?

I think I'm leaning towards the Fargate solution, though it's going to involve complicated networking. I think it'll be scalable, highly available, and zero maintenance. Just have to figure it out and do some experimentation. If it fails, I'm not sure what our next trick would be.

mlissner commented 1 year ago

This is pending https://github.com/juggernaut/webhook-sentry/pull/6. When it's resolved, we can move forward here.

If we want though, we could try to continue figuring out the infrastructure parts here using the image that Alberto created here: https://hub.docker.com/r/albertisfu/webhook-sentry/tags

albertisfu commented 1 year ago

Yeah, I was also wondering whether it would be good to add webhook-sentry to the CL docker-compose file so the proxy is available in dev just as it is in prod, or whether it's better to skip the requests proxy setup in dev?

mlissner commented 1 year ago

I agree. It should be a very lean docker container, and sooner or later we're sure to catch issues by including it in our compose file.

mlissner commented 1 year ago

More work here today. I was able to get webhook-sentry running in my personal AWS account using ECS and Fargate. It wasn't too hard b/c AWS has a really good wizard for this. I think the following architecture will solve all our problems:

  • Network load balancer
  • Application load balancer
  • ECS service

Left to determine:

mlissner commented 1 year ago

We're off to the races:

(screenshot)

mlissner commented 1 year ago

Hm, I've been informed that my Fargate solution doesn't work anyway, because it doesn't create a static *egress* IP. Although the network load balancer directs all the traffic into a specific IP, the outbound traffic can still come from other IPs. I'm meeting with somebody in an hour to attempt something more like this:

https://blog.damavis.com/en/adding-static-outbound-ips-in-amazon-eks/

mlissner commented 1 year ago

I've got another PR fixing Prometheus in webhook-sentry (hopefully). That'll be important so we can set up good health checks: https://github.com/juggernaut/webhook-sentry/pull/11

mlissner commented 1 year ago

OOOOK, I spent a lot of time meeting with an AWS/EKS expert and working on this over the past few weeks. This comment doesn't list all the things we tried (tweaks to subnets, secondary clusters, etc), but just notes the final solution.

At a high level, the way to provide a static IP to a node/pod is to give it a NAT Gateway with an elastic IP attached to it. That's remarkably hard to get right in an EKS cluster. Ultimately, the assembly is as follows:

At this point, everything on those subnets should send its traffic out through those Elastic IP addresses. But how do you get your Kubernetes cluster to send traffic through those subnets? You:

Finally, with that in place, you do the k8s part:

And you deploy:

🎉🎉🎉🎉


Alberto is tuning up https://github.com/freelawproject/courtlistener/pull/2423, then we'll be ready. The rest is done and in place.

mlissner commented 1 year ago

This is deployed and working, thank goodness. Closing.

mlissner commented 1 year ago

We got our first person trying to hack the webhook system:

(screenshot)

This is why we have webhook-sentry!