caddyserver / caddy

Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS
https://caddyserver.com
Apache License 2.0

mirroring/shadowing #4211

Closed: ldarren closed this issue 3 years ago

ldarren commented 3 years ago

A mirroring / shadowing feature for caddyserver would be very useful for testing new code and troubleshooting.

references:
- http://nginx.org/en/docs/http/ngx_http_mirror_module.html
- https://doc.traefik.io/traefik/routing/services/#mirroring-service
- https://istio.io/latest/docs/tasks/traffic-management/mirroring/

mholt commented 3 years ago

Thanks for the feature request; this sounds like a good fit for a plugin, but I see no reason why it needs to be in the main repository. (After 6 years, this is the first request we've had for it.)

Anyone is of course welcome to implement this as a plugin and register it to our website.

francislavoie commented 3 years ago

Honestly @mholt this is dead-easy to implement, I don't see why we don't just add it. Basically it's just an additional go h.Transport.RoundTrip(req) for each mirror, ignoring the results.

Other servers have this built-in, so I don't see the harm in adding it in Caddy as well.

I think it would be much more complex to implement in a plugin, because the plugin would have to re-implement parts of the reverseproxy module just to insert it in the right spot in the handler.

mholt commented 3 years ago

I can think of two ways to do this:

1) Like everyone else does, which is a prime candidate for a transport module. I think it would only need to wrap other transports and call their RoundTrip methods.

2) As an HTTP handler, which tees/splits the request handling chain into concurrent branches. One branch would continue down the request's original goroutine, while the others deal with copies of the request concurrently. This is probably more powerful than what other servers do, as it allows you to do more concurrently than simply proxying to different backends.

I also do not think doing this correctly is as "dead-easy to implement" as you suggest, since concurrency is involved. We would have to be careful to avoid leaking resources, for example.

francislavoie commented 3 years ago

I was thinking just like this, on the handler:

reverse_proxy <primary_upstream> {
    mirror <mirror_upstreams...>
}

Then in reverseproxy.go h.reverseProxy():

res, err := h.Transport.RoundTrip(req)
for range h.Mirror {
    reqCopy := req.Clone(context.Background()) // however the copy is done
    go h.Transport.RoundTrip(reqCopy)
}

But you have a better sense for the concurrent parts :man_shrugging:

mholt commented 3 years ago

Yikes, that'll leak goroutines like crazy, especially if a backend is slow or the network is having trouble or something.

Suppose requests come in at 1,000 per second. And each one spawns 2 goroutines for mirroring. That's 3,000 goroutines per second. Ok, but what if those goroutines each take 5 seconds to run. Now you've got, what, 5x as many goroutines, so like 15,000? :grimacing: (This assumes they do terminate at all.)

FWIW, that code is basically option 1 I suggested above.

francislavoie commented 3 years ago

Ok, but what if those goroutines each take 5 seconds to run. Now you've got, what, 5x as many goroutines, so like 15,000?

I'd put like an aggressive default 2s timeout on them. We don't really care to wait for a response, just fire it off and forget it.

FWIW, that code is basically option 1 I suggested above.

Not really; the code I wrote above builds it into the handler layer, not the transport layer. But it could be in a wrapped transport, sure.

Just for the sake of reference, Traefik's impl: https://github.com/traefik/traefik/blob/master/pkg/server/service/loadbalancer/mirror/mirror.go. They use a sync.Pool for this, which would obviously help cap it to more reasonable amounts. It can just drop requests if traffic is coming in too fast for the pool.

But sure, I realize now this can probably be done in a transport plugin, probably without too much difficulty.

I do like the split idea though, we could revisit that at some point.

joshxyzhimself commented 1 year ago

Related discussions from https://caddy.community/

(Edited yes, removed that comment, I'm sorry for being the asshole of the day)

francislavoie commented 1 year ago

@joshxyzhimself consider sponsoring Caddy, and Matt may be able to prioritize implementing this (probably as a plugin initially).

mholt commented 1 year ago

Yeah, I can probably work on this with a sufficiently-tiered sponsorship (let me know if you're interested in that) -- it's just not a super high priority right now. But I could review a PR!

Plus, I don't feel particularly motivated to implement a feature if people are just going to use another product instead :upside_down_face:

joshxyzhimself commented 1 year ago

Updated my comment above, sorry for being the asshole of the day. Totally unnecessary, my bad.

mholt commented 1 year ago

@joshxyzhimself Oh no worries, I don't view it like that, I just wasn't sure whether the effort would actually be used or not.

But yeah, this'll take some effort -- and I'm happy to do it -- just not a priority right now unless a company wants to sponsor it. :smiley: (Or I could review a PR for free!)

pfrybar commented 6 months ago

👋 hello! Sorry to bring up an old (and closed) issue, but I find myself in need of this feature.

First off, I want to say thanks so much for caddy - it's so simple to get started and the performance is outstanding. Additionally, it's the only reverse proxy I could find which properly load balances across multiple IPs returned from a single A record (dynamic upstreams) while also supporting keepalives ⭐

I am thinking of taking a stab at implementing a mirroring feature but wanted to check if you still think it's better off as a plugin or if it would be a good contribution to the main repo?

francislavoie commented 6 months ago

Depending on your needs, you could use forward_auth which works somewhat similarly. Sends the request URL & headers as-is to an upstream as a GET with no body. (In fact, it's just a shortcut for a longer reverse_proxy config)

If you need to send the body to all servers though, that's harder because the body is a stream, so you'd need to either buffer the request body or tee it somehow.

It probably doesn't need to be part of reverse_proxy; I don't think it needs to be tightly integrated with it, unless I misunderstand the purpose. So yeah, a plugin probably makes sense; we'd rather not maintain more features than we have to.

mholt commented 6 months ago

Thanks for the background @pfrybar !

I agree a plugin approach would be a good start.

I'd implement it as a Transport plugin: it could embed/use the HTTP transport but add a hopefully-thin layer that mirrors/shadows as configured.

pfrybar commented 6 months ago

Cool, sounds great. Thanks for the quick response! In my case the body needs to be sent to all servers as well.

I'll give it a try as a plugin. Makes sense to try and limit the number of features you need to maintain :)

devinrsmith commented 5 months ago

Happy to be an alpha tester; I'm in need of this feature as well.