cloudflare / pingora

A library for building fast, reliable and evolvable network services.
Apache License 2.0
20.21k stars 1.1k forks source link

Add "Metadata" field to `Backend`s #304

Open jamesmunns opened 5 days ago

jamesmunns commented 5 days ago

This is part of an attempt to address https://github.com/cloudflare/pingora/issues/276#issuecomment-2166553791, by adding a metadata field that can be any type.

I don't think this is ready to merge, but I'm open to some feedback on the approach, so far.

Open questions/comments:

I'll let you know how it goes!

jamesmunns commented 3 days ago

Hey @drcaramelsyrup et. al, I'd be interesting in discussing the viability of this.

I believe I've plumbed through the "metadata" (or "bonus data") appropriately, and existing tests pass. However, this ended up being somewhat invasive, so it'd be good to have design feedback here if you all think this is reasonable.

If you'd like to see the "end goal" of what this enables, check out this PR in river: https://github.com/memorysafety/river/pull/46

1.72 tests are failing because I'm using some newer stdlib features, but I can figure out how to work back the MSRV update if this PR overall seems reasonable to you.

eaufavor commented 2 days ago

Is the meta data part of the unique identifier of the upstream or is it sometime just part of how to connection to the upstream. For example, IP and port identify (define) an unique upstream, while, say, connection timeout is not. The latter is much easier to add.

jamesmunns commented 2 days ago

At the moment in the pingora impl: it's treated as if it was part of the identity (e.g., Hash includes the detail of the data, not just the SocketAddr).

At the moment in the river integration: It's the HttpPeer that is created from the config file.

At the moment, that only includes TLS SNI (when relevant), but I do plan to attach additional data like health check routines and such.

I could have a separate HashMap<SocketAddr, Metadata> in River itself, and do the lookup, but I'm not sure if I'll run into challenges later when I want to populate some of that information from Service Discovery, or ensure Backends and my separate HashMap don't get out of sync.

CodyPubNub commented 2 days ago

This PR implements a feature I find very useful and helps a use case I'm building ontop of Pingora. I have my own "metadata", which is the "age" of a backend, which I utilize in the select_with to ensure that for particular requests, I don't select too new of a backend. Currently this is implemented with a separate HashMap<SocketAddr, Metadata>, of my own, but synchronization of this to the load balancer backends is pretty cumbersome.

I imagine others will find this feature to be extremely useful for similar use cases.

Thanks for the consideration!

eaufavor commented 2 days ago

One possible design to carry opaque meta data is something like this. The upside is that it requires little to none change in the APIs. The downside is that it is totally opaque so that you can't have, say, two backends with the same socket address but different SNIs.

Maybe that is a good enough solution to start with.