Netflix / zuul

Zuul is a gateway service that provides dynamic routing, monitoring, resiliency, security, and more.
Apache License 2.0

Zuul 3 Improvements #771

Closed carl-mastrangelo closed 2 months ago

carl-mastrangelo commented 4 years ago

Some API improvements I'd like to see in Zuul 3:

Some stretch API goals:

Implementation goals for Zuul 3:

The API package will have semantic versioning guarantees. This will include the pluggable parts of Zuul, such as the Filters, Session Context, and Request/Response types.

artgon commented 4 years ago

Additional things to think about:

Features to migrate to OSS:

artgon commented 4 years ago

Slightly more controversial one:

argha-c commented 4 years ago

Additional considerations

+1 to some of the above:

carl-mastrangelo commented 4 years ago

One of the design decisions that needs to be revisited is how Name Resolution (NR) works. In the current architecture, Zuul creates a client side load balancer (CSLB) object with the name of the service it wants to balance traffic to. This name is typically called the "vip", but may also be a DNS name. The CSLB then creates a Eureka/Discovery client which resolves the IP addresses and metadata about the service, and asynchronously feeds these into the LB. When Zuul wants to connect, it queries the CSLB for a backend (called a "Server"), and creates a connection if one does not already exist. The traffic sent to this server is then tallied in a "ServerStats" object shared with the LB, which is used to pick the next server object for use.

There are several problems with this approach. The name resolver is tightly coupled with the load balancer. There is no separation between the LB and the NR, so they cannot be exchanged. The load balancer is entirely in charge of the async updates to the server set, which prevents any visibility into what the name resolved to. When Zuul gets odd or seemingly impossible IP addresses back, we don't know if it's a bug in the name resolver, the load balancer, or Zuul itself.

Another problem is the lack of name resolver flexibility. The Eureka client (NR) being used is unlikely to support other, more modern forms of name resolution. We would like to explore using EDS (of the xDS protocol family), but this is currently infeasible: because the NR and the LB have to be swapped out together, the amount of work is effectively doubled.
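
A minimal sketch of what separating the two could look like, assuming a push-style listener between them. All type names here (`NameResolver`, `ResolutionListener`, `RoundRobinBalancer`, `StaticNameResolver`) are hypothetical and are not part of Zuul, Ribbon, or Eureka; the static resolver only stands in for a real Eureka or EDS implementation.

```java
import java.net.InetSocketAddress;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical: receives every resolved address set, so updates can be observed or logged.
interface ResolutionListener {
    void onUpdate(List<InetSocketAddress> addresses);
}

// Hypothetical: resolves a name (a "vip" or DNS name) and pushes results to a listener.
interface NameResolver {
    void start(String name, ResolutionListener listener);
}

// A round-robin balancer that only consumes address sets; it does not know how they were resolved.
final class RoundRobinBalancer implements ResolutionListener {
    private final AtomicReference<List<InetSocketAddress>> current =
            new AtomicReference<>(List.of());
    private final AtomicInteger next = new AtomicInteger();

    @Override
    public void onUpdate(List<InetSocketAddress> addresses) {
        current.set(List.copyOf(addresses));
    }

    InetSocketAddress pick() {
        List<InetSocketAddress> snapshot = current.get();
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("no addresses resolved yet");
        }
        // floorMod keeps the index valid even after the counter overflows.
        return snapshot.get(Math.floorMod(next.getAndIncrement(), snapshot.size()));
    }
}

// Stand-in resolver that reports a fixed list once; a Eureka or EDS resolver would push updates over time.
final class StaticNameResolver implements NameResolver {
    private final List<InetSocketAddress> addresses;

    StaticNameResolver(List<InetSocketAddress> addresses) {
        this.addresses = List.copyOf(addresses);
    }

    @Override
    public void start(String name, ResolutionListener listener) {
        listener.onUpdate(addresses);
    }
}
```

With this shape, swapping the resolver does not touch the balancer, and a logging `ResolutionListener` could be placed between the two to show exactly which addresses the resolver produced.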

The Eureka data objects have a number of problems too. They are oriented around the "IP" address as the identity of a server. Modern servers have many IP addresses, usually one IPv4 address and multiple IPv6 addresses. When Zuul needs to connect, it must pick one of these. However, since all bookkeeping about load balancing and health is oriented around a single IP address, Zuul needs to keep the canonical IP around. We claim we sent traffic to a particular IP for load balancing, for stats, and for monitoring, but in reality we connected elsewhere. This gets even more puzzling when logging happens, because we need to NOT use the canonical address then.
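
One way to picture the alternative is to key bookkeeping on a stable instance identity and record the address that was actually dialed separately. The sketch below assumes such an identity is available; `Backend`, `BackendStats`, and the `instanceId` field are hypothetical names and do not correspond to existing Eureka or Zuul types.

```java
import java.net.InetSocketAddress;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical: a backend identified by an instance ID that owns several addresses.
final class Backend {
    final String instanceId;
    final List<InetSocketAddress> addresses;

    Backend(String instanceId, List<InetSocketAddress> addresses) {
        this.instanceId = instanceId;
        this.addresses = List.copyOf(addresses);
    }
}

// Hypothetical: counts requests per instance and, separately, per dialed address.
final class BackendStats {
    private final Map<String, LongAdder> byInstance = new ConcurrentHashMap<>();
    private final Map<InetSocketAddress, LongAdder> byAddress = new ConcurrentHashMap<>();

    void recordRequest(Backend backend, InetSocketAddress dialed) {
        if (!backend.addresses.contains(dialed)) {
            throw new IllegalArgumentException(dialed + " does not belong to " + backend.instanceId);
        }
        byInstance.computeIfAbsent(backend.instanceId, k -> new LongAdder()).increment();
        byAddress.computeIfAbsent(dialed, k -> new LongAdder()).increment();
    }

    // Load balancing and health decisions would read this per-instance count.
    long requestsToInstance(Backend backend) {
        LongAdder count = byInstance.get(backend.instanceId);
        return count == null ? 0 : count.sum();
    }

    // Logging and monitoring would read the count for the address actually used.
    long requestsToAddress(InetSocketAddress address) {
        LongAdder count = byAddress.get(address);
        return count == null ? 0 : count.sum();
    }
}
```

Under this assumption there is no canonical IP to carry around: the load balancer reasons about the instance, while logs and metrics report the address the connection really used.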

In a similar vein to the previous problem, modern servers have multiple ports, with multiple protocols. This greatly complicates connection logic, because the IP address and port are picked by the LB rather than Zuul. A recent endeavor to turn on SSL automatically revealed this challenge. Because the address to use is only decided late in the load balancing phase, we don't know if SSL is possible until much later. We would prefer to pick addresses which advertise SSL, and only fall back to plaintext when necessary. This is not possible today. Any form of IP address selection or filtering happens too late in the connection logic, i.e. after the load balancing decision.
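
The preference described above can be sketched as a filter that runs on the resolved endpoints before they reach the load balancer. This assumes the resolver reports which transport each endpoint advertises; `Transport`, `AdvertisedEndpoint`, and `EndpointFilter` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: the transport an endpoint advertises in its discovery metadata.
enum Transport { TLS, PLAINTEXT }

// Hypothetical: one host/port pair together with its advertised transport.
final class AdvertisedEndpoint {
    final String host;
    final int port;
    final Transport transport;

    AdvertisedEndpoint(String host, int port, Transport transport) {
        this.host = host;
        this.port = port;
        this.transport = transport;
    }
}

final class EndpointFilter {
    private EndpointFilter() {}

    // Returns only the TLS endpoints when at least one exists; otherwise returns
    // the full (plaintext-only) list as the fallback.
    static List<AdvertisedEndpoint> preferTls(List<AdvertisedEndpoint> all) {
        List<AdvertisedEndpoint> tls = new ArrayList<>();
        for (AdvertisedEndpoint endpoint : all) {
            if (endpoint.transport == Transport.TLS) {
                tls.add(endpoint);
            }
        }
        return tls.isEmpty() ? List.copyOf(all) : List.copyOf(tls);
    }
}
```

Because the filter runs before balancing, the load balancer only ever chooses among endpoints that Zuul already knows how to connect to.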

carl-mastrangelo commented 4 years ago

Another change to make: We need to avoid using IClientConfig as the means of configuring origin connections. It is currently a map of concatenated strings, which can be unbounded in size. This makes it hard to trace where values are passed along, and what values may be in there. This was originally used to integrate with the Properties configuration system (in wide use at the time), but the need for this has slowly been declining. It would be better to have a well defined set of configuration, which can be audited and validated before use. We can make adapters on API boundaries to make this transition.
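
A rough sketch of a well defined, validated configuration with an adapter at the boundary is shown here. It uses a plain `Map<String, String>` as a stand-in for the legacy string map rather than the real IClientConfig API, and the class name `OriginConfig`, the two fields, the key names, and the defaults are all assumptions made for illustration.

```java
import java.time.Duration;
import java.util.Map;

// Hypothetical typed configuration: a fixed set of fields, validated on construction.
final class OriginConfig {
    final Duration connectTimeout;
    final int maxConnectionsPerHost;

    OriginConfig(Duration connectTimeout, int maxConnectionsPerHost) {
        if (connectTimeout.isNegative() || connectTimeout.isZero()) {
            throw new IllegalArgumentException("connectTimeout must be positive");
        }
        if (maxConnectionsPerHost < 1) {
            throw new IllegalArgumentException("maxConnectionsPerHost must be at least 1");
        }
        this.connectTimeout = connectTimeout;
        this.maxConnectionsPerHost = maxConnectionsPerHost;
    }

    // Adapter for the API boundary: reads two assumed keys from a legacy string map
    // and falls back to assumed defaults when a key is missing.
    static OriginConfig fromLegacyMap(Map<String, String> legacy) {
        long timeoutMs = Long.parseLong(legacy.getOrDefault("connectTimeoutMs", "1000"));
        int maxConnections = Integer.parseInt(legacy.getOrDefault("maxConnectionsPerHost", "50"));
        return new OriginConfig(Duration.ofMillis(timeoutMs), maxConnections);
    }
}
```

Everything past the adapter then works with a small, auditable object, and bad values fail at construction instead of surfacing later in the connection path.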

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 2 months ago

This issue was closed because it has been stalled for 7 days with no activity.