envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Custom TCP proxy filter to split database read write traffic #31272

Closed shiponcs closed 6 months ago

shiponcs commented 9 months ago

We want to distribute traffic among backend endpoints based on properties of the packets. Does Envoy have any existing support for this? The decision about which backend endpoint to select could come from an L7 filter such as the Postgres filter.

lambdai commented 9 months ago

This is not low-hanging fruit.

For one thing, the TCP proxy selects the cluster right after the connection is established and before any bytes are read. Your traffic attributes seem to come from the stream payload, so these attributes cannot participate in cluster selection.

Cluster and endpoint selection is also limited:

  1. ClusterSpecifierPlugin has not been introduced in the TCP proxy yet.
  2. The TCP proxy's metadata_match supports only static metadata. The metadata must be known at config update time, which is before any request is served.

wbpcode commented 8 months ago

This is supported by the HTTP proxy. You can extract attributes from the traffic and store them in dynamic metadata under the envoy.lb namespace. You can then configure the backend cluster with a subset load balancer, which selects a subset of endpoints based on the attributes under envoy.lb in the dynamic metadata.

wbpcode commented 8 months ago

See https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/subsets
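
The subset idea described above can be sketched in a few lines of Python. This is only an illustration of the matching logic, not Envoy's implementation: the endpoint addresses, the `role` key, and the `select_subset` helper are all hypothetical names chosen for the example.

```python
from typing import Dict, List

# Hypothetical endpoints; addresses and the "role" key are illustrative only.
ENDPOINTS: List[Dict] = [
    {"address": "10.0.0.1:5432", "metadata": {"role": "primary"}},
    {"address": "10.0.0.2:5432", "metadata": {"role": "replica"}},
    {"address": "10.0.0.3:5432", "metadata": {"role": "replica"}},
]


def select_subset(endpoints: List[Dict], match_criteria: Dict[str, str]) -> List[str]:
    """Return addresses of endpoints whose metadata contains every
    key/value pair in match_criteria."""
    return [
        ep["address"]
        for ep in endpoints
        if all(ep["metadata"].get(k) == v for k, v in match_criteria.items())
    ]


# Values that a filter might have stored under the "envoy.lb" namespace
# for a single request (assumed shape, for illustration).
dynamic_metadata = {"envoy.lb": {"role": "replica"}}

replicas = select_subset(ENDPOINTS, dynamic_metadata["envoy.lb"])
primaries = select_subset(ENDPOINTS, {"role": "primary"})
```

A load balancer would then pick one endpoint from the returned subset. What happens when no endpoint matches depends on the configured fallback policy, which this sketch leaves out.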

cpakulski commented 8 months ago

For the Postgres/MySQL protocols, which rely on long-lived TCP sessions, this would be very difficult. As I explained on Slack, and as @lambdai also pointed out, the upstream TCP connection is established immediately after the TCP connection from the downstream client is accepted. Those two TCP sessions stay up until the client logs out.

To do what you want, you would have to keep the TCP connection from the downstream client open, initiate an upstream TCP connection whenever a new request arrives from downstream, and close that upstream connection after receiving the response from the server. You could initiate that connection to a different endpoint based on packet characteristics. But this involves more than TCP: connecting to the upstream server requires login and possibly TLS encryption. All of that is hard to do.

The other option would be to keep several upstream TCP connections open and choose one based on packet characteristics. You would need a many-to-one relationship: many upstream connections for one downstream connection. You would have to go through the Postgres login process with each of those endpoints. That mechanism does not exist today and is not trivial to add.
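
As a rough illustration of "choosing one based on packet characteristics", the sketch below parses a PostgreSQL simple Query (`Q`) message and picks between two already-open upstreams. It is a minimal sketch under several assumptions: the upstream names `primary` and `replica` and the helper names are hypothetical, only the simple query protocol is handled (not the extended Parse/Bind/Execute flow), transaction state is ignored, and the classification is naive because a `SELECT` can still have side effects through functions.

```python
import struct
from typing import Optional


def build_query(sql: str) -> bytes:
    """Build a simple Query message: 'Q', Int32 length, null-terminated SQL."""
    payload = sql.encode("utf-8") + b"\x00"
    # The length field counts itself (4 bytes) but not the type byte.
    return b"Q" + struct.pack("!I", len(payload) + 4) + payload


def parse_simple_query(message: bytes) -> Optional[str]:
    """Return the SQL text of a 'Q' message, or None for anything else."""
    if len(message) < 5 or message[0:1] != b"Q":
        return None
    (length,) = struct.unpack("!I", message[1:5])
    body = message[5:1 + length]
    return body.rstrip(b"\x00").decode("utf-8", errors="replace")


def choose_upstream(sql: Optional[str]) -> str:
    """Send plain SELECTs to 'replica'; everything else goes to 'primary'.

    Defaulting to 'primary' is the conservative choice for any statement
    this naive classifier does not recognize.
    """
    if sql is None:
        return "primary"
    words = sql.strip().split(None, 1)
    first = words[0].upper() if words else ""
    if first == "SELECT" and "FOR UPDATE" not in sql.upper():
        return "replica"
    return "primary"


read_target = choose_upstream(parse_simple_query(build_query("SELECT 1")))
write_target = choose_upstream(parse_simple_query(build_query("UPDATE t SET x = 1")))
```

A real filter would also need to authenticate against every upstream, keep all statements of an open transaction on the same connection, and relay responses back in order. None of that is shown here.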

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

shiponcs commented 7 months ago

Thanks @lambdai, @wbpcode and @cpakulski

shiponcs commented 7 months ago

> The other option would be to keep several upstream TCP connections open and choose one based on packet characteristics. You would need a many-to-one relationship: many upstream connections for one downstream connection. You would have to go through the Postgres login process with each of those endpoints. That mechanism does not exist today and is not trivial to add.

We've been doing R&D and have successfully made a setup with one downstream connection and two upstream connections work. Now may be the time to explore how to control those connections.

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] commented 6 months ago

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.