envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Slow start configuration understanding #36961

Open anupam-meesho opened 3 weeks ago

anupam-meesho commented 3 weeks ago

Hi Team,

Some background: our whole infrastructure runs on Kubernetes, and we use Contour as the ingress gateway to route traffic across clusters.

Question: we have configured slow start for some of our services and see two different behaviours across pods. One pod honours the slow start window, while the other takes much longer to ramp up to full capacity. It would be very helpful if you could point us to the documentation or code that governs this behaviour, so that we can predict it. Below is the configuration we are using:

slowStart:
    aggression: 0.2
    minPercent: 1
    window: 150s
[Image: slowStart]

KBaichoo commented 2 weeks ago

documentation for slow start mode: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/slow_start

original PR adding slow start mode: https://github.com/envoyproxy/envoy/pull/13176
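
For a rough sense of what these settings do: the slow start documentation linked above gives the weight scaling as `NewWeight = Weight * max(MinWeightPercent, TimeFactor ^ (1 / Aggression))`, with `TimeFactor = max(TimeSinceStartInSeconds, 1) / SlowStartWindowInSeconds`. The Python sketch below only evaluates that formula with the values from the question. The function name `slow_start_factor` is hypothetical, it assumes that Contour's `minPercent: 1` maps to a 1% floor, and it is not Envoy's actual code.

```python
def slow_start_factor(elapsed_s, window_s=150.0, aggression=0.2, min_percent=1.0):
    """Fraction of an endpoint's normal weight applied during slow start.

    Defaults mirror the configuration in the question (assumption:
    minPercent is a percentage, so 1 means a floor of 1%).
    """
    if elapsed_s >= window_s:
        return 1.0  # window is over, the endpoint gets its full weight
    time_factor = max(elapsed_s, 1.0) / window_s
    return max(min_percent / 100.0, time_factor ** (1.0 / aggression))


if __name__ == "__main__":
    for t in (15, 75, 120, 140, 150):
        print(f"{t:>3}s -> {slow_start_factor(t):.1%} of full weight")
```

With `aggression: 0.2` the exponent is 5, so an endpoint halfway through the 150s window carries only about 3% of its full weight, and most of the ramp happens in the last 30 seconds or so. This factor scales the endpoint's weight relative to its peers, so the traffic share actually observed also depends on how many other endpoints there are and what their weights are.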

anupam-meesho commented 2 weeks ago

@KBaichoo I have checked the documentation, but nothing seems to explain the different ramp-up behaviours, such as the traffic patterns of the two pods above. Is there any other reference you could point me to?

KBaichoo commented 1 week ago

cc @nezdolik, who might be more familiar with this area

nezdolik commented 1 week ago

This is reported quite frequently by users who operate service mesh technologies or Envoy-based ingresses where the control plane enables locality-based routing by default. @anupam-meesho, can you confirm that your setup does not have pods spread across multiple localities or priorities? From the slow start docs:

Note in case when multiple priorities are used with slow start and lower priority has just one endpoint A, during cross-priority spillover there will be no progressive increase of traffic to endpoint A, all traffic will shift at once. Same applies to locality weighted loadbalancing, when slow start is enabled for the upstream cluster and traffic is routed cross zone to a zone with one endpoint A, there will be no progressive increase of traffic to endpoint A.
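
A simplified way to see why a lone endpoint gets no gradual ramp: slow start only scales an endpoint's weight relative to the other endpoints in the same priority or locality. The Python sketch below treats the long-run traffic share as weight divided by total weight. That is an approximation for illustration, not Envoy's actual scheduler, and `traffic_shares` is a hypothetical helper.

```python
def traffic_shares(effective_weights):
    """Normalize per-endpoint weights into traffic shares within one
    locality or priority (simplified model: share = weight / total)."""
    total = sum(effective_weights.values())
    return {name: w / total for name, w in effective_weights.items()}


# Endpoint "A" is 75s into a 150s window with aggression 0.2,
# so its slow start factor is 0.5 ** 5.
factor = 0.5 ** (1 / 0.2)

# A shares its locality with three warmed-up endpoints: A is throttled.
shared = traffic_shares({"A": factor, "B": 1.0, "C": 1.0, "D": 1.0})

# A is alone in its locality: normalization cancels the factor entirely.
alone = traffic_shares({"A": factor})

if __name__ == "__main__":
    print(f"A with peers: {shared['A']:.1%}")
    print(f"A alone:      {alone['A']:.1%}")
```

In this model, A receives roughly 1% of the locality's traffic when it has three warm peers, but 100% when it is the only endpoint, whatever its slow start factor is. Pods in different localities or priorities can therefore ramp up very differently under the same `slowStart` settings.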