feat: As a user, I want kubernetes service discovery to support more configuration items

tangzhenhuang commented 1 year ago

Description

Recently we deployed APISIX on several clouds and used its Kubernetes service discovery feature. The problem is that on each cloud, the proxy layer (LB) in front of the apiserver has a different idle timeout, while the duration of a watch in APISIX's Kubernetes service discovery is fixed. As a result, when the cluster produces no endpoints events for a long time, the connection is timed out on the server side rather than by the client, and service discovery then restarts the list-watch after a fixed 40 seconds. Could you add some configuration items, such as the watch duration and the retry interval or strategy? Thank you!
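A simplified model of the failure described above, written only to make the cost visible. It assumes that when the LB idle timeout is shorter than the watch duration, the LB cuts every watch during a quiet period, the client treats that as a failure and waits the fixed 40 seconds; otherwise the client ends the watch itself and restarts immediately. The function name and the model are illustrative assumptions, not APISIX code.

```python
def quiet_period_cost(quiet_seconds, watch_seconds, lb_idle_timeout, retry_interval=40):
    """Return (restarts, seconds_without_watch) for a period with no endpoints events.

    Simplified model (assumption): if the LB idle timeout is shorter than the
    watch, the LB cuts every watch and the client waits retry_interval seconds
    before the next list-watch; otherwise the watch ends normally and the
    client restarts it with no pause.
    """
    cut_by_lb = lb_idle_timeout < watch_seconds
    watch_len = lb_idle_timeout if cut_by_lb else watch_seconds
    pause = retry_interval if cut_by_lb else 0

    restarts = 0
    seconds_without_watch = 0
    t = watch_len  # moment the first watch ends
    while t < quiet_seconds:
        restarts += 1
        gap = min(pause, quiet_seconds - t)  # time spent with no active watch
        seconds_without_watch += gap
        t += gap + watch_len
    return restarts, seconds_without_watch
```

For a one-hour quiet period with a 1800 s watch, `quiet_period_cost(3600, 1800, 900)` returns `(3, 120)` behind a 900 s LB idle timeout, versus `(1, 0)` when the LB idle timeout is 3600 s.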

tokers commented 1 year ago

The current watch timeout is hard-coded and computed by a built-in algorithm. I think we can add a new field that lets users configure the watch timeout.

zhixiongdu027 commented 1 year ago

I think the goal is to avoid the "re list-watch", and adjusting the "40 seconds" does not achieve that.

tokers commented 1 year ago

> I think the goal is to avoid the "re list-watch", and adjusting the "40 seconds" does not achieve that.

Any suggestions?

zhixiongdu027 commented 1 year ago

To solve the problem, maybe we can generate events by periodically changing a mock Endpoints object in a specific namespace, which keeps the TCP connection active. @crazyMonkey1995 @tokers
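One way to read this suggestion, sketched under assumptions: a small out-of-band job bumps an annotation on a dummy Endpoints object in a dedicated namespace (assumed to be covered by the discovery's watch) more often than the LB idle timeout, so the watch stream always carries an event before the LB considers it idle. The annotation key, the safety factor, and both helper names are hypothetical; the returned body would be sent as a merge patch by whatever Kubernetes client you use.

```python
import time

# Hypothetical annotation key; any key on the dummy Endpoints object works.
HEARTBEAT_ANNOTATION = "example.com/apisix-heartbeat"


def heartbeat_interval(lb_idle_timeout, safety_factor=0.5):
    """Seconds between heartbeat updates; stays strictly below the LB idle timeout."""
    if not 0 < safety_factor < 1:
        raise ValueError("safety_factor must be in (0, 1)")
    return lb_idle_timeout * safety_factor


def heartbeat_patch(now=None):
    """JSON merge-patch body that changes the annotation, producing a MODIFIED event.

    The value is a Unix timestamp, so successive patches at least one second
    apart always differ and therefore always generate an event.
    """
    ts = int(time.time() if now is None else now)
    return {"metadata": {"annotations": {HEARTBEAT_ANNOTATION: str(ts)}}}
```

The trade-off is an extra moving part in every cluster, which is why the thread goes on to discuss making the timeouts configurable instead.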

tangzhenhuang commented 1 year ago

> To solve the problem, maybe we can generate events by periodically changing a mock Endpoints object in a specific namespace, which keeps the TCP connection active. @crazyMonkey1995 @tokers

How about making the timeout a configurable parameter? Users themselves know the timeout of the target apiserver (or of its proxy).

zhixiongdu027 commented 1 year ago

A watchSeconds value that is too short will cause many "re list-watch" cycles; a value that is too long will let the proxy terminate the connection early.

Do we have to use a proxy in front of the apiserver?

tangzhenhuang commented 1 year ago

> A watchSeconds value that is too short will cause many "re list-watch" cycles; a value that is too long will let the proxy terminate the connection early.
>
> Do we have to use a proxy in front of the apiserver?

In real-world deployments, such as on Alibaba Cloud, AWS, and Azure, there is a proxy in front of the apiserver.

tzssangglass commented 1 year ago

> A watchSeconds value that is too short will cause many "re list-watch" cycles; a value that is too long will let the proxy terminate the connection early.
>
> Do we have to use a proxy in front of the apiserver?

In fact, if you use resty.http or ngx.tcp.socket, a default timeout applies even if you don't set one; as I remember, it is 60 s.

zhixiongdu027 commented 1 year ago

> In fact, if you use resty.http or ngx.tcp.socket, a default timeout applies even if you don't set one; as I remember, it is 60 s.

That is not where the problem lies; the code already calls httpc:set_timeouts: https://github.com/apache/apisix/blob/288708cbe0098fd3f62fadd725490804d5d0a3db/apisix/discovery/kubernetes/informer_factory.lua#L199-L206

The problem is the following network topology:

discovery --(1)--> proxy --(2)--> apiserver

The timeout policy at position (1) does not match the timeout policy at position (2).

@tzssangglass

zhixiongdu027 commented 1 year ago

@crazyMonkey1995 @tokers @tzssangglass

I would like to submit a PR to "support configuring watchSeconds and retryInterval" later.

tzssangglass commented 1 year ago

> The problem is the following network topology: discovery --(1)--> proxy --(2)--> apiserver

We can make the values 2000, 3000, and http_seconds * 1000 in the call httpc:set_timeouts(2000, 3000, http_seconds * 1000) configurable by the user.

> How about making the timeout a configurable parameter? Users themselves know the timeout of the target apiserver (or of its proxy).

As described here, the user then needs to configure a timeout smaller than the proxy's idle timeout.
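A sketch of what user-configurable timeouts might look like, written in Python for illustration rather than as APISIX's Lua. It mirrors the three values passed to httpc:set_timeouts (connect, send, read, in milliseconds) and enforces the rule stated above: the read timeout used for a watch should be below the proxy's idle timeout, so the client, not the proxy, closes an idle watch. The class and field names and the 60 s read default are hypothetical; only 2000 and 3000 come from the quoted call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchTimeouts:
    connect_ms: int = 2000  # first argument in the quoted set_timeouts call
    send_ms: int = 3000     # second argument in the quoted call
    read_ms: int = 60_000   # hypothetical default; the original derives it from http_seconds

    def validate(self, proxy_idle_timeout_s):
        """Return self if the values are usable behind a proxy with the given idle timeout."""
        for name in ("connect_ms", "send_ms", "read_ms"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        # An idle watch should hit the client's read timeout before the proxy's idle timeout.
        if self.read_ms >= proxy_idle_timeout_s * 1000:
            raise ValueError(
                f"read_ms={self.read_ms} must be below the proxy idle timeout "
                f"({proxy_idle_timeout_s}s) so the client closes the watch first"
            )
        return self
```

For example, `WatchTimeouts(read_ms=300_000).validate(350)` passes behind a 350 s LB, while `read_ms=400_000` is rejected.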

zhixiongdu027 commented 1 year ago

> I would like to submit a PR to "support configuring watchSeconds and retryInterval" later.

I'm inclined to use a config in the following format. Any other suggestions?

kubernetes:
    service: ...
    client: ...
    retry_interval: 30
    min_watch: 1800
    max_watch: 2000

@crazyMonkey1995 @tokers @tzssangglass @spacewander
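A sketch of how the proposed retry_interval / min_watch / max_watch keys might be interpreted, assuming `cfg` is the `kubernetes` mapping above, each watch picks a random duration in [min_watch, max_watch] (so many instances do not reconnect in lockstep), and retry_interval is the wait after a failed list-watch. The randomization and the function names are assumptions about the intended semantics, not part of the proposal.

```python
import random


def load_watch_config(cfg):
    """Read the proposed keys, defaulting to the values in the proposal."""
    retry_interval = cfg.get("retry_interval", 30)
    min_watch = cfg.get("min_watch", 1800)
    max_watch = cfg.get("max_watch", 2000)
    if not 0 < min_watch <= max_watch:
        raise ValueError("require 0 < min_watch <= max_watch")
    if retry_interval < 0:
        raise ValueError("retry_interval must be >= 0")
    return retry_interval, min_watch, max_watch


def next_watch_seconds(min_watch, max_watch, rng=random):
    """Pick a watch duration uniformly from [min_watch, max_watch], inclusive."""
    return rng.randint(min_watch, max_watch)
```

Behind a proxy, max_watch would also need to stay below the proxy's idle timeout for the watch to end on the client side.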

tzssangglass commented 1 year ago

> kubernetes:
>     service: ...
>     client: ...
>     retry_interval: 30
>     min_watch: 1800
>     max_watch: 2000

What about:

kubernetes:
    service: ...
    client: ...
    retry_interval: 30
    watch:
      connect:
      send:
      read:

ro4i7 commented 1 year ago

Hello @spacewander @tokers @tzssangglass @crazyMonkey1995

If this issue is still open, please assign it to me. I'd appreciate feedback on the following solution:

To solve this issue, we can add configuration items to Kubernetes service discovery, such as the watch duration, the retry interval, and the retry strategy, as shown below:

service:
  client:
    retry_interval: 30
  watch:
    duration: 60
    retry_strategy: exponential_backoff

In this configuration, the watch duration is set to 60 seconds and the retry strategy to exponential backoff. The retry interval is 30 seconds, which means the client retries the connection 30 seconds after an initial connection attempt fails.
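The exponential_backoff strategy named in this proposal could be interpreted as follows, assuming retry_interval is the base delay that doubles after each consecutive failure. The cap (`max_interval`) is a hypothetical addition that the proposal does not define; without some cap, repeated failures would push retries arbitrarily far apart.

```python
def backoff_delays(retry_interval, failures, max_interval=300):
    """Delays before each retry: retry_interval, 2x, 4x, ... capped at max_interval.

    max_interval is a hypothetical parameter, not part of the proposed config.
    """
    return [min(retry_interval * 2 ** i, max_interval) for i in range(failures)]
```

With `retry_interval: 30`, five consecutive failures give delays of 30, 60, 120, 240, and 300 seconds under this sketch.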

apache / apisix

feat: As a user, I want kubernetes service discovery to support more configuration items #8311
