r7vme commented 6 years ago

There was an issue where the guest cluster API was not available and the controller (k8s client) waited several minutes for a response. To avoid this, we need to set an aggressive HTTP timeout, e.g. 30 sec.

https://github.com/kubernetes/client-go/blob/master/rest/config.go#L114

r7vme commented 6 years ago

tl;dr: the problem is not in the Kubernetes client, but in the load balancer settings.

I did some research on geckon, and it looks like the root cause of this issue is not the k8s client timeout, but the fact that we use load balancers and the TCP connection just stays in the ESTABLISHED state for a while (until the load balancer drops it). This reduces the benefit of the TCP keepalive used by k8s to zero.

The change that was made for this issue is beneficial only in the case of a slow/frozen API server. It works at the HTTP level (not TCP) and does not fix the initial issue. But I'll leave this change in, as it is a reasonable timeout for our use case.

Back to the initial issue. The initial timeout was ~11 minutes on geckon. geckon uses haproxy as the load balancer, and it had 10 minutes set for the client and server timeouts. This was the reason for the ~11 minutes.

Fix for Haproxy case

Set quite small timeouts (30 sec) for regular TCP connections and use 1 hour for tunneled connections (e.g. kubectl exec, kubectl logs).

    timeout client          30s
    timeout client-fin      30s
    timeout tunnel          1h
    timeout server          30s

r7vme commented 6 years ago

Reopening until released.

giantswarm / kvm-operator-node-controller

Set explicit k8s client timeout for requests #5
