Closed robinmonjo closed 7 years ago
Linking this issue: https://github.com/deis/router/issues/180, but it's pretty hard to reproduce as it happens randomly.
I've got more information about this subject. I scaled the deis router to 2 pods. From within each of the 2 deis router pods, I started a simple loop that curls my service and outputs the status code every ten seconds.
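The loop was essentially this (a minimal sketch; the IP is my service's ClusterIP, the same one that appears in the trace below):

# Probe loop run from inside each router pod; prints one status code
# every ten seconds (the expected answer from my app is a 302)
while true; do
  curl -s -o /dev/null -w '%{http_code}\n' http://100.65.135.200/
  sleep 10
done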
For about 1 hour everything worked fine; only the expected 302 status code was output. Then I launched a deployment, started to see some failures, and managed to catch one:
$> curl 100.65.135.200 -vv
* Rebuilt URL to: 100.65.135.200/
* Trying 100.65.135.200...
* connect to 100.65.135.200 port 80 failed: No route to host
* Failed to connect to 100.65.135.200 port 80: No route to host
* Closing connection 0
curl: (7) Failed to connect to 100.65.135.200 port 80: No route to host
So, good news: it doesn't seem to come from the deis router. It looks like my kubernetes service itself fails sometimes, and more frequently when the endpoints have changed recently. I have liveness and readiness probes set on my app's web processes.
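If anyone wants to watch the same thing, the endpoint churn is visible while a rollout happens ("myapp" below is a placeholder for the real service name):

# Watch the service's endpoint list change as the deployment rolls;
# failures line up with moments where addresses are swapped out
kubectl get endpoints myapp -w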
Does that sound like a reasonable conclusion to you? My cluster is a k8s 1.5.2 cluster set up with kops, running on AWS and using the weave network plugin.
I don't know much about weave, but your conclusion seems reasonable. This sounds like a problem upstream from Nginx, so the overlay network, kube-proxy, or some bug in deployments are all possibilities.
Ok, thank you. What do you recommend using? I have had good success with flannel, not that good with weave (hence the issue :) ). I don't really want to use the "not software based" networking, as I don't really want to have to worry about routing in my VPC route table...
I don't want to be too quick to pin the problem on weave, but since you asked... most of my clusters have used kube-aws from CoreOS. That uses flannel by default and I've never personally witnessed this problem.
Also, more recently, I have created clusters using kops and that uses kubenet by default.
I used to use kube-aws as well, but have had a terrible experience with it lately, since they introduced node pools. I tried kops and really loved it. I'll try another overlay network and see if my problem happens again.
Robin,
Where did you end up with this? We noticed 502s within our cluster as well, running weave.
We ended up using flannel, and that solved the problem. At the time, these 502 issues were caused by docker crashing a lot and the weave docker container being unable to restart properly.
I'm surprised to hear someone still has this issue. Is your k8s cluster up to date? This is 100% related to kubernetes services: the service reports that all endpoints are available when that is not actually the case.
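A quick way to test that claim, sketched with placeholder names (service "myapp", container port 5000; substitute your own), is to curl each advertised endpoint directly and see which ones actually answer:

# List the IPs the service claims are ready, then probe each one
for ip in $(kubectl get endpoints myapp -o jsonpath='{.subsets[*].addresses[*].ip}'); do
  printf '%s -> ' "$ip"
  curl -s -o /dev/null -m 2 -w '%{http_code}\n' "http://$ip:5000/" || echo unreachable
done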
We are running kube 1.7.4 and weave 2.1.3. We notice times where things start dropping, but it is not isolated to a single node.
What would you recommend as next steps? Would one weave pod crashing really cause 502s originating from different deis pods on different nodes?
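A couple of first checks that might narrow it down (the pod name below is a placeholder; the status subcommand is the one documented for the weave-net DaemonSet):

# See whether any weave-net pods have restarted, and on which nodes
kubectl get pods -n kube-system -l name=weave-net -o wide

# Ask one weave pod for its view of the mesh and peer connections
kubectl exec -n kube-system weave-net-abc12 -c weave -- /home/weave/weave --local status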
Hello all,
Using deis v2.11.0 I experience random 502 errors on the deis router. I deployed an app and scaled it up to 5 web processes.
So I've got this service and its 5 endpoints (kubectl output not captured in this transcript).
Everything looks good: nginx on the deis router is properly configured to send requests to my kubernetes service, and I can also curl each endpoint directly from within the router pod.
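For illustration, the per-endpoint check looked roughly like this (the pod name, endpoint IP, and port are placeholders, not my real values):

# Hit one backend directly from inside the router pod; the IP:port
# stands in for a real entry from `kubectl get endpoints`
kubectl exec -n deis deis-router-xxxxx -- curl -sI http://10.34.0.5:5000/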
However, I get random but regular 502 Bad Gateway errors when accessing my app (it works most of the time, but about 20% of my requests get a 502). The router logs captured some of these failures (log output not captured in this transcript).
I have no idea how to debug this further ...
Regards, Robin