Open jbuns opened 3 years ago
Tested also on AKS and seeing the same problem:
$ k logs loggregator-bridge-59f5cb64bc-9scbb -n kubecf
{"level":"info","ts":1615563276.4147344,"caller":"kubeconfig/getter.go:53","msg":"Using in-cluster kube config"}
{"level":"info","ts":1615563276.414798,"caller":"kubeconfig/checker.go:36","msg":"Checking kube config"}
Received non-pod object in watcher channel
Error: unexpected EOF
@mudler Any ideas what this might be / where to look next?
I've turned on DEBUG logging for loggregator-bridge and this is the error I'm seeing:
Starting Loggregator
{"level":"info","ts":1615410279.0113866,"caller":"kubeconfig/getter.go:53","msg":"Using in-cluster kube config"}
{"level":"info","ts":1615410279.0114636,"caller":"kubeconfig/checker.go:36","msg":"Checking kube config"}
Received event: {ERROR &Status{ListMeta:ListMeta{SelfLink:,ResourceVersion:,Continue:,RemainingItemCount:nil,},Status:Failure,Message:too old resource version: 43522014 (43524698),Reason:Expired,Details:nil,Code:410,}}
Received non-pod object in watcher channel
In the code, I can see that the failure is happening here: https://github.com/cloudfoundry-incubator/eirini-loggregator-bridge/blob/master/podwatcher/podwatcher.go#L293-L306
@mudler / @jandubois any suggestions on how we can try to fix this?
@jbuns Sorry, I know nothing about the eirini-loggregator-bridge, and have no time to learn about it.
Let's see if @mudler can give you hints next week; this week has been Hackweek at SUSE, so everyone has been working on other stuff... (FWIW, I spent half a day of my hackweek time yesterday on getting Eirini-1.8 to continue to work with the latest cf-deployment, so we don't have to drop it (yet) for the kubecf-2.8 releases).
It looks like we are receiving old events in the channel. This reminds me of the work done in EiriniX https://github.com/cloudfoundry-incubator/eirinix/pull/38 - is the loggregator-bridge using the latest EiriniX, including that fix? Otherwise, the alternative is to manually specify a ResourceVersion to start the watch on.
From the error message, it looks like the watcher is starting to listen for events that are old and no longer available, while the above PR was meant to fetch the latest ResourceVersion during start to fix exactly that issue.
@mudler loggregator-bridge is using eirinix v0.3.1 https://github.com/cloudfoundry-incubator/eirini-loggregator-bridge/blob/master/go.mod#L4
so I'm assuming that it has the fix you mentioned, since https://github.com/cloudfoundry-incubator/eirinix/pull/38 has been merged since v0.2.0: https://github.com/cloudfoundry-incubator/eirinix/compare/v0.2.0...master
Does that mean that the manager in eirinix is the one that's failing? The only difference I can see between the PR above and what's in the code now is this line: https://github.com/cloudfoundry-incubator/eirinix/blob/master/manager.go#L298
The status Message:too old resource version seems to be expected behaviour according to kubernetes:
https://github.com/kubernetes/kubernetes/issues/22024
It looks like podwatcher needs to be updated in order to handle this, rather than erroring out.
@mudler any preference on how I should fix this or should I just come up with the fix and it can be reviewed in a PR?
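As a rough sketch of what "handle this, rather than erroring out" could look like, the snippet below classifies incoming watch events. It is not the actual podwatcher code and does not use client-go: `Event`, `Pod`, `Status`, `action`, and `classify` are simplified, hypothetical types and names that only mirror the shape of the event shown in the DEBUG output (type ERROR, code 410, reason Expired). The assumption is that an expired status should trigger a restart of the watch, while any other non-pod object is still treated as a failure.

```go
package main

import "fmt"

// Event, Pod and Status are hypothetical, simplified stand-ins for the
// objects delivered on a Kubernetes watch channel.
type Event struct {
	Type   string // e.g. "ADDED", "MODIFIED", "ERROR"
	Object interface{}
}

type Pod struct{ Name string }

type Status struct {
	Code    int
	Reason  string
	Message string
}

// action describes what the watcher loop should do with an event.
type action int

const (
	handlePod    action = iota // normal pod event: process it
	restartWatch               // watch expired: start a fresh watch
	fail                       // anything else: report an error
)

// classify decides how to react to a single watch event.
func classify(ev Event) action {
	switch obj := ev.Object.(type) {
	case *Pod:
		return handlePod
	case *Status:
		// 410 Gone / "too old resource version" is recoverable by
		// restarting the watch from a current resource version.
		if ev.Type == "ERROR" && obj.Code == 410 {
			return restartWatch
		}
		return fail
	default:
		return fail
	}
}

func main() {
	expired := Event{Type: "ERROR", Object: &Status{
		Code:    410,
		Reason:  "Expired",
		Message: "too old resource version: 43522014 (43524698)",
	}}
	fmt.Println("restart on expired:", classify(expired) == restartWatch)
	fmt.Println("pod handled:", classify(Event{Type: "ADDED", Object: &Pod{Name: "app"}}) == handlePod)
}
```

In a real watcher loop, the `restartWatch` case would re-list to obtain a current ResourceVersion and open a new watch instead of returning an error.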
Describe the bug
We’re currently facing issues with loggregator-bridge. When doing cf logs, logs of type APP and STG fail to show up.
To Reproduce
We've seen it failing in two different scenarios.
scenario 1: we’ve got a long-running deployment of kubecf with eirini and noticed that after a while, APP and STG logs stop appearing during cf logs. I’ve traced it down to loggregator-bridge. The pod logs look like:
scenario 2: after a fresh installation of kubecf+eirini on OpenShift 4.6 (k8s version 1.19), the cf logs fail to appear and the problem is exactly the same as above.
Expected behavior
When doing cf logs I should be able to also see APP and STG logs.
Environment
KubeCF version: 2.7.12
Eirini version: 1.8
Kubernetes: 1.19
Additional context
This was tested on OpenShift 4.4 and 4.6