albertrdixon / romulus

A kubernetes ingress controller
MIT License

romulus seems to stop working after a while #25

Closed: errm closed this issue 8 years ago

errm commented 8 years ago

After running for some time (anywhere from 15 minutes to a few hours), romulus seems to stop updating when pods are added or removed.

This seems to be related to error messages like these, which I am seeing in the logs:

ERROR: logging before flag.Parse: W0316 13:28:07.305987 1 reflector.go:288] /go/src/github.com/timelinelabs/romulus/kubernetes/kubernetes.go:124: watch of *extensions.Ingress ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [519649/519051]) [520648] 
ERROR: logging before flag.Parse: W0316 13:12:25.288651 1 reflector.go:288] /usr/local/go/src/runtime/asm_amd64.s:1696: watch of *api.Service ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [519201/519038]) [520200] 
ERROR: logging before flag.Parse: W0316 12:46:54.303191 1 reflector.go:288] /go/src/github.com/timelinelabs/romulus/kubernetes/kubernetes.go:124: watch of *api.Service ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [518522/518108]) [519521] 
ERROR: logging before flag.Parse: W0316 12:43:59.204648 1 reflector.go:288] /usr/local/go/src/runtime/asm_amd64.s:1696: watch of *extensions.Ingress ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [518446/518018]) [519445] 
ERROR: logging before flag.Parse: W0316 12:29:10.885142 1 reflector.go:288] /go/src/github.com/timelinelabs/romulus/kubernetes/kubernetes.go:124: watch of *extensions.Ingress ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [518051/517641]) [519050] 
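These warnings share a pattern. As far as I know, the etcd v2 store behind these Kubernetes versions keeps only a fixed window of recent events (assumed here to be 1000), and a watch that asks for an index older than the start of that window fails with error 401 (`EcodeEventIndexCleared`). If `[519649/519051]) [520648]` is read as `[window start/requested index]) [current index]`, every line above shows a window of exactly 1000 events (520648 - 519649 + 1). The sketch below models that check under those assumptions. `eventHistory` and `watchFrom` are hypothetical names, not etcd's API.

```go
package main

import "fmt"

// eventHistory is a hypothetical model of etcd v2's bounded event history:
// only the most recent `capacity` events are retained.
type eventHistory struct {
	capacity uint64 // assumed to be 1000, matching the window seen in the logs
	current  uint64 // index of the newest event in the store
}

// startIndex returns the oldest index still retained in the window.
func (h eventHistory) startIndex() uint64 {
	if h.current < h.capacity {
		return 1
	}
	return h.current - h.capacity + 1
}

// watchFrom mimics the check a watch request goes through: asking for an
// index older than the window start fails with a 401-style "cleared" error.
func (h eventHistory) watchFrom(index uint64) error {
	if start := h.startIndex(); index < start {
		return fmt.Errorf("401: The event in requested index is outdated and cleared (the requested history has been cleared [%d/%d]) [%d]",
			start, index, h.current)
	}
	return nil
}

func main() {
	// Numbers taken from the first log line above.
	h := eventHistory{capacity: 1000, current: 520648}
	fmt.Println(h.startIndex())      // 519649, the logged window start
	fmt.Println(h.watchFrom(519051)) // the watcher's stale index is rejected
	fmt.Println(h.watchFrom(520000)) // <nil>: a recent index is still inside the window
}
```

Under this reading, the warnings mean only that the watcher fell more than 1000 events behind. They do not, by themselves, say that anything is broken.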

I am not sure where to start digging to resolve this. @albertrdixon, if you could suggest anything that might help track down what is going on here, it would be a great help.

Also, I can't seem to create an issue on https://github.com/albertrdixon/romulus, perhaps because it is a fork.

errm commented 8 years ago

Actually, looking at Kubernetes, these errors are probably not the problem, since the watcher should be recreated. Nonetheless, things are still not updating and I am not quite sure why.

albertrdixon commented 8 years ago

My guess, without having looked at anything yet, is that there is some issue with resource versions here (assuming the errors are related). Basically, the event callbacks may not be running because the watcher sees all the events as old.
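Here is a concrete version of that hypothesis. Suppose a handler acts only on events newer than the last resource version it recorded, and that recorded version somehow ends up ahead of what the watch delivers. Every later event is then dropped silently. This is purely illustrative: `versionGate` is a hypothetical filter, not romulus code. It also treats resource versions as integers, which holds for an etcd v2 backend but which Kubernetes clients are not supposed to rely on.

```go
package main

import "fmt"

// versionGate is a hypothetical event filter: it forwards an event only if
// its resource version is newer than the highest version seen so far.
type versionGate struct {
	lastSeen uint64
	handled  []uint64
}

// onEvent runs the "callback" (recording the version) only if the event
// looks new, and reports whether it ran.
func (g *versionGate) onEvent(resourceVersion uint64) bool {
	if resourceVersion <= g.lastSeen {
		return false // treated as old: the callback never runs
	}
	g.lastSeen = resourceVersion
	g.handled = append(g.handled, resourceVersion)
	return true
}

func main() {
	// Healthy case: versions arrive in increasing order and are all handled.
	healthy := &versionGate{}
	for _, v := range []uint64{100, 101, 102} {
		healthy.onEvent(v)
	}
	fmt.Println(healthy.handled) // [100 101 102]

	// Failure case: lastSeen has (hypothetically) been set ahead of the
	// versions the watch delivers, so every event compares as old.
	stuck := &versionGate{lastSeen: 520648}
	for _, v := range []uint64{519700, 519800} {
		stuck.onEvent(v)
	}
	fmt.Println(len(stuck.handled)) // 0
}
```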

errm commented 8 years ago

OK, that seems somewhat reasonable, since things work fine when starting romulus from cold.

I am not sure, though, because Reflector.RunUntil should restart from scratch anyway when it hits this: https://godoc.org/k8s.io/kubernetes/pkg/client/cache#Reflector.RunUntil
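The recovery described here can be sketched as a list-then-watch loop. First list everything to get a snapshot and its resource version, then watch from that version. When the watch fails because the version has expired, list again instead of retrying the stale version. This is a self-contained simulation under those assumptions, not the actual `client/cache` code. `store`, `listAndWatch`, and `errExpired` are hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// errExpired stands in for the 401 "event index cleared" watch error.
var errExpired = errors.New("requested resource version is outdated and cleared")

// store is a hypothetical backend that keeps only a short event window.
type store struct {
	current uint64 // newest resource version
	window  uint64 // how many recent versions can still be watched from
}

// list returns the resource version of a fresh snapshot (contents omitted).
func (s *store) list() uint64 { return s.current }

// watch succeeds only if `from` is still inside the retained window.
func (s *store) watch(from uint64) error {
	if s.current >= s.window && from < s.current-s.window+1 {
		return errExpired
	}
	return nil
}

// listAndWatch relists whenever a watch fails with errExpired, the behaviour
// expected from a reflector-style loop. It returns how many relists were
// needed before a watch was established.
func listAndWatch(s *store, lastVersion uint64, maxAttempts int) (int, error) {
	lists := 0
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err := s.watch(lastVersion)
		if err == nil {
			return lists, nil
		}
		if !errors.Is(err, errExpired) {
			return lists, err
		}
		// The stored version is too old: take a fresh snapshot and retry.
		lastVersion = s.list()
		lists++
	}
	return lists, fmt.Errorf("no watch established after %d attempts", maxAttempts)
}

func main() {
	s := &store{current: 520648, window: 1000}
	lists, err := listAndWatch(s, 519051, 3)
	fmt.Println(lists, err) // 1 <nil>
}
```

If the real loop behaves like this, an expired watch costs one extra list and nothing more. That fits the suspicion that the 401 warnings alone would not stop updates.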

errm commented 8 years ago

I have written up my investigation in a gist, so hopefully it will be somewhat reproducible:

https://gist.github.com/errm/7fe0ad3423ca95f5dad3

errm commented 8 years ago

Right, so I had left the selector labels off my Service. My bad.
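Some context on why the missing selector matters: a Service finds its pods through its label selector. As far as I know, the Kubernetes endpoints controller creates no endpoints for a Service without a selector, so anything that builds backends from that Service's endpoints has nothing to route to. Below is a minimal sketch of the matching rule. `matches` and `podsForService` are hypothetical helpers that use plain maps instead of the real API types.

```go
package main

import (
	"fmt"
	"sort"
)

// matches reports whether every key/value pair in selector is present in labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		got, ok := labels[k]
		if !ok || got != v {
			return false
		}
	}
	return true
}

// podsForService returns the names of pods selected by a Service selector.
// A missing or empty selector selects nothing here, mirroring (by assumption)
// how the endpoints controller skips Services that have no selector.
func podsForService(selector map[string]string, pods map[string]map[string]string) []string {
	if len(selector) == 0 {
		return nil
	}
	var names []string
	for name, labels := range pods {
		if matches(selector, labels) {
			names = append(names, name)
		}
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	return names
}

func main() {
	pods := map[string]map[string]string{
		"web-1": {"app": "web"},
		"web-2": {"app": "web"},
		"db-1":  {"app": "db"},
	}
	fmt.Println(podsForService(map[string]string{"app": "web"}, pods)) // [web-1 web-2]
	fmt.Println(podsForService(nil, pods))                              // []
}
```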

Fixing that fixed all the issues; I think the 401 errors were unrelated.

Having said that, it's surprising that it works at all without selectors. It might be better if it didn't, and logged something drastic-sounding instead.
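The idea of failing loudly could look something like the check below, run whenever an ingress backend resolves to a Service. This is a hypothetical sketch, not romulus code. `service` and `checkBackend` are made-up names, and the message wording is only an example.

```go
package main

import (
	"fmt"
	"log"
)

// service is a hypothetical, trimmed-down view of a Kubernetes Service.
type service struct {
	Namespace string
	Name      string
	Selector  map[string]string
}

// checkBackend returns an error for a Service whose endpoints will not be
// managed automatically, so the caller can log it prominently instead of
// silently routing nowhere.
func checkBackend(svc service) error {
	if len(svc.Selector) == 0 {
		return fmt.Errorf("service %s/%s has no selector: its endpoints will not be managed automatically, so this backend may never receive traffic",
			svc.Namespace, svc.Name)
	}
	return nil
}

func main() {
	svc := service{Namespace: "default", Name: "web"} // selector forgotten
	if err := checkBackend(svc); err != nil {
		log.Printf("ERROR: %v", err)
	}
}
```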

Anyhow, a stupid RTFM moment.