Closed DorianGray closed 8 years ago
FYI, this is still happening with 1.3, with the same symptoms. Restarting the process fixes it immediately. It seems to be triggered either by a rolling restart of the backend servers (removing and re-adding them one by one) or by deregistering backends.
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
time="2015-08-23T06:59:29Z" level=debug msg="Unsupported event type" event=
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
time="2015-08-23T06:59:29Z" level=debug msg="Unsupported event type" event=
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
time="2015-08-23T06:59:29Z" level=debug msg="Unsupported event type" event=
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
time="2015-08-23T06:59:29Z" level=debug msg="Unsupported event type" event=
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
time="2015-08-23T06:59:29Z" level=debug msg="Unsupported event type" event=
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
time="2015-08-23T06:59:29Z" level=debug msg="Unsupported event type" event=
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
time="2015-08-23T06:59:29Z" level=debug msg="Unsupported event type" event=
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
time="2015-08-23T06:59:29Z" level=debug msg="Unsupported event type" event=
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
This happened while running kubectl rolling-update on a replication controller whose associated service did -not- match the romulus selector.
So... https://github.com/DorianGray/romulus/commit/eabda2444597b7d4a0c72f339e2d57148fafc483 This fixes the issue, but not in the right way: it just kills the process when an unknown message comes in. I'm new to Go, but I found that the event channel somehow gets closed, which causes the infinite loop of empty objects. I'm not sure how to fix it properly; I'd imagine it will take a bit of refactoring of how the event channel is managed.
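For anyone else hitting this, here is a minimal sketch of why a closed channel produces an endless stream of empty events, and how the two-value receive detects it. This is not romulus code: the `Event` type and the `consume` function are hypothetical stand-ins for illustration only.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for the watch event type.
type Event struct {
	Type string
}

// consume handles events until the channel is closed and returns
// the number of events it handled.
func consume(events <-chan Event) int {
	handled := 0
	for {
		// A receive on a closed, drained channel never blocks: it
		// returns the zero value (Event{}) with ok == false. Without
		// the ok check this loop would spin forever on empty events.
		ev, ok := <-events
		if !ok {
			return handled
		}
		fmt.Printf("event type=%q\n", ev.Type)
		handled++
	}
}

func main() {
	events := make(chan Event, 2)
	events <- Event{Type: "ADDED"}
	events <- Event{Type: "MODIFIED"}
	close(events)

	fmt.Println("handled:", consume(events))
}
```

A single-value receive (`ev := <-events`) in the same loop would keep returning `Event{}` immediately, which would match the CPU-pinning symptom and the empty `event=` fields in the log.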
Hmmm, ok that's odd. Will need to figure out why the channel is closing or we're getting empty events.
Alright, so it looks like the watch is getting closed due to some error. Need to figure out a graceful way of keeping the channel alive.
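One possible shape for that, sketched under assumptions: treat a closed channel as "the watch ended" and open a new one after a short pause, giving up after a bounded number of restarts. The `watchFunc` type, the `run` function, and the restart limit are all hypothetical and are not taken from romulus or the Kubernetes client.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Event is a hypothetical stand-in for the watch event type.
type Event struct {
	Type string
}

// watchFunc is a hypothetical function that opens a new watch and
// returns its event channel.
type watchFunc func() (<-chan Event, error)

// run consumes events and re-opens the watch whenever its channel is
// closed. It makes at most maxRestarts+1 attempts to open a watch and
// then returns an error.
func run(open watchFunc, handle func(Event), maxRestarts int, backoff time.Duration) error {
	for attempt := 0; attempt <= maxRestarts; attempt++ {
		events, err := open()
		if err != nil {
			time.Sleep(backoff)
			continue
		}
		// range exits when the channel is closed, so a dead watch
		// ends this inner loop instead of yielding empty events.
		for ev := range events {
			handle(ev)
		}
		time.Sleep(backoff)
	}
	return errors.New("watch closed too many times")
}

func main() {
	// Fake watch: delivers one event, then closes.
	open := func() (<-chan Event, error) {
		ch := make(chan Event, 1)
		ch <- Event{Type: "ADDED"}
		close(ch)
		return ch, nil
	}
	handle := func(ev Event) { fmt.Println("handling", ev.Type) }

	err := run(open, handle, 2, 10*time.Millisecond)
	fmt.Println("stopped:", err)
}
```

The pause between attempts is there so that a watch which fails immediately does not turn back into a busy loop; a real implementation would likely want to log the underlying error and resume from the last seen resource version.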
This should be working after the recent rewrite.
Every once in a while, I see romulusd pinning a core at 100%. With debug logging enabled, it looks like it's stuck in a loop receiving empty messages from Kubernetes.