High CPU usage on random hosts.

DorianGray commented 8 years ago

Every once in a while, I see romulusd pinning a core at 100%. With debug logging enabled, it looks like it's stuck in a loop receiving empty messages from Kubernetes.

DorianGray commented 8 years ago

FYI, this is still happening with 1.3, same symptoms. Restarting the process solves it immediately. It seems to be triggered either by a rolling restart of the backend servers (removing and re-adding them one by one) or by deregistering backends.

DorianGray commented 8 years ago

time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
time="2015-08-23T06:59:29Z" level=debug msg="Unsupported event type" event=
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
time="2015-08-23T06:59:29Z" level=debug msg="Unsupported event type" event=
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
time="2015-08-23T06:59:29Z" level=debug msg="Unsupported event type" event=
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
time="2015-08-23T06:59:29Z" level=debug msg="Unsupported event type" event=
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
time="2015-08-23T06:59:29Z" level=debug msg="Unsupported event type" event=
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
time="2015-08-23T06:59:29Z" level=debug msg="Unsupported event type" event=
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
time="2015-08-23T06:59:29Z" level=debug msg="Unsupported event type" event=
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=
time="2015-08-23T06:59:29Z" level=debug msg="Unsupported event type" event=
time="2015-08-23T06:59:29Z" level=debug msg="Got a kubernetes API event" event=

DorianGray commented 8 years ago

This happened while doing kubectl rolling-update on a replication controller whose associated service did not match the romulus selector.

DorianGray commented 8 years ago

So... https://github.com/DorianGray/romulus/commit/eabda2444597b7d4a0c72f339e2d57148fafc483 This fixes the issue, but not in the right way: it just kills the process when an unknown message comes in. I'm new to Go, but I found that the event channel somehow gets closed, which causes the infinite loop of empty objects. I'm not sure how to fix it properly; I'd imagine it will take a bit of refactoring of how the event channel is managed.
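For anyone following along, here is a minimal Go sketch of why a closed channel produces an endless stream of empty events. It is not romulus code: `Event` and `countUntilClosed` are hypothetical names made up for illustration. A receive on a closed channel never blocks and returns the zero value, so a loop that ignores the second ("ok") result spins forever; checking it lets the loop exit.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for the watch event type
// (an assumption for illustration, not the real romulus type).
type Event struct {
	Type string
}

// countUntilClosed handles events until the channel is closed and drained.
// The two-value receive sets ok to false at that point, so the loop returns
// instead of spinning on zero-value events.
func countUntilClosed(events <-chan Event) int {
	handled := 0
	for {
		ev, ok := <-events
		if !ok {
			return handled // channel closed: stop instead of looping
		}
		_ = ev // real code would dispatch on ev.Type here
		handled++
	}
}

func main() {
	events := make(chan Event, 2)
	events <- Event{Type: "ADDED"}
	events <- Event{Type: "MODIFIED"}
	close(events)

	fmt.Println("handled:", countUntilClosed(events))

	// A plain receive on the closed channel returns immediately with the
	// zero value, which matches the empty "event=" fields in the debug log.
	ev := <-events
	fmt.Printf("after close: %q\n", ev.Type)
}
```

Running this prints `handled: 2` followed by `after close: ""`. Exiting the loop only stops the spinning; whether to restart the watch or shut down is a separate decision.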

albertrdixon commented 8 years ago

Hmmm, ok that's odd. Will need to figure out why the channel is closing or why we're getting empty events.

albertrdixon commented 8 years ago

Alright, so it looks like this is the watch getting closed due to some error. Need to figure out a graceful way of keeping the channel alive.

albertrdixon commented 8 years ago

This should be working after the recent rewrite.

albertrdixon / romulus

High CPU usage on random hosts. #3