coryodaniel / bonny

The Elixir based Kubernetes Development Framework
MIT License

LeaderElector error on leader change #251

Status: Open. kbredemeier opened this issue 8 months ago.

kbredemeier commented 8 months ago

Environment

Current behavior

I'm occasionally seeing the following error:

** (CaseClauseError) no case clause matching: {:error, %K8s.Client.APIError{message: "etcdserver: leader changed", reason: "Failure"}}

Backtrace:
lib/bonny/operator/leader_elector.ex:200 Bonny.Operator.LeaderElector.acquire_or_renew/2
lib/bonny/operator/leader_elector.ex:86 Bonny.Operator.LeaderElector.handle_info/2
gen_server.erl:1077 :gen_server.try_handle_info/3
gen_server.erl:1165 :gen_server.handle_msg/6
proc_lib.erl:241 :proc_lib.init_p_do_apply/3

I'm not sure what's causing this, but a Google search suggests it may be caused by a snapshot that runs every 2 hours on AKS clusters and triggers this etcd leader change.

Expected behavior

Not sure; maybe this should just log a warning instead of raising a CaseClauseError. 🤷

mruoss commented 8 months ago

Hmm... this is new to me. Does the operator recover afterwards?

kbredemeier commented 7 months ago


I'm not sure yet. Someone restarted the pod before I could inspect it; I only found the error in our error-reporting tool. Separately, I do run into an issue where the operator stops receiving add events, but I'm not sure it's related. Still investigating; this happens very rarely.