Open wangjia184 opened 3 years ago
I tried to set the StatefulSet's replica count to 3, and all pods crashed on startup. This happens even with an empty Consul, where no keys/values exist before the pods in the StatefulSet are started.
fail: Orleans.Runtime.MembershipService.MembershipAgent[100661]
Failed to get ping responses from 1 of 1 active silos. Newly joining silos validate connectivity with all active silos that have recently updated their 'I Am Alive' value before joining the cluster. Successfully contacted: []. Silos which did not respond successfully are: [S10.18.123.235:11111:361177868]. Will continue attempting to validate connectivity until 06/12/2021 07:19:33. Attempt #7
After the pods restarted over and over again, they finally stabilized and all started up. Please see the RESTARTS column below.
NAME READY STATUS RESTARTS AGE
ubs-job-dev-0 1/1 Running 4 17m
ubs-job-dev-1 1/1 Running 4 16m
ubs-job-dev-2 1/1 Running 3 16m
Log says 7 silos.
ProcessTableUpdate (called from TryUpdateMyStatusGlobalOnce) membership table: 7 silos, 3 are Active, 4 are Dead, Version=<33, 31015>. All silos: [SiloAddress=S10.18.123.246:11111:361178481 SiloName=ubs-job-dev-0 Status=Active, SiloAddress=S10.18.123.199:11111:361178519 SiloName=ubs-job-dev-1 Status=Active, SiloAddress=S10.18.117.114:11111:361178416 SiloName=ubs-job-dev-2 Status=Active, SiloAddress=S10.18.117.114:11111:361178292 SiloName=ubs-job-dev-2 Status=Dead, SiloAddress=S10.18.123.199:11111:361178366 SiloName=ubs-job-dev-1 Status=Dead, SiloAddress=S10.18.123.235:11111:361177868 SiloName=ubs-job-dev-0 Status=Dead, SiloAddress=S10.18.123.246:11111:361178329 SiloName=ubs-job-dev-0 Status=Dead]
And this is how it looks in Consul:
There are only 3 pods in this StatefulSet, while the log says 7 silos. The SiloName is the pod name, and unlike in a ReplicaSet, the pod name in a StatefulSet does not change after a pod restart. It seems a pod cannot see the others on startup, so it crashes. The StatefulSet then restarts the crashed pod, and the newly started pod with the same pod name is seen as a new silo.
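To make the mismatch concrete, here is a minimal Python sketch (not part of Orleans) that groups the entries from the ProcessTableUpdate log line above by SiloName. The regular expression and the helper name group_by_silo_name are illustrative assumptions. The sketch also assumes the silo address has the form S<ip>:<port>:<generation>, with the third component changing on every restart.

```python
import re
from collections import defaultdict

# Entries copied from the ProcessTableUpdate log line above.
LOG = (
    "SiloAddress=S10.18.123.246:11111:361178481 SiloName=ubs-job-dev-0 Status=Active, "
    "SiloAddress=S10.18.123.199:11111:361178519 SiloName=ubs-job-dev-1 Status=Active, "
    "SiloAddress=S10.18.117.114:11111:361178416 SiloName=ubs-job-dev-2 Status=Active, "
    "SiloAddress=S10.18.117.114:11111:361178292 SiloName=ubs-job-dev-2 Status=Dead, "
    "SiloAddress=S10.18.123.199:11111:361178366 SiloName=ubs-job-dev-1 Status=Dead, "
    "SiloAddress=S10.18.123.235:11111:361177868 SiloName=ubs-job-dev-0 Status=Dead, "
    "SiloAddress=S10.18.123.246:11111:361178329 SiloName=ubs-job-dev-0 Status=Dead"
)

# Assumed layout of one entry: S<ip>:<port>:<generation>, then name and status.
ENTRY = re.compile(
    r"SiloAddress=S(?P<ip>[\d.]+):(?P<port>\d+):(?P<generation>\d+) "
    r"SiloName=(?P<name>\S+) Status=(?P<status>\w+)"
)


def group_by_silo_name(log):
    """Return {silo name: [(ip, generation, status), ...]} for every entry in the log."""
    grouped = defaultdict(list)
    for match in ENTRY.finditer(log):
        grouped[match["name"]].append(
            (match["ip"], int(match["generation"]), match["status"])
        )
    return dict(grouped)


if __name__ == "__main__":
    for name, entries in sorted(group_by_silo_name(LOG).items()):
        print(name)
        # Oldest generation first, so each restart shows up in order.
        for ip, generation, status in sorted(entries, key=lambda e: e[1]):
            print(f"  generation={generation} ip={ip} status={status}")
```

Grouped this way, the 7 table entries collapse onto the 3 pod names. Each pod name has exactly one Active entry, plus one Dead entry for every earlier incarnation of that pod.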
Are you using K8s membership via the UseKubeMembership() extension method? It looks like in the examples above you are only using official Orleans libraries such as Microsoft.Orleans.OrleansConsulUtils and Microsoft.Orleans.Hosting.Kubernetes. If so, you need to report this issue to the official Orleans project, i.e. https://github.com/dotnet/orleans
Version 3.4.3.
I configured the labels and environment variables for my pod according to the doc.
I am running Orleans in a K8s StatefulSet. My CI tool deploys the StatefulSet, and then the pods crash on startup.