To prevent cascading failures and crash loops during server and broker pod restarts, we need more flexibility to customize probes in Kubernetes.
We need the following additional options for every probe in each statefulset:
successThreshold
failureThreshold
timeoutSeconds
In addition, we are using readiness and liveness probes but not startup probes. The server pods need an initial startup period during which they mark their segments online. This startup time differs from cluster to cluster, depending on the number of segments stored. Instead of a static startup delay using initialDelaySeconds in the readiness and liveness probes, it makes sense to use a startup probe. The startup probe checks every periodSeconds, up to failureThreshold times, and treats the server pod as started as soon as it responds, so the liveness and readiness probes take over without waiting out a fixed delay.
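To make the idea concrete, here is a minimal sketch of what the server container's probes could look like with a startup probe added. The `/health` path, port `8097`, and all numeric values are assumptions for illustration, not values taken from the chart.

```yaml
# Sketch of a container spec fragment (assumed endpoint, port, and timings).
containers:
  - name: server
    startupProbe:
      httpGet:
        path: /health        # assumed health endpoint
        port: 8097           # assumed admin port
      periodSeconds: 10
      failureThreshold: 60   # allows up to 60 * 10s = 600s for segments to load
      timeoutSeconds: 5
    livenessProbe:           # only starts running once the startup probe has succeeded
      httpGet:
        path: /health
        port: 8097
      periodSeconds: 10
      failureThreshold: 3
      timeoutSeconds: 5
    readinessProbe:
      httpGet:
        path: /health
        port: 8097
      periodSeconds: 10
      successThreshold: 1
      failureThreshold: 3
      timeoutSeconds: 5
```

With this shape, a cluster with many segments gets up to periodSeconds * failureThreshold seconds to start, while a small cluster is considered started as soon as the first check passes.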
If we add a startup probe, I would also appreciate the possibility of setting different initialDelaySeconds and periodSeconds for each probe, rather than being forced to use the same values for every probe. https://github.com/apache/pinot/blob/b76653e1acbe82c6e6b09655c7cb5ab20bbea4c1/helm/pinot/templates/server/statefulset.yaml#L88
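As a rough illustration of per-probe settings, the chart values could be structured along these lines. The key names under `server.probes` are hypothetical and are not the chart's current schema. The same shape would apply to the broker.

```yaml
# Hypothetical values.yaml layout; key names and numbers are illustrative only.
server:
  probes:
    startup:
      initialDelaySeconds: 0
      periodSeconds: 10
      failureThreshold: 60
      successThreshold: 1    # Kubernetes requires 1 for startup and liveness probes
      timeoutSeconds: 5
    liveness:
      initialDelaySeconds: 0
      periodSeconds: 10
      failureThreshold: 3
      successThreshold: 1
      timeoutSeconds: 5
    readiness:
      initialDelaySeconds: 0
      periodSeconds: 5
      failureThreshold: 3
      successThreshold: 2    # values above 1 are only allowed for readiness probes
      timeoutSeconds: 5
```

Each probe carries its own timings, so tuning the readiness check does not change how liveness or startup behave.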