bitnami / charts

Bitnami Helm Charts
https://bitnami.com

[bitnami/postgresql-ha] Apparent race condition in replica startup after scaling up GKE cluster from zero nodes #3603

Closed: SeanZicari closed this issue 4 years ago

SeanZicari commented 4 years ago

Which chart: 3.4.3

Describe the bug: After running a GKE cluster for a while in which postgresql-ha was a subchart supporting a Django application, I scaled the cluster down for a few days. After scaling back up, there were issues logging into the Django site: I kept being logged out immediately, with no error message. I suspect the session information wasn't being written to the database, so Django kept "forgetting" I was logged in. It seems the scale-up didn't bring all the services up correctly. I tried scaling down and then up once more (right before I was supposed to use the Django site for a presentation), and at that point postgresql-ha would not come online.

To Reproduce: Steps to reproduce the behavior (a hypothetical command sketch follows the list):

  1. Install postgresql-ha 3.4.3 in GKE as a sub chart in a Helm chart, with a replica count of 2 (the default)
  2. Make sure to set a custom password and a repmgr password
  3. Scale the GKE cluster down to zero
  4. Scale the GKE cluster back up
  5. You may need to do this multiple times until the error is reproduced
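
For reference, a rough command sketch of these steps, assuming the chart's postgresql.password / postgresql.repmgrPassword parameters and placeholder cluster, node-pool, and release names (all hypothetical, not taken from the original report):

# Steps 1-2: install the chart with custom passwords (here as a standalone release rather than a subchart)
helm install myrelease bitnami/postgresql-ha --version 3.4.3 \
  --set postgresql.password=custompassword \
  --set postgresql.repmgrPassword=customrepmgrpassword

# Steps 3-4: scale the GKE node pool down to zero and back up again
gcloud container clusters resize my-cluster --node-pool default-pool --num-nodes 0 --quiet
gcloud container clusters resize my-cluster --node-pool default-pool --num-nodes 3 --quiet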

Expected behavior: postgresql-ha would come back online correctly, without this apparent race condition in which the first replica waits for the second replica of the StatefulSet, while the second replica cannot come online until the first has started up.

Version of Helm and Kubernetes:

version.BuildInfo{Version:"v3.2.4", GitCommit:"0ad800ef43d3b826f31a5ad8dfbb4fe05d143688", GitTreeState:"dirty", GoVersion:"go1.14.3"}
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.11", GitCommit:"d94a81c724ea8e1ccc9002d89b7fe81d58f89ede", GitTreeState:"clean", BuildDate:"2020-03-12T21:08:59Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.12-gke.2", GitCommit:"fb7add51f767aae42655d39972210dc1c5dbd4b3", GitTreeState:"clean", BuildDate:"2020-06-01T22:20:10Z", GoVersion:"go1.12.17b4", Compiler:"gc", Platform:"linux/amd64"}

Additional context: Here is the log output from postgresql-ha-postgresql-0 when it started up and failed because it was waiting for postgresql-ha-postgresql-1:

postgresql.log

SeanZicari commented 4 years ago

I tried upgrading to chart version 3.5.4 but the issue was still there. Maybe I just don't know how to reset the installation properly without losing data?

SeanZicari commented 4 years ago

I just learned about the Parallel Pod Management policy. Maybe that would be a better option than the default OrderedReady policy? Seems like it would prevent the problem I ran into from happening, because all replicas would come up at the same time and be able to find each other.
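
For context, podManagementPolicy is a standard StatefulSet field; a minimal fragment illustrating the setting being discussed (names, labels, and image are illustrative, not the chart's actual rendered manifest):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgresql-ha-postgresql
spec:
  serviceName: postgresql-ha-postgresql-headless
  replicas: 2
  # OrderedReady (the default) starts pods one at a time and waits for each to become Ready;
  # Parallel launches all replicas at once, so peers can discover each other during startup.
  podManagementPolicy: Parallel
  selector:
    matchLabels:
      app.kubernetes.io/name: postgresql-ha
  template:
    metadata:
      labels:
        app.kubernetes.io/name: postgresql-ha
    spec:
      containers:
        - name: postgresql
          image: docker.io/bitnami/postgresql-repmgr:11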

joancafom commented 4 years ago

Hi @SeanZicari!

As this issue seems to be related to the postgresql-ha chart itself, I have tried reproducing it there. I am also using GKE as you specified, but unfortunately I have not been able to reproduce the problem. I have put the cluster through several scaling rounds and it seems to be working for me. Maybe I am not reproducing the steps correctly; here is my workflow:

1- Set a custom password for both postgresql and repmgr

password: mypassword
repmgrPassword: mypassword
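
For context, in the postgresql-ha chart these keys normally sit under the postgresql block of the values file, so the snippet above is presumably shorthand for something like (layout assumed, not copied from the chart):

postgresql:
  password: mypassword
  repmgrPassword: mypassword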

2- Create a brand new release with these values

$ helm install kappa bitnami/postgresql-ha
NAME: kappa
LAST DEPLOYED: Tue Sep  8 10:52:24 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
...

$ kubectl get pods
NAME                                          READY   STATUS    RESTARTS   AGE
kappa-postgresql-ha-pgpool-79449bf9b6-fzx9k   1/1     Running   0          7m13s
kappa-postgresql-ha-postgresql-0              1/1     Running   0          7m13s
kappa-postgresql-ha-postgresql-1              1/1     Running   0          6m39s

3- Set the replicaCount values for both postgresql and pgpool to zero:

pgpool:
  replicaCount: 0
postgresql:
  replicaCount: 0

4- Perform an upgrade

$ helm upgrade kappa bitnami/postgresql-ha
Release "kappa" has been upgraded. Happy Helming!
NAME: kappa
LAST DEPLOYED: Tue Sep  8 11:00:16 2020
NAMESPACE: default
STATUS: deployed
REVISION: 2
TEST SUITE: None
...

$ kubectl get pods
No resources found in default namespace.
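
Presumably these upgrades also pass the edited values file (omitted in the transcript), since a bare helm upgrade would apply the chart's default replica counts rather than zero; a hypothetical invocation:

$ helm upgrade kappa bitnami/postgresql-ha -f values.yaml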

5- Restore the replicaCount values and perform an upgrade

$ helm upgrade kappa bitnami/postgresql-ha
Release "kappa" has been upgraded. Happy Helming!
NAME: kappa
LAST DEPLOYED: Tue Sep  8 11:02:04 2020
NAMESPACE: default
STATUS: deployed
REVISION: 3
...

$ kubectl get pods
NAME                                          READY   STATUS    RESTARTS   AGE
kappa-postgresql-ha-pgpool-79449bf9b6-jqjvn   1/1     Running   0          92s
kappa-postgresql-ha-postgresql-0              1/1     Running   0          91s
kappa-postgresql-ha-postgresql-1              1/1     Running   0          66s

I have done steps 3-5 three times and on every occasion the pods were able to start up normally.

Thanks!

SeanZicari commented 4 years ago

I appreciate you trying to reproduce the problem! There are a couple of things that are different about my situation. I am not changing the replicaCount; I scaled the entire cluster up and down using GKE's node scaling. I probably did that at least six or so times before the problem occurred.

SeanZicari commented 4 years ago

You also may need to put some data into the database. I don't know if the problem will occur without actual data to replicate.

joancafom commented 4 years ago

Hi @SeanZicari

I am not very familiar with the scaling capabilities of the GKE cluster. Are you using the autoscaler option? Or are you deleting the nodes from the cluster and then adding them back?

In any case, would you mind trying to scale the cluster using the provided parameters and Helm? I don't really know how GKE performs the scaling operation, but this could be related to these: https://github.com/bitnami/charts/issues/3431#issuecomment-674836237 and https://github.com/bitnami/charts/issues/3431#issuecomment-679947772
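
A hypothetical way to run that test with --set instead of a values file, using the release name and parameters from the earlier comments (any custom passwords set at install time would need to be passed again as well):

# scale the chart-managed pods down to zero ...
helm upgrade kappa bitnami/postgresql-ha --set postgresql.replicaCount=0 --set pgpool.replicaCount=0
# ... and back to the counts seen earlier in the thread (2 postgresql pods, 1 pgpool pod)
helm upgrade kappa bitnami/postgresql-ha --set postgresql.replicaCount=2 --set pgpool.replicaCount=1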

Regards

jp-gouin commented 4 years ago

Hi @joancafom, an easy way to reproduce the issue is to deploy the chart.

Since it's a StatefulSet, postgresql-1 won't start until postgresql-0 is running.

However, the postgresql-0 node won't start because it's not the primary node.

I agree with @SeanZicari, the solution might be Parallel pod management, but we need to make sure that there are no side effects.
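
To check which policy a deployed StatefulSet is currently using (the StatefulSet name is a placeholder); note that this field cannot be changed in place, so switching an existing release would generally mean recreating the StatefulSet while keeping its PVCs:

kubectl get statefulset postgresql-ha-postgresql -o jsonpath='{.spec.podManagementPolicy}'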

SeanZicari commented 4 years ago

Is it possible that setting a higher postgresql.repmgrConnectTimeout would allow the first pod to stay up long enough for Kubernetes to bring up the second pod, or is the pod not considered healthy until the repmgr has fully started up? I didn’t think about increasing the connect timeout before.
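
If that route were explored, the timeout could presumably be raised at upgrade time with something like the following (parameter name taken from the comment above; the exact default and accepted values are whatever the chart documents):

helm upgrade myrelease bitnami/postgresql-ha --set postgresql.repmgrConnectTimeout=30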

SeanZicari commented 4 years ago

Though I do think parallel pod management more closely matches traditional deployments wherein both instances are available at the same time.

miguelaeh commented 4 years ago

Hi all, so I understand the issue is more related to how the pods are created than to the scaling itself, right? I will test the scenario @jp-gouin describes, with and without Parallel Pod Management, to check whether that solves the issue and whether the solution has any side effects. Thank you for the suggestions, guys!

miguelaeh commented 4 years ago

Hi guys! We have tested the changes suggested by @jp-gouin and it seems that the cluster recovers properly. You can see the test steps in the PR above. Regards.

stale[bot] commented 4 years ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

miguelaeh commented 4 years ago

We are still looking into the issue. More info in PR #3681.

SeanZicari commented 4 years ago

@miguelaeh Awesome to see you taking action on this. Thanks again for that!

rafariossaa commented 4 years ago

Hi @SeanZicari, could you check whether the new chart version fixes this issue for you?