If I use the default values (just setting the node selectors) with:
helm install mongodb bitnami/mongodb-sharded --namespace mongodb --set shardsvr.dataNode.nodeSelector.cloud\\.google\\.com/gke-nodepool=pool-mongodb,configsvr.nodeSelector.cloud\\.google\\.com/gke-nodepool=pool-mongodb,mongos.nodeSelector.cloud\\.google\\.com/gke-nodepool=pool-mongodb,shardsvr.arbiter.nodeSelector.cloud\\.google\\.com/gke-nodepool=pool-mongodb
I get the same results.
Hi, are you able to reproduce the issue without setting the nodeSelector? Just helm install mongodb bitnami/mongodb-sharded.
With your current deployment, what is the result of kubectl get pods --namespace mongodb? It seems like a "networking" issue, because some nodes (or the application itself) are not reachable or are producing a connection error.
Without setting the nodeSelector it works:
kubectl get pods --namespace mongodbsharded
NAME READY STATUS RESTARTS AGE
mongodbsharded-mongodb-sharded-configsvr-0 1/1 Running 0 4m59s
mongodbsharded-mongodb-sharded-mongos-79cfc64446-xjl9s 1/1 Running 0 4m59s
mongodbsharded-mongodb-sharded-shard0-data-0 1/1 Running 0 4m59s
mongodbsharded-mongodb-sharded-shard1-data-0 1/1 Running 0 4m59s
It's weird; can you double-check whether there is any issue with the nodeSelector syntax or the way you're setting it?
You can use kubectl get nodes --show-labels and then compare it with the following output (replacing the pod name with each of the other pods):
$ kubectl describe pod mongodbsharded-mongodb-sharded-configsvr-0 | grep Node-Selectors
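If it helps, a small loop like this (just a rough sketch; adjust the namespace to the one you used) prints the selector for every pod at once:
for pod in $(kubectl get pods --namespace mongodb -o name); do
  echo "== $pod"
  kubectl describe --namespace mongodb "$pod" | grep Node-Selectors
done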
You can also replace the helm install command with helm template and filter the result:
helm template mongodb bitnami/mongodb-sharded --set shardsvr.dataNode.nodeSelector.cloud\\.google\\.com/gke-nodepool=pool-mongodb,configsvr.nodeSelector.cloud\\.google\\.com/gke-nodepool=pool-mongodb,mongos.nodeSelector.cloud\\.google\\.com/gke-nodepool=pool-mongodb,shardsvr.arbiter.nodeSelector.cloud\\.google\\.com/gke-nodepool=pool-mongodb | grep -A 4 nodeSelector
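If you want to rule out any --set escaping issue, you could also try an equivalent values file (a rough sketch using the same paths as your --set flags; the file name is just an example):
# node-selector-values.yaml (example name)
shardsvr:
  dataNode:
    nodeSelector:
      cloud.google.com/gke-nodepool: pool-mongodb
  arbiter:
    nodeSelector:
      cloud.google.com/gke-nodepool: pool-mongodb
configsvr:
  nodeSelector:
    cloud.google.com/gke-nodepool: pool-mongodb
mongos:
  nodeSelector:
    cloud.google.com/gke-nodepool: pool-mongodb
and then install with:
helm install mongodb bitnami/mongodb-sharded --namespace mongodb -f node-selector-values.yaml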
Although I don't think it is an issue with the label itself; if it were, the pods shouldn't be in a ready state, as they wouldn't be assigned to any node.
Can you connect to any pod and manually execute the readiness command (mongo --eval "db.adminCommand('ping')")? Is the above kubectl get pods output from a deployment with or without nodeSelector?
The above was without setting nodeSelector.
If I delete the chart and install it again setting the nodeSelector, kubectl get pods returns:
NAME READY STATUS RESTARTS AGE
mongodbsharded-mongodb-sharded-configsvr-0 1/1 Running 0 6m50s
mongodbsharded-mongodb-sharded-mongos-7f56876864-8wn47 0/1 Running 2 6m50s
mongodbsharded-mongodb-sharded-shard0-data-0 0/1 Running 2 6m50s
mongodbsharded-mongodb-sharded-shard1-data-0 0/1 Running 2 6m50s
Only the config server becomes ready; the other pods keep crashing.
Can you connect to any pod and manually execute the readiness command (mongo --eval "db.adminCommand('ping')")?
The config server returns this:
$ mongo --eval "db.adminCommand('ping')"
MongoDB shell version v4.4.1
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("13cda720-ab2d-48b8-9b20-bdd771586608") }
MongoDB server version: 4.4.1
{
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("7fffffff0000000000000004")
},
"lastCommittedOpTime" : Timestamp(1599826672, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1599826672, 1),
"signature" : {
"hash" : BinData(0,"JWicsLnKm3kg1ipTTP0myzhkxB0="),
"keyId" : NumberLong("6871092725999992854")
}
},
"operationTime" : Timestamp(1599826672, 1)
}
The shard and mongos pods both return:
$ mongo --eval "db.adminCommand('ping')"
MongoDB shell version v4.4.1
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed: SocketException: Error connecting to 127.0.0.1:27017 :: caused by :: Connection refused :
connect@src/mongo/shell/mongo.js:374:17
@(connect):2:6
exception: connect failed
exiting with code 1
The labels on the nodes seem right:
gke-my-cluster-staging-pool-mongodb-59d757f0-2pmg Ready <none> 28h v1.14.10-gke.42 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/instance-type=n1-standard-1,beta.kubernetes.io/os=linux,cloud.google.com/gke-nodepool=pool-mongodb,cloud.google.com/gke-os-distribution=cos,failure-domain.beta.kubernetes.io/region=europe-west1,failure-domain.beta.kubernetes.io/zone=europe-west1-b,kubernetes.io/arch=amd64,kubernetes.io/hostname=gke-my-cluster-staging-pool-mongodb-59d757f0-2pmg,kubernetes.io/os=linux
The node selectors on the pods also seem correct:
kubectl -n mongodbsharded describe pod mongodbsharded-mongodb-sharded-shard1-data-0 | grep Node-Selectors
Node-Selectors: cloud.google.com/gke-nodepool=pool-mongodb
kubectl -n mongodbsharded describe pod mongodbsharded-mongodb-sharded-shard0-data-0 | grep Node-Selectors
Node-Selectors: cloud.google.com/gke-nodepool=pool-mongodb
kubectl -n mongodbsharded describe pod mongodbsharded-mongodb-sharded-mongos-7f56876864-8wn47 | grep Node-Selectors
Node-Selectors: cloud.google.com/gke-nodepool=pool-mongodb
kubectl -n mongodbsharded describe pod mongodbsharded-mongodb-sharded-configsvr-0 | grep Node-Selectors
Node-Selectors: cloud.google.com/gke-nodepool=pool-mongodb
The output of the helm template ... command also matches:
nodeSelector:
cloud.google.com/gke-nodepool: pool-mongodb
affinity:
{}
tolerations:
--
nodeSelector:
cloud.google.com/gke-nodepool: pool-mongodb
affinity:
{}
tolerations:
--
nodeSelector:
cloud.google.com/gke-nodepool: pool-mongodb
affinity:
{}
tolerations:
--
nodeSelector:
cloud.google.com/gke-nodepool: pool-mongodb
affinity:
{}
tolerations:
It's weird; as per your outputs, the nodeSelector is set in the same way in all the pods. Likewise, the ConfigServer is fully operational: its status is READY and the probe command works when run manually.
On the other hand, the rest of the pods are being restarted even with the same nodeSelector. The issue is probably the same in all of them, so let's pick one and run some commands:
kubectl describe pod PODNAME
kubectl logs -f PODNAME
where PODNAME is the name of one of the failing pods, for example mongodbsharded-mongodb-sharded-shard0-data-0.
As the pods are in a RUNNING state (but not ready), I guess the issue is only related to the probes. If it were something related to the label, the pods wouldn't be in a RUNNING state but in PENDING, and the kubectl describe pod command would show something like
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 30s (x2 over 30s) default-scheduler 0/3 nodes are available: 3 node(s) didn't match node selector.
should appear, but this is not going to be the case here. We need to find out why the probes work on one pod but not on the rest; let's see if the describe or log commands give us more information.
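If it helps, you can also dump the exact probes configured for one of the failing pods (a quick sketch, using the same PODNAME placeholder as above):
kubectl get pod PODNAME -o yaml | grep -A 8 -E 'livenessProbe|readinessProbe'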
kubectl -n mongodbsharded describe pod mongodbsharded-mongodb-sharded-shard0-data-0
kubectl -n mongodbsharded logs mongodbsharded-mongodb-sharded-shard0-data-0
outputs:
07:04:45.87 INFO ==> Setting node as primary
mongodb 07:04:45.90
mongodb 07:04:45.90 Welcome to the Bitnami mongodb-sharded container
mongodb 07:04:45.90 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-mongodb-sharded
mongodb 07:04:45.90 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-mongodb-sharded/issues
mongodb 07:04:45.90
mongodb 07:04:45.90 INFO ==> ** Starting MongoDB Sharded setup **
mongodb 07:04:45.93 INFO ==> Validating settings in MONGODB_* env vars...
mongodb 07:04:45.94 INFO ==> Initializing MongoDB Sharded...
mongodb 07:04:45.96 INFO ==> Writing keyfile for replica set authentication...
mongodb 07:04:45.97 INFO ==> Enabling authentication...
mongodb 07:04:45.97 INFO ==> Deploying MongoDB Sharded with persisted data...
mongodb 07:04:45.99 INFO ==> Trying to connect to MongoDB server mongodbsharded-mongodb-sharded...
timeout reached before the port went into state "inuse"
I've done some tests these past days (installing/deleting the chart, increasing/decreasing the number of nodes, etc.) and I can no longer get MongoDB working even without setting the node selectors, so it seems that wasn't the issue. Anyway, the above outputs are from a chart installed with the node selector set, since I need that for the production cluster.
Ok, it is clear that the rest of the pods are failing because of some kind of unreachability. Although everything seems properly configured in terms of nodeSelector, I have tried to reproduce the issue, without luck.
I installed the chart following two different approaches:
Setting the nodeSelector:
I installed the chart with a label for one of my 3 nodes:
helm install mongodb bitnami/mongodb-sharded \
--set shardsvr.dataNode.nodeSelector.kubernetes\\.io/hostname=gke-carlos-cluster-default-pool-fa254d68-v9x6 \
--set configsvr.nodeSelector.kubernetes\\.io/hostname=gke-carlos-cluster-default-pool-fa254d68-v9x6 \
--set mongos.nodeSelector.kubernetes\\.io/hostname=gke-carlos-cluster-default-pool-fa254d68-v9x6 \
--set shardsvr.arbiter.nodeSelector.kubernetes\\.io/hostname=gke-carlos-cluster-default-pool-fa254d68-v9x6
Then, by running kubectl describe pod POD_NAME | grep 'Node' on each pod, I was able to see that the assigned node (Node: ...) and the desired node (Node-Selectors: ...) are the same, so the pods are scheduled onto the desired node:
kubectl describe pod mongodb-mongodb-sharded-configsvr-0 | grep 'Node' && \
kubectl describe pod mongodb-mongodb-sharded-mongos-7f965f859-chs2q | grep 'Node' && \
kubectl describe pod mongodb-mongodb-sharded-shard0-data-0 | grep 'Node' && \
kubectl describe pod mongodb-mongodb-sharded-shard1-data-0 | grep 'Node'
Node: gke-carlos-cluster-default-pool-fa254d68-v9x6/10.142.0.44
Node-Selectors: kubernetes.io/hostname=gke-carlos-cluster-default-pool-fa254d68-v9x6
Node: gke-carlos-cluster-default-pool-fa254d68-v9x6/10.142.0.44
Node-Selectors: kubernetes.io/hostname=gke-carlos-cluster-default-pool-fa254d68-v9x6
Node: gke-carlos-cluster-default-pool-fa254d68-v9x6/10.142.0.44
Node-Selectors: kubernetes.io/hostname=gke-carlos-cluster-default-pool-fa254d68-v9x6
Node: gke-carlos-cluster-default-pool-fa254d68-v9x6/10.142.0.44
Node-Selectors: kubernetes.io/hostname=gke-carlos-cluster-default-pool-fa254d68-v9x6
In theory, this is the scenario you are looking for, but in this case, everything is up and running:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
mongodb-mongodb-sharded-configsvr-0 1/1 Running 0 10m
mongodb-mongodb-sharded-mongos-7f965f859-chs2q 1/1 Running 0 10m
mongodb-mongodb-sharded-shard0-data-0 1/1 Running 0 10m
mongodb-mongodb-sharded-shard1-data-0 1/1 Running 0 10m
Without nodeSelector:
Now I installed the chart but without setting any nodeSelector:
$ helm install mongodb2 bitnami/mongodb-sharded
As expected, each pod can be assigned to any node, possibly different ones:
kubectl describe pod mongodb2-mongodb-sharded-configsvr-0 | grep 'Node' && \
kubectl describe pod mongodb2-mongodb-sharded-mongos-c8cc5f68b-w6hsv | grep 'Node' && \
kubectl describe pod mongodb2-mongodb-sharded-shard0-data-0 | grep 'Node' && \
kubectl describe pod mongodb2-mongodb-sharded-shard1-data-0 | grep 'Node'
Node: gke-carlos-cluster-default-pool-fa254d68-v9x6/10.142.0.44
Node-Selectors: <none>
Node: gke-carlos-cluster-default-pool-fa254d68-9se9/10.142.0.46
Node-Selectors: <none>
Node: gke-carlos-cluster-default-pool-fa254d68-v9x6/10.142.0.44
Node-Selectors: <none>
Node: gke-carlos-cluster-default-pool-fa254d68-ak6i/10.142.0.47
Node-Selectors: <none>
In this case, Node-Selectors is empty as I didn't specify anything, so the pods are assigned to different nodes.
Also in this case everything is up and running:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
mongodb2-mongodb-sharded-configsvr-0 1/1 Running 0 17m
mongodb2-mongodb-sharded-mongos-c8cc5f68b-w6hsv 1/1 Running 0 17m
mongodb2-mongodb-sharded-shard0-data-0 1/1 Running 0 17m
mongodb2-mongodb-sharded-shard1-data-0 1/1 Running 0 17m
Ok, it is clear that the rest of the pods are failing because of some kind of unreachability. Although everything seems properly configured in terms of NodeSelector, I am trying to reproduce the issue but without luck.
How can I find what causes this unreachability? I'm new to MongoDB and don't know where to start looking. What baffles me is that the behavior is not deterministic: I've deployed the chart many times this morning with different combinations, and a couple of times it did work. I guess the issue has to be related to the state of my cluster, but I don't see how.
Are you able to connect to your MongoDB by following the instructions that appear in the installation notes? You can see those instructions at any time by running helm get notes NAME:
$ helm get notes mongodb
NOTES:
** Please be patient while the chart is being deployed **
The MongoDB Sharded cluster can be accessed via the Mongos instances in port 27017 on the following DNS name from within your cluster:
mongodb-mongodb-sharded.default.svc.cluster.local
To get the root password run:
export MONGODB_ROOT_PASSWORD=$(kubectl get secret --namespace default mongodb-mongodb-sharded -o jsonpath="{.data.mongodb-root-password}" | base64 --decode)
To connect to your database run the following command:
kubectl run --namespace default mongodb-mongodb-sharded-client --rm --tty -i --restart='Never' --image docker.io/bitnami/mongodb-sharded:4.4.1-debian-10-r0 --command -- mongo admin --host mongodb-mongodb-sharded
To connect to your database from outside the cluster execute the following commands:
kubectl port-forward --namespace default svc/mongodb-mongodb-sharded 27017:27017 &
mongo --host 127.0.0.1 --authenticationDatabase admin -p $MONGODB_ROOT_PASSWORD
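Once connected through the mongos, a quick check like the following (a sketch that reuses the port-forward and password from above; -u root is assumed to be the chart's default root user) should list the registered shards:
mongo --host 127.0.0.1 -u root --authenticationDatabase admin -p $MONGODB_ROOT_PASSWORD --eval "db.adminCommand({ listShards: 1 })"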
Maybe kubectl get events can help to see if there is something else in the cluster that prevents the chart from being fully operational.
Apart from that, I would check whether you have any kind of restriction in your cluster in terms of networking; for this case it is probably more useful to check a GKE support page or forum, as the issue seems not related to the chart itself.
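For example (a rough sketch; adjust the namespace to the one you used):
kubectl get events --namespace mongodb --sort-by=.lastTimestamp
kubectl get networkpolicies --all-namespaces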
Yes, on the rare occasions when the chart is deployed successfully I can connect to MongoDB. I think you are right and the issue is not related to the chart itself, so I will check the GKE cluster. Thank you very much for your help @carrodher!
Which chart: mongodb-sharded-2.1.1
Describe the bug: Pods keep crashing. It seems connections are being refused on the mongos, arbiter, and shard pods.
To Reproduce:
1. Create a node pool pool-mongodb.
2. Create the mongodb namespace.
3. helm repo add bitnami https://charts.bitnami.com/bitnami
4. Modify values-production.yaml so the node selector uses the above node pool, and adjust replicas and shards according to one's needs.
Expected behavior: MongoDB is working.
Version of Helm and Kubernetes: Helm:
Kubernetes: