bitnami / charts

Bitnami Helm Charts
https://bitnami.com

[bitnami/mongodb] mongosh connections (used in several probes) freeze and consume a lot of CPU and memory resources #10264

Closed rmuehlbauer closed 1 year ago

rmuehlbauer commented 2 years ago

Name and Version

bitnami/mongodb 12.1.5

What steps will reproduce the bug?

This seems to be a problem since "mongo" was replaced with "mongosh" in https://github.com/bitnami/charts/pull/9916, and it occurs if you have tls.enabled: true and tls.autoGenerated: true in your installation.

It might be somehow related to #10104 and #10262.

Are you using any custom parameters or values?

Only the part of values.yaml that seems relevant in this case:

values.yaml

```yaml
architecture: replicaset
## @param useStatefulSet Set to true to use a StatefulSet instead of a Deployment (only when `architecture=standalone`)
##
useStatefulSet: false
## MongoDB(®) Authentication parameters
##
tls:
  ## @param tls.enabled Enable MongoDB(®) TLS support between nodes in the cluster as well as between mongo clients and nodes
  ##
  enabled: true
  ## @param tls.autoGenerated Generate a custom CA and self-signed certificates
  ##
  autoGenerated: true
  ## @param tls.existingSecret Existing secret with TLS certificates (keys: `mongodb-ca-cert`, `mongodb-ca-key`, `client-pem`)
  ## NOTE: When it's set it will disable certificate creation
  ##
  existingSecret: ""
  ## Add Custom CA certificate
  ## @param tls.caCert Custom CA certificate (base64 encoded)
  ## @param tls.caKey CA certificate private key (base64 encoded)
  ##
  caCert: my-base64-encoded-CA-Cert
  caKey: my-base64-encoded-CA-Key
  ## Bitnami Nginx image
  ## @param tls.image.registry Init container TLS certs setup image registry
  ## @param tls.image.repository Init container TLS certs setup image repository
  ## @param tls.image.tag Init container TLS certs setup image tag (immutable tags are recommended)
  ## @param tls.image.pullPolicy Init container TLS certs setup image pull policy
  ## @param tls.image.pullSecrets Init container TLS certs specify docker-registry secret names as an array
  ## @param tls.extraDnsNames Add extra dns names to the CA, can solve x509 auth issue for pod clients
  ##
  image:
    registry: docker.io
    repository: bitnami/nginx
    tag: 1.21.6-debian-10-r103
    pullPolicy: IfNotPresent
    ## Optionally specify an array of imagePullSecrets.
    ## Secrets must be manually created in the namespace.
    ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
    ## e.g:
    ## pullSecrets:
    ##   - myRegistryKeySecretName
    ##
    pullSecrets: []
  ## e.g:
  ## extraDnsNames
  ##   "DNS.6": "$my_host"
  ##   "DNS.7": "$test"
  ##
  extraDnsNames: []
  ## @param tls.mode Allows to set the tls mode which should be used when tls is enabled (options: `allowTLS`, `preferTLS`, `requireTLS`)
  ##
  mode: requireTLS
## @param hostAliases Add deployment host aliases
## https://kubernetes.io/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases/
##
hostAliases: []
## @param replicaSetName Name of the replica set (only when `architecture=replicaset`)
## Ignored when mongodb.architecture=standalone
##
replicaSetName: replica-name
## @param replicaSetHostnames Enable DNS hostnames in the replicaset config (only when `architecture=replicaset`)
## Ignored when mongodb.architecture=standalone
## Ignored when externalAccess.enabled=true
##
replicaSetHostnames: true
## @param enableIPv6 Switch to enable/disable IPv6 on MongoDB(®)
## ref: https://github.com/bitnami/bitnami-docker-mongodb/blob/master/README.md#enabling/disabling-ipv6
##
## configuration: |-
extraFlags:
  - "--clusterAuthMode=x509"
externalAccess:
  ## @param externalAccess.enabled Enable Kubernetes external cluster access to MongoDB(®) nodes (only for replicaset architecture)
  ##
  enabled: true
  ## External IPs auto-discovery configuration
  ## An init container is used to auto-detect LB IPs or node ports by querying the K8s API
  ## Note: RBAC might be required
  ##
  ## Parameters to configure K8s service(s) used to externally access MongoDB(®)
  ## A new service per broker will be created
  ##
  service:
    ## @param externalAccess.service.type Kubernetes Service type for external access. Allowed values: NodePort, LoadBalancer or ClusterIP
    ##
    type: LoadBalancer
    ## @param externalAccess.service.portName MongoDB(®) port name used for external access when service type is LoadBalancer
    ##
    portName: "mongodb"
    ## @param externalAccess.service.ports.mongodb MongoDB(®) port used for external access when service type is LoadBalancer
    ##
    ports:
      mongodb: 27017
    ## @param externalAccess.service.loadBalancerIPs Array of load balancer IPs for MongoDB(®) nodes
    ## Example:
    ## loadBalancerIPs:
    ##   - X.X.X.X
    ##   - Y.Y.Y.Y
    ##
    loadBalancerIPs:
      - 10.0.0.1
      - 10.0.0.2
      - 10.0.0.3
    ## @param externalAccess.service.loadBalancerSourceRanges Address(es) that are allowed when service is LoadBalancer
    ## ref: https://kubernetes.io/docs/tasks/access-application-cluster/configure-cloud-provider-firewall/#restrict-access-for-loadbalancer-service
    ## Example:
    ## loadBalancerSourceRanges:
    ##   - 10.10.10.0/24
    ##
    loadBalancerSourceRanges: []
    ## @param externalAccess.service.externalTrafficPolicy MongoDB(®) service external traffic policy
    ## ref https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip
    ##
    externalTrafficPolicy: Local
    ## @param externalAccess.service.nodePorts Array of node ports used to configure MongoDB(®) advertised hostname when service type is NodePort
    ## Example:
    ## nodePorts:
    ##   - 30001
    ##   - 30002
    ##
    nodePorts: []
    ## @param externalAccess.service.domain Domain or external IP used to configure MongoDB(®) advertised hostname when service type is NodePort
    ## If not specified, the container will try to get the kubernetes node external IP
    ## e.g:
    ## domain: mydomain.com
    ##
    domain: ""
    ## @param externalAccess.service.extraPorts Extra ports to expose (normally used with the `sidecar` value)
    ##
    extraPorts: []
    ## @param externalAccess.service.annotations Service annotations for external access
    ##
    annotations:
      networking.gke.io/load-balancer-type: "Internal"
    ## @param externalAccess.service.sessionAffinity Control where client requests go, to the same pod or round-robin
    ## Values: ClientIP or None
    ## ref: https://kubernetes.io/docs/user-guide/services/
    ##
    sessionAffinity: None
    ## @param externalAccess.service.sessionAffinityConfig Additional settings for the sessionAffinity
    ## sessionAffinityConfig:
    ##   clientIP:
    ##     timeoutSeconds: 300
    ##
    sessionAffinityConfig: {}
```

What is the expected behavior?

mongosh connections used for several probes (readiness, startup, liveness) should not freeze, turn into zombies, and eat up a lot of CPU and memory.

What do you see instead?

Depending on the pod runtime, there are hundreds or thousands of frozen and zombie mongosh processes. Those stalled processes consume a lot of CPU and memory resources. This "top" output is taken from a small dev system:

top - 13:56:51 up 4 days,  5:29,  0 users,  load average: 10.86, 13.15, 10.26
Tasks: 4775 total,   2 running,   8 sleeping,   0 stopped, 4765 zombie
%Cpu(s): 17.6 us, 12.1 sy,  0.0 ni, 69.4 id,  0.0 wa,  0.0 hi,  0.6 si,  0.3 st
MiB Mem :  16012.4 total,   1635.8 free,   8448.8 used,   5927.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   7293.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1539100 1001      20   0  653836 167404  59368 R   7.2   1.0   0:00.23 mongosh
1538991 1001      20   0   13592   8676   2632 R   5.6   0.1   0:01.74 top
      1 1001      20   0 4068660 538904  61272 S   5.0   3.3 279:16.82 mongod
1538606 1001      20   0 1288816 193088  68272 S   0.3   1.2   0:04.54 mongosh mongodb
1539088 1001      20   0    3864   3104   2748 S   0.3   0.0   0:00.01 readiness-probe
     82 1001      20   0       0      0      0 Z   0.0   0.0   0:02.25 mongosh mongodb
    940 1001      20   0       0      0      0 Z   0.0   0.0   0:04.69 mongosh mongodb
   1981 1001      20   0       0      0      0 Z   0.0   0.0   0:04.79 mongosh mongodb
   2030 1001      20   0       0      0      0 Z   0.0   0.0   0:02.22 mongosh mongodb
   2873 1001      20   0       0      0      0 Z   0.0   0.0   0:02.42 mongosh mongodb
   2887 1001      20   0       0      0      0 Z   0.0   0.0   0:05.15 mongosh mongodb
   2930 1001      20   0       0      0      0 Z   0.0   0.0   0:02.17 mongosh mongodb
   2968 1001      20   0       0      0      0 Z   0.0   0.0   0:02.31 mongosh mongodb
   3079 1001      20   0       0      0      0 Z   0.0   0.0   0:02.32 mongosh mongodb
   3112 1001      20   0       0      0      0 Z   0.0   0.0   0:02.20 mongosh mongodb
   3145 1001      20   0       0      0      0 Z   0.0   0.0   0:02.33 mongosh mongodb
   3379 1001      20   0       0      0      0 Z   0.0   0.0   0:02.46 mongosh mongodb
   3413 1001      20   0       0      0      0 Z   0.0   0.0   0:02.22 mongosh mongodb
   3517 1001      20   0       0      0      0 Z   0.0   0.0   0:02.23 mongosh mongodb
   3635 1001      20   0       0      0      0 Z   0.0   0.0   0:02.24 mongosh mongodb
   3650 1001      20   0       0      0      0 Z   0.0   0.0   0:04.89 mongosh mongodb
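For anyone trying to quantify this on their own pods, a small sketch (not part of the chart) that counts the zombie mongosh processes visible to `ps`; a STAT field beginning with `Z` marks a zombie:

```shell
#!/bin/sh
# Sketch: count zombie mongosh processes inside the pod.
# In `ps` output, a STAT value starting with "Z" denotes a zombie process.
ps -eo stat=,comm= | awk '$1 ~ /^Z/ && $2 ~ /mongosh/ { n++ } END { print n + 0 }'
```

Run it inside the mongodb container; a steadily growing number confirms the probe processes are never being reaped.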

The two screenshots are taken from the monitoring of a small testing system. The bumps in CPU and memory happened when mongosh was used instead of mongo (May 12th). The system has basically not much to do and idles most of the time.

CPU usage (screenshot)

Memory usage (screenshot)

After I manually edited all the probes to use mongo instead of mongosh again, all values went back to normal (May 17th).

Additional information

I've also tried to manually execute the mongosh command line used for the liveness check from inside a running mongo container: the command succeeds, but the connection freezes.

Not sure if this is the same issue, but there is a thread about frozen mongosh connections when using "--eval": https://www.mongodb.com/community/forums/t/mongosh-eval-freezes-the-shell/121406/4

jmConan commented 2 years ago

Are you sure you're running 12.1.5?

Starting a replica set with your given config (ignoring the externalAccess part and setting real caKey / caCert values) yields an init:CrashLoopBackOff error with these event logs:

0s          Normal    Created                  pod/pp-mongodb-prod-0                 Created container generate-tls-certs
0s          Warning   Failed                   pod/pp-mongodb-prod-0                 Error: failed to create containerd task: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "/bitnami/scripts/generate-certs.sh": stat /bitnami/scripts/generate-certs.sh: no such file or directory: unknown

Which version of Kubernetes is your cluster running? I tested on v1.22.5.

The above error seems to be rooted in the tls.enabled: true & tls.autoGenerated: false combination, as bitnami/mongodb/templates/replicaset/statefulset.yaml shows:

L120:

        {{- if .Values.tls.enabled }}

L140:

            - /bitnami/scripts/generate-certs.sh

generate-certs.sh is never created, because autoGenerated is false; therefore the chart breaks.

See bitnami/mongodb/templates/common-scripts-cm.yaml, L43:

{{- if and .Values.tls.enabled .Values.tls.autoGenerated }}
  generate-certs.sh: |
[...]

On the other hand, I can second that the mongosh probes are thoroughly broken. I've had problems with them ever since the change.

rmuehlbauer commented 2 years ago

yes, I'm running 12.1.5...

But I've also copied the wrong version of my values.yaml - my fault, sorry for that! Indeed I have tls.autoGenerated: true ... I've already corrected it above.

The Kubernetes version on this cluster is 1.21.11 at the moment, but I don't think it's an issue with Kubernetes.

If you want to try it, you can also just execute the mongosh command manually from inside the container. When I do so, the command does something and gives me some output, but it never really finishes, and I'm unable to quit or exit, no matter what I try. Here, for example, is startup-probe.sh:

startup-probe.sh: |
    #!/bin/bash
    TLS_OPTIONS='--tls --tlsCertificateKeyFile=/certs/mongodb.pem --tlsCAFile=/certs/mongodb-ca-cert'
    mongosh $TLS_OPTIONS --port $MONGODB_PORT_NUMBER --eval 'db.hello().isWritablePrimary || db.hello().secondary' | grep -q 'true'
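As an aside, a hung probe client like this can at least be bounded with coreutils `timeout`, assuming that binary exists in the image. This is only a sketch of a hypothetical variant, not part of the chart:

```shell
#!/bin/bash
# Hypothetical startup-probe.sh variant: hard-kill mongosh if it hangs.
# `timeout -k 1 9` sends SIGTERM after 9s and SIGKILL 1s later, so the
# probe's timeoutSeconds should be set to 10 or more.
TLS_OPTIONS='--tls --tlsCertificateKeyFile=/certs/mongodb.pem --tlsCAFile=/certs/mongodb-ca-cert'
timeout -k 1 9 mongosh $TLS_OPTIONS --port "$MONGODB_PORT_NUMBER" \
    --eval 'db.hello().isWritablePrimary || db.hello().secondary' | grep -q 'true'
```

This does not fix the freeze itself, but it prevents hung mongosh clients from piling up between probe runs.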

I'm just replacing the variables with values, removing the grep part at the end, and running the command from inside a mongo container:

I have no name!@mongodb-2:/$ mongosh --tls --tlsCertificateKeyFile=/certs/mongodb.pem --tlsCAFile=/certs/mongodb-ca-cert --port 27017 --eval 'db.hello().isWritablePrimary || db.hello().secondary'
Current Mongosh Log ID: 6284a39d096b04f870974249
Connecting to:      mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&tls=true&tlsCertificateKeyFile=%2Fcerts%2Fmongodb.pem&tlsCAFile=%2Fcerts%2Fmongodb-ca-cert&appName=mongosh+1.4.1
Using MongoDB:      5.0.8
Using Mongosh:      1.4.1

For mongosh info see: https://docs.mongodb.com/mongodb-shell/

To help improve our products, anonymous usage data is collected and sent to MongoDB periodically (https://www.mongodb.com/legal/privacy-policy).
You can opt-out by running the disableTelemetry() command.

true

As you can see, mongosh can connect to the database and even gives me back a "true", but then the connection freezes. No matter what I do, I can never get back a prompt or reach a point where I can close this open connection.

javsalgar commented 2 years ago

Hi,

It seems that the issue lies in mongosh itself. The first thing that I would do is to bump this thread https://www.mongodb.com/community/forums/t/mongosh-eval-freezes-the-shell/121406/6 so fixing this gets prioritized.

As a workaround, could you check if using an input file instead of eval mitigates the issue?

rmuehlbauer commented 2 years ago

Yes, sure, I will try out the input file approach. I will let you guys know about the outcome.

rmuehlbauer commented 2 years ago

OK, I've tried to eliminate --eval by putting the command into a file and loading it with --file. Please let me know if I'm missing something here:

  1. edited the common-scripts configmap to also provide a new ping-mongo.js file:

    ping-mongo.js: |
    db.adminCommand('ping')
  2. after restarting the containers, the new file shows up under /bitnami/scripts/

    I have no name!@mongodb-2:/bitnami/scripts$ ls -al
    total 12
    drwxrwsrwx 3 root 1001 4096 May 18 11:23 .
    drwxr-xr-x 1 root root 4096 May 18 11:23 ..
    drwxr-sr-x 2 root 1001 4096 May 18 11:23 ..2022_05_18_11_23_04.424415586
    lrwxrwxrwx 1 root 1001   31 May 18 11:23 ..data -> ..2022_05_18_11_23_04.424415586
    lrwxrwxrwx 1 root 1001   24 May 18 11:23 generate-certs.sh -> ..data/generate-certs.sh
    lrwxrwxrwx 1 root 1001   20 May 18 11:23 ping-mongo.js -> ..data/ping-mongo.js
    lrwxrwxrwx 1 root 1001   22 May 18 11:23 ping-mongodb.sh -> ..data/ping-mongodb.sh
    lrwxrwxrwx 1 root 1001   25 May 18 11:23 readiness-probe.sh -> ..data/readiness-probe.sh
    lrwxrwxrwx 1 root 1001   23 May 18 11:23 startup-probe.sh -> ..data/startup-probe.sh
    I have no name!@mongodb-2:/bitnami/scripts$ more ping-mongo.js
    db.adminCommand('ping')
  3. using mongosh to execute the ping-mongo.js file

    
    I have no name!@mongodb-2:/bitnami/scripts$ mongosh  --tls --tlsCertificateKeyFile=/certs/mongodb.pem --tlsCAFile=/certs/mongodb-ca-cert --port 27017 --file /bitnami/scripts/ping-mongo.js
    Current Mongosh Log ID: 6284d8ee8377e27781e7357e
    Connecting to:      mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&tls=true&tlsCertificateKeyFile=%2Fcerts%2Fmongodb.pem&tlsCAFile=%2Fcerts%2Fmongodb-ca-cert&appName=mongosh+1.4.1
    Using MongoDB:      5.0.8
    Using Mongosh:      1.4.1

For mongosh info see: https://docs.mongodb.com/mongodb-shell/

Loading file: /bitnami/scripts/ping-mongo.js

ends up again with a frozen connection... I can't do anything to exit it.

I also tried to first establish a mongosh connection and load the script afterwards. This time I at least get a "true" from the loaded script, but I still cannot disconnect or close the mongosh connection. This time, at least, CTRL-C worked:

I have no name!@mongodb-2:/bitnami/scripts$ mongosh --tls --tlsCertificateKeyFile=/certs/mongodb.pem --tlsCAFile=/certs/mongodb-ca-cert --port 27017
Current Mongosh Log ID: 6284d92dddf519a3f0aa50e7
Connecting to:      mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&tls=true&tlsCertificateKeyFile=%2Fcerts%2Fmongodb.pem&tlsCAFile=%2Fcerts%2Fmongodb-ca-cert&appName=mongosh+1.4.1
Using MongoDB:      5.0.8
Using Mongosh:      1.4.1

For mongosh info see: https://docs.mongodb.com/mongodb-shell/

dev [direct: secondary] test> load("ping-mongo.js")
true
dev [direct: secondary] test> quit()

Error: Asynchronous execution was interrupted by SIGINT



At the moment, substituting `--eval` with `--file` does not look like a solution to me...

rmuehlbauer commented 2 years ago

what do you guys think, it is an option to go back to mongo for now?

marcosbc commented 2 years ago

@rmuehlbauer Unfortunately, going back to mongo is not an option, since it was deprecated. Note that mongosh is the CLI tool supported by MongoDB.

That said, this issue seems to be the same one as #10316. Let me share the latest update:

We are trying to adapt our containers and charts to reduce the impact of this new client but any issue about the performance of mongosh should be reported to MongoDB

I will also link our internal task to this issue, so that any update is shared. In the meantime, if you happen to find any way to improve the existing containers or charts, feel free to send a PR! We'd gladly review any incoming contributions.

rmuehlbauer commented 2 years ago

seems to be the same issue #10316

rafariossaa commented 2 years ago

Hi, Yes, it seems to be the same.

upcFrost commented 2 years ago

The memory consumption of mongosh seems to be higher than the consumption of the bare (no data) mongo itself, which is pretty painful for test environments. Rolled back to 11.1.10.

github-actions[bot] commented 2 years ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

rmuehlbauer commented 2 years ago

....keep this open...

Mistral-valaise commented 2 years ago

Hello everyone, I have the same problem when deploying bitnami/mongodb release 12.1.20 on OpenShift: the readiness probe fails. Is there any news regarding this bug?

marcosbc commented 2 years ago

@Mistral-valaise Unfortunately not much. We are tracking the progress of the ticket in MongoDB's bug tracker, MONGOSH-1240, but there hasn't been much progress.

In the meantime, we are evaluating different options, such as disabling telemetry and using a .mongosh.js file, but unfortunately it does not seem to have much of an impact.

aneagoe commented 2 years ago

I've also noticed the same (running chart 12.1.20 with tag 5.0.9-debian-11-r1). Also, note the performance discrepancy:

1000620000@mongodb-65d69b45bb-6qh25:/$ time mongo --port 27017 --eval 'db.isMaster().ismaster || db.isMaster().secondary'
MongoDB shell version v5.0.9
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("5319d3f8-8135-429a-9a85-fbafcce7ccde") }
MongoDB server version: 5.0.9
true

real    0m0.069s
user    0m0.045s
sys 0m0.011s
1000620000@mongodb-65d69b45bb-6qh25:/$ time mongosh --port 27017 --eval 'db.isMaster().ismaster || db.isMaster().secondary'
Current Mongosh Log ID: 62c59da150ec163d83345663
Connecting to:      mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+1.5.0
Using MongoDB:      5.0.9
Using Mongosh:      1.5.0

For mongosh info see: https://docs.mongodb.com/mongodb-shell/

true

real    0m1.842s
user    0m1.279s
sys 0m0.134s

Honestly, it seems unusable, as we have constant crashes (the readiness probe fails and traffic stops being routed to the pod). I suppose I could try to define a custom probe (an example would be great so I don't have to poke in the dark) or try to fully disable the readiness probe altogether. @upcFrost how did it go with the downgrade? Any issues?
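To reproduce the single-shot timing comparison above over several runs, a rough sampling loop can be used. This is only a sketch; it assumes mongosh is on PATH, mongod listens on 27017, and GNU date (with `%N` nanosecond support) is available:

```shell
#!/bin/sh
# Sketch: sample the wall-clock latency of a probe-style mongosh call
# five times and print each duration in milliseconds.
for i in 1 2 3 4 5; do
    start=$(date +%s%N)
    mongosh --port 27017 --quiet --eval 'db.adminCommand("ping")' >/dev/null 2>&1
    end=$(date +%s%N)
    echo "run $i: $(( (end - start) / 1000000 )) ms"
done
```

Comparing these numbers against the probe's timeoutSeconds shows how close each invocation gets to being killed by the kubelet.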

frjaraur commented 2 years ago

Hi all, I also have this problem on the 4.4.13 MongoDB release (mongodb-11.2.0) with mongosh 1.3.1. Any tip on this issue? I have tried increasing the delay time and timeout on the probes, but none of these changes seems to solve the issue. I even tried enabling MONGODB_ENABLE_NUMACTL, with no luck. I can confirm that the 4.4.12 MongoDB release (mongodb-11.0.0 chart) seems to be working fine :|

Mistral-valaise commented 2 years ago

I can also confirm that mongodb-11.0.0 chart seems to be working fine, thanks !

javsalgar commented 2 years ago

Thanks for letting us know!!

aneagoe commented 2 years ago

I solved it with this in the HR:

    livenessProbe:
      enabled: false
    customLivenessProbe:
      exec:
        command:
        - sh
        - -c
        - mongo --port $MONGODB_PORT_NUMBER --eval "db.adminCommand('ping')"
    readinessProbe:
      enabled: false
    customReadinessProbe:
      exec:
        command:
        - sh
        - -c
        - mongo --port $MONGODB_PORT_NUMBER --eval 'db.isMaster().ismaster || db.isMaster().secondary' | grep -q 'true'

However, this will keep coming back until the problem is addressed in mongosh. But there doesn't seem to be any traction there... the forum thread is almost dead.

busyboy77 commented 2 years ago

Same here -- deployed the mongo chart (mongodb-12.1.22) with Mongo 5.0.9. It always gets stuck in some probes; however, the same probe works perfectly fine when executed inside the container.

admirito commented 2 years ago

In the meantime, we are evaluating different options such as disabling telemetry and the usage of .mongosh.js file, but it does not seem to have much of an impact unfortunately. @marcosbc

I had the same issue, and it seems that the telemetry issue even stalls the initdbScripts, so the js/sh scripts cannot fix it. But I managed to fix the problem by providing a custom startup probe:

# startupProbe.enabled must be false (that is the default)
customStartupProbe:
  initialDelaySeconds: 5
  periodSeconds: 20
  timeoutSeconds: 10
  successThreshold: 1
  failureThreshold: 30
  exec:
    command:
      - sh
      - -c
      - |
        mongosh --eval 'disableTelemetry()'
        /bitnami/scripts/startup-probe.sh

Disabling the telemetry seems to fix the problem. I think mongosh is trying to connect to the internet for telemetry, but if your pod is not connected to the internet it will wait a long time for it to time out.

It would be nice if the bitnami chart had a proper way to disable/enable mongosh telemetry.
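For reference, mongosh also supports a global configuration file, which could avoid the extra `disableTelemetry()` call entirely. The following is only a sketch, assuming you can mount such a file into the container; the path and key names are taken from the mongosh global-configuration documentation and should be verified against your mongosh version:

```yaml
# /etc/mongosh.conf -- global mongosh settings (verify path and keys
# for your mongosh version before relying on this)
mongosh:
  enableTelemetry: false
```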

BehbudSh commented 2 years ago

Hi everyone, I'm facing the same issue and fixed it on my end by increasing some resources as well as the liveness and readiness probe timeouts, because mongosh is a heavy process. I also made some comparisons between the mongo and mongosh commands.

resources:
  limits:
    cpu: "300m"
    memory: "2048Mi"
  requests:
    cpu: "40m"
    memory: "258Mi"
livenessProbe:
  enabled: true
  initialDelaySeconds: 30
  periodSeconds: 20
  timeoutSeconds: 20
  failureThreshold: 6
  successThreshold: 1
readinessProbe:
  enabled: true
  initialDelaySeconds: 30
  periodSeconds: 20
  timeoutSeconds: 20
  failureThreshold: 6
  successThreshold: 1

macrozone commented 2 years ago

is there an alternative to disabling the liveness and readiness probes? Is there really no other way to ping the db?

I am also unsure whether I should use this chart in production given this problem. Is it still safe, or should I use an alternative?

trust56 commented 2 years ago

We have been using this chart in a production environment with the workaround described above (overriding the probes to execute mongo instead of mongosh) for 3 weeks. It's all right so far. Of course we are waiting for this issue to be solved, since the mongo executable is deprecated.

macrozone commented 2 years ago

We are using this chart on a production environment with the workaround told above (overriding the probes to execute mongo instead of mongosh) since 3 weeks. It's all right so far. Of course we are waiting for this issue to be solved, since the mongo executable is deprecated.

So using mongo works? It's just deprecated?

Wouldn't it make sense for Bitnami to change the probes to use mongo instead of mongosh until it's actually no longer available?

trust56 commented 2 years ago

so using mongo works?

Yeah, seems like, well at least using it for the probes.

its just deprecated?

Yes, and as such, I guess, there is no support for it and no guarantee on how it works. Really, it's just a workaround for the probes to keep using it.

wouldn't it make sense that bitnami changes the probes to use mongo instead of mongosh as well until its actually no longer available?

I don't know about that. I'm not even sure what the root cause of this issue is. I also do not know what difference between mongo and mongosh causes such a behavioral difference. Bitnami can say.

macrozone commented 2 years ago

so using mongo works?

Yeah, seems like, well at least using it for the probes.

its just deprecated?

Yes, and as such, I guess, there is no support for that, no guarantee on how it works. Really it's just a workaround for the probes to use it still.

wouldn't it make sense that bitnami changes the probes to use mongo instead of mongosh as well until its actually no longer available?

I don't know about that. I'm not even sure what is the root cause for this issue. I also do not know, what is the difference between mongo and mongosh that causes such a behavioral difference. Bitnami can say.

thank you for your clarifications!

mongo was deprecated in MongoDB 5.0. I am unsure whether it got removed in the recent 6.0 release; the release notes do not mention it.

EDIT: the workaround works with MongoDB 5.0.9.

(Screenshot, 2022-08-15)

EDIT 2: as expected, it does not work with the latest MongoDB version 6.

alex-samuilov commented 2 years ago

customStartupProbe:
  initialDelaySeconds: 5
  periodSeconds: 20
  timeoutSeconds: 10
  successThreshold: 1
  failureThreshold: 30
  exec:
    command:
      - sh
      - -c
      - |
        mongosh --eval 'disableTelemetry()'
        /bitnami/scripts/startup-probe.sh

I tried to use this hint, but without success. I use version 13.0.0 of the chart, and after starting it says: Startup probe errored: rpc error: code = Unknown desc = deadline exceeded ("DeadlineExceeded"): context deadline exceeded

bslpzk commented 2 years ago

Hi guys.

After migrating my clusters to the latest version of the charts and moving to MongoDB 6, I also faced the problem of a huge increase in resource consumption, especially RAM. I also had a metrics-exporter container that constantly crashed, which caused the cluster events to be constantly filled with messages saying that the Mongo pod was in an Unhealthy state. Because of this, I had to look for a solution to get rid of these problems without rolling back to Mongo 5.

Here's what I came up with in the end: the option of checking container metrics with the outdated mongo shell was no longer possible because it was removed from the Bitnami Docker image for charts older than 12.1.31. But if you run the mongo binary in a container with Mongo 6.0.2, you can still get basic metrics, albeit with a deprecation warning. Here is an example of running the old shell:

I have no name!@shared-mongo-mongodb-0:/$ mongo
MongoDB shell version v5.0.10
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("a348611-5017-47ff-8ba7-bfd5400c0bc1") }
MongoDB server version: 6.0.2
WARNING: shell and server versions do not match
================
Warning: the "mongo" shell has been superseded by "mongosh",
which delivers improved usability and compatibility.The "mongo" shell has been deprecated and will be removed in
an upcoming release.
For installation instructions, see
https://docs.mongodb.com/mongodb-shell/install/
================

So the base Docker image for mongodb has been rebuilt and placed in my own registry. Here is my Dockerfile:

FROM bitnami/mongodb:5.0.10-debian-11-r3 as builder
FROM bitnami/mongodb:6.0.2-debian-11-r0 as export
COPY --from=builder --chown=0:0 /opt/bitnami/mongodb/bin/mongo /opt/bitnami/mongodb/bin/mongo

Now it is possible again to use the custom probes that were suggested by aneagoe. The only thing you will need is to specify the custom image in values.yaml, along with the credentials to connect to the registry (in my case, AWS ECR).

livenessProbe:
  enabled: false
customLivenessProbe:
  exec:
    command:
      - sh
      - '-c'
      - mongo --port $MONGODB_PORT_NUMBER --eval "db.adminCommand('ping')"
  failureThreshold: 6
  initialDelaySeconds: 30
  periodSeconds: 20
  successThreshold: 1
  timeoutSeconds: 10

readinessProbe:
  enabled: false
customReadinessProbe:
  exec:
    command:
      - sh
      - '-c'
      - mongo --port $MONGODB_PORT_NUMBER --eval 'db.isMaster().ismaster || db.isMaster().secondary' | grep -q 'true'
  failureThreshold: 6
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5

image:
  debug: true
  digest: ''
  pullPolicy: IfNotPresent
  pullSecrets:
    - <my-custom-aws-registry-secret>
  registry: <my-custom-aws-registry>.amazonaws.com
  repository: custom-mongodb
  tag: mongodb-6

Next, I needed to solve the mongo-exporter crashes and reduce the resources it consumes. The problem turned out to be that Percona, which develops this exporter, released a new exporter completely rewritten from scratch in the fall of 2020, and it consumes a lot more resources. I only require basic MongoDB metrics; it's enough for me if the monitoring system displays two statuses, healthy or unhealthy. So let's do the same as we did with the mongo shell and roll back to the last available version of the old mongo-exporter. That was the release of Sep 25, 2020, v0.11.2. It's enough to specify the right Docker image in values.yaml and fix the exporter launch command in the container:

metrics:
  args:
    - |
      /bin/mongodb_exporter --web.listen-address ":{{ .Values.metrics.containerPort }}" --mongodb.uri "{{ include "mongodb.mongodb_exporter.uri" . }}" {{ .Values.metrics.extraFlags }}
  command:
    - /bin/bash
    - '-ec'
  enabled: true
  image:
    pullPolicy: IfNotPresent
    registry: docker.io
    repository: bitnami/mongodb-exporter
    tag: 0.11.2-debian-10-r108
  serviceMonitor:
    enabled: true
    interval: 60s

I understand that using deprecated code is a very questionable approach, but I have no other choice right now. The result of what I have achieved can be seen in the CPU/RAM consumption graphs for the pod with mongodb + metrics-exporter.

(CPU/RAM consumption screenshots)

Please share your comments if you have a different solution to this problem.

fmulero commented 2 years ago

Great post @bslpzk, you will help a lot of users.

The mongo shell was deprecated in MongoDB 5.0 and definitively removed in MongoDB 6. We are tracking the progress of the ticket in MongoDB's bug tracker, MONGOSH-1240, but there hasn't been much progress.

xtianus79 commented 2 years ago

This is effectively broken in Kubernetes and unusable. What is the suggestion for now, as this seems to have zero traction with Mongo themselves? Is going back to 5 recommended? I think this may just be the right solution for now.

fmulero commented 2 years ago

Please vote/follow or add comments to MONGOSH-1240

In the meantime, relaxing the readiness and liveness probes could help you, but it isn't a real solution:

```yaml
customLivenessProbe:
  tcpSocket:
    port: mongodb
  initialDelaySeconds: 5
  periodSeconds: 20
  timeoutSeconds: 10
  successThreshold: 1
  failureThreshold: 30
customReadinessProbe:
  tcpSocket:
    port: mongodb
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 6
  successThreshold: 1
startupProbe:
  enabled: true
```
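For reference, a `tcpSocket` probe like the one above only verifies that the port accepts a TCP connection; it issues no MongoDB command at all. A minimal sketch of the equivalent check in Python (host and port values are placeholders):

```python
import socket

def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """Mimic a kubelet tcpSocket probe: succeed iff the port accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

This is why the relaxed probes cost almost nothing: no mongosh process is spawned, so CPU and memory stay flat, but the check also cannot detect a mongod that accepts connections while being otherwise unhealthy.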
holzleitner commented 1 year ago

Hello everyone!

If you are still looking for a solution: We have developed a small healthcheck application that we use in our Kubernetes cluster.

https://github.com/instant-solutions/mongo-healthcheck

And the best thing about it: it is even faster than the old mongo command.

olivierboudet commented 1 year ago

Hello @javsalgar, new issues (see #17346) are being closed as duplicates of this one, which is itself closed :) Would it be necessary to reopen it? The issue still exists, and @holzleitner's proposal to include a simple binary instead of mongosh for the health check might be a good one.

rafariossaa commented 1 year ago

Hi, I think this performance issue with mongosh should be reported upstream; maybe they can improve it. Also, as with this kind of performance issue, it is very dependent on your cluster, so IMHO what can be done here is to tune the probe parameters for your current deployment. The thing is that the mongo CLI is going to be removed sooner or later, as you can see from the message that mongo prints:

```
MongoDB shell version v5.0.20
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("9caa5cef-a0d7-422f-b78b-88c85aa30c9d") }
MongoDB server version: 5.0.20
================
Warning: the "mongo" shell has been superseded by "mongosh",
which delivers improved usability and compatibility.
The "mongo" shell has been deprecated and will be removed in
an upcoming release.
For installation instructions, see
https://docs.mongodb.com/mongodb-shell/install/
================
```
olivierboudet commented 1 year ago

Hello, just to be sure I fully understand: you are advising to relax the readiness and liveness probe intervals to avoid consuming too much CPU? Why not include a simple (and efficient) binary that does the health check, and only that, rather than a full REPL environment like mongosh?

rafariossaa commented 1 year ago

Hi, yes, that could be an approach. You could create a new image based on Bitnami's and add the binary that suits your use case best. If you have a suggestion for one, we can study whether to include it. We stick to what upstream produces and package it so it can be used easily.

github-actions[bot] commented 1 year ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] commented 1 year ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

nightguide commented 1 year ago

Hello everyone. Will there be any official solution to this problem from Bitnami? It looks like the current versions of the Helm charts with MongoDB 5.x, 6.x and 7.x (from Bitnami) are unsuitable for running in Kubernetes clusters.

To work around the problem, we had to build our own custom Docker image with MongoDB (one that still contains the mongo utility), fork the current Helm chart into our private repository, and override the parameters pointing at the private repository and customized Docker images, plus set up custom probes. All of this requires additional effort to support and maintain customized Helm charts and images; we would like to use the official versions of the Helm charts and Docker images from Bitnami upstream.

rafamiga commented 1 year ago

I also had a problem with zombie processes left by readiness probes on my k3s cluster, but since I switched to a liveness probe using this project:

https://github.com/syndikat7/mongodb-rust-ping

it all went away. And it is such a simple project that it could easily be extended to do both liveness and readiness probes.
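For the curious, a minimal ping client like that just sends a single `OP_MSG` carrying `{ping: 1}` to the `admin` database and waits for the reply. A rough sketch of building that wire-protocol message in Python, with hand-rolled BSON and no driver (illustration only; a real probe would also need TLS and reply parsing):

```python
import struct

OP_MSG = 2013  # opcode of the modern MongoDB wire-protocol message

def bson_int32(name: str, value: int) -> bytes:
    # element: type 0x10, cstring name, little-endian int32 value
    return b"\x10" + name.encode() + b"\x00" + struct.pack("<i", value)

def bson_string(name: str, value: str) -> bytes:
    # element: type 0x02, cstring name, int32 length, UTF-8 string + NUL
    data = value.encode() + b"\x00"
    return b"\x02" + name.encode() + b"\x00" + struct.pack("<i", len(data)) + data

def bson_doc(*elements: bytes) -> bytes:
    # document: int32 total length, elements, NUL terminator
    body = b"".join(elements)
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

def op_msg(doc: bytes, request_id: int = 1) -> bytes:
    # flagBits (0) + one kind-0 section containing the command document
    body = struct.pack("<i", 0) + b"\x00" + doc
    # header: messageLength, requestID, responseTo, opCode
    return struct.pack("<iiii", 16 + len(body), request_id, 0, OP_MSG) + body

# {"ping": 1, "$db": "admin"} — the minimal health-check command
ping = op_msg(bson_doc(bson_int32("ping", 1), bson_string("$db", "admin")))
```

A compiled binary that sends these few dozen bytes and checks the reply for `ok: 1` is why such a tool is so much lighter than spawning a full mongosh REPL for every probe.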

The "stock" Bitnami Helm chart readiness probes still fail sometimes, but it's much less noticeable with relaxed constraints:

```yaml
readinessProbe:
  periodSeconds: 31
  timeoutSeconds: 20
```
Rafouf69 commented 10 months ago

Still no news on this subject?

javsalgar commented 10 months ago

AFAIK, the upstream ticket has not had many updates: https://jira.mongodb.org/browse/MONGOSH-1240