Closed sethjones closed 8 months ago
Likely related: #13364
I did a bit more troubleshooting tonight.
I attempted to install the chart in another of my clusters, which differs in configuration (but both use rook/ceph).
The same failure occurred.
However, I ran a test in the initial cluster, with persistence off. Everything started up as it should. All pods came up, joined the cluster, and were operating successfully without intervention.
More troubleshooting: I created a new storage class to provide an XFS filesystem, per MongoDB's recommendation to use XFS with WiredTiger.
Results are the same.
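For reference, turning persistence off for a test install can be expressed as a values override along these lines; the parameter paths (configsvr.persistence.enabled, shardsvr.persistence.enabled) are assumptions and should be checked against the values.yaml of the chart version in use:

```yaml
# Hypothetical values override to rule out storage as the culprit.
# Parameter paths are assumptions; verify against your chart's values.yaml.
configsvr:
  persistence:
    enabled: false
shardsvr:
  persistence:
    enabled: false
```

With persistence disabled the data lives on ephemeral storage, so this is only suitable for troubleshooting, not production.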
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
keep open
Hi @sethjones, apologies it took this long to reply; it seems the issue didn't get flagged on our board. I've had some problems reproducing the error, but there was an internal task related to https://github.com/bitnami/charts/issues/13364 (which had a fix proposed at https://github.com/bitnami/containers/pull/24938) that stalled because the users in both the PR and the issue went MIA. I'll reopen the task and prioritise it so this can get a proper solution.
I'll put this ticket on hold so you can get notified about any progress on our side.
Was this resolved?
I am able to recreate the issue in release v6.6.6.
Hi @sethjones, the team hasn't been able to work on this yet. I have increased the priority of our internal task and moved it from the backlog so it can be selected for development.
If you're interested in contributing a solution to expedite this, we welcome you to create a pull request. The Bitnami team would be happy to review your submission and offer feedback. You can find the contributing guidelines here.
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.
Hi, I still have the issue. Why is this happening?
To bypass the "MongoServerError: Authentication failed." error, I tried setting the auth.enable value to false, but that only produced a brand-new error. Why is a passwordless sharded cluster not supported if the corresponding flag exists?
07:20:25.40 INFO ==> Setting node as primary
mongodb 07:20:25.42
mongodb 07:20:25.42 Welcome to the Bitnami mongodb-sharded container
mongodb 07:20:25.42 Subscribe to project updates by watching https://github.com/bitnami/containers
mongodb 07:20:25.42 Submit issues and feature requests at https://github.com/bitnami/containers/issues
mongodb 07:20:25.42
mongodb 07:20:25.42 INFO ==> ** Starting MongoDB Sharded setup **
mongodb 07:20:25.43 INFO ==> Validating settings in MONGODB_* env vars...
mongodb 07:20:25.43 ERROR ==> The MONGODB_ROOT_PASSWORD environment variable is empty or not set. Set the environment variable ALLOW_EMPTY_PASSWORD=yes to allow the container to be started with blank passwords. This is only recommended for development.
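For what it's worth, if a passwordless setup is genuinely wanted for development, the error message above suggests the container honours ALLOW_EMPTY_PASSWORD=yes. A hedged sketch of passing it through the chart via extra environment variables follows; the exact extraEnvVars locations are assumptions and may differ per chart version:

```yaml
# Hypothetical values override (development only!). Whether each component
# exposes extraEnvVars at these paths must be verified against the chart.
mongos:
  extraEnvVars:
    - name: ALLOW_EMPTY_PASSWORD
      value: "yes"
configsvr:
  extraEnvVars:
    - name: ALLOW_EMPTY_PASSWORD
      value: "yes"
shardsvr:
  dataNode:
    extraEnvVars:
      - name: ALLOW_EMPTY_PASSWORD
        value: "yes"
```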
Hello @MrWormsy, we are aware the issue is still present. It seems our automations closed this issue by mistake, but the associated internal task is still on our backlog. Thanks for providing more info, we'll leave this issue opened and notify any advances on our side.
Hi @sethjones @MrWormsy
I was unable to reproduce the issue on a GKE cluster, but based on the linked issues it seems this error only occurs when persistence is enabled and the PV StorageClass uses a "slow" filesystem, which isn't my case.
As @rafariossaa mentioned in one of the linked issues, there's an environment variable (MONGODB_MAX_TIMEOUT) which can be customized by setting the common.mongodbMaxWaitTimeout parameter (120 seconds by default). This setting can be used in combination with configsvr.readinessProbe.initialDelaySeconds and shardsvr.dataNode.readinessProbe.initialDelaySeconds to give the initialization logic more time to start Mongo in the background, create users, configure the replica set, etc. You could even disable these probes during the first installation to double-check that the issue is related to a slow filesystem.
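Putting those parameters together, a values override could look like the following sketch; the parameter names come from the comment above, while the concrete numbers are arbitrary examples:

```yaml
# Example values: give slow storage more time before probes kick in.
common:
  mongodbMaxWaitTimeout: 600        # maps to MONGODB_MAX_TIMEOUT (default 120)
configsvr:
  readinessProbe:
    initialDelaySeconds: 300
shardsvr:
  dataNode:
    readinessProbe:
      initialDelaySeconds: 300
```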
Hi
I've just tried setting the parameters to a value of 3600s, but I get the same error. I think the error is indeed due to my server using HDDs instead of SSDs; I have no problem running the chart on my personal computer.
Thank you for your help.
Hi all, I believe I have found the root cause of this "Authentication failed" issue. It happens mostly on slower systems.
Reason or Bug:
In the Bitnami script file libmongodb.sh, in mongodb_is_primary_node_up(), the following lines of code check whether the MongoDB instance has turned from secondary to primary:
result=$(
mongodb_execute_print_output "$user" "$password" "admin" "$host" "$port" <<EOF
db.isMaster().ismaster
EOF
)
grep -q "true" <<<"$result"
The problem is the line grep -q "true" <<<"$result". As part of the $result output, MongoDB prints the following connection string, which itself contains the string "true":
"Connecting to: mongodb://127.0.0.1:27017/admin?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.1.1"
Hence even when the node is secondary, i.e. the output of db.isMaster().ismaster is false, the grep -q "true" <<<"$result" check passes and the script proceeds to create the root user. Creating the root user while the node is still secondary fails. On faster systems the node becomes primary quickly, so this bug in the code doesn't matter.
Changing the grep check to a more specific one, like the following, verifies that the MongoDB instance has actually turned primary; the root user then gets created successfully and authentication passes.
grep -q "\[direct: primary\] admin> true" <<<"$result"
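The false positive is easy to demonstrate outside MongoDB. The following bash snippet reproduces the check with a simulated mongosh output from a secondary node (the connection-string line is taken from the log above):

```shell
#!/usr/bin/env bash
# Simulated mongosh output from a SECONDARY node: the query result is
# "false", but the connection string still contains the substring "true".
result='Connecting to: mongodb://127.0.0.1:27017/admin?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.1.1
[direct: secondary] admin> false'

# Original check: matches "true" inside the URL, a false positive.
if grep -q "true" <<<"$result"; then
  echo "original check: node treated as primary (wrong)"
fi

# Stricter check: only matches the actual query result of a primary.
if ! grep -q "\[direct: primary\] admin> true" <<<"$result"; then
  echo "stricter check: node correctly treated as secondary"
fi
```

Note that the stricter pattern couples the check to the mongosh prompt format ([direct: primary] admin>), so it may need adjusting across mongosh versions.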
Any Bitnami maintainers, please help validate this fix and commit the update. Also, if there are other issue threads similar to this one, please link them to this information.
Thank you for bringing this issue to our attention. We appreciate your involvement! If you're interested in contributing a solution, we welcome you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here.
Your contribution will greatly benefit the community. Feel free to reach out if you have any questions or need assistance.
Hi @carrodher, Thanks for your appreciation.
The bug is not in https://github.com/bitnami/charts, but in https://github.com/bitnami/containers. I have created a pull request (PR) in bitnami/containers. PR : https://github.com/bitnami/containers/pull/55910
Please help to take this further.
As of today, I have test-deployed chart v7.6.0 with image 7.0.5-debian-12-r2 and the issue is resolved. Thanks @esasidharan
Name and Version
bitnami/mongodb-sharded 6.5.3
What architecture are you using?
amd64
What steps will reproduce the bug?
Are you using any custom parameters or values?
Set: auth.rootPassword, auth.replicaSetKey (I have also attempted without them). I have attempted to work around the issue by extending all available timeouts, without success.
I have also attempted enforcing minimum resources of cpu: 8 and memory: 8Gi to ensure the pods were not being resource starved. No change.
What is the expected behavior?
Sharded deployment is created, and starts up using provided username and passwords.
What do you see instead?
Upon initial creation, configsvr enters this state:
Mongos:
Shards 0&1
If I take the additional step to restart the configsvr after initial startup, it will enter a running state and not get stuck in the "Authentication Failed" loop.
After start up the following errors are present from the mongos/shard pods:
Additionally, if I attempt to use the mongosh client on the configsvr I am unable to use the root username and defined password.
It appears that the root user creation/modification and root password setup are not happening.
Additional information
Kubernetes Cluster Information:
Deploying this chart on my desktop via Kind led to a working deployment.