ErmakovDmitriy closed this issue 1 year ago.
@ErmakovDmitriy
In your MongoDB setup, the default write concern should be set to 1
(see https://www.mongodb.com/docs/manual/reference/mongodb-defaults/#default-write-concern).
Can you please check that this is set accordingly by issuing db.adminCommand({ getDefaultRWConcern: 1 })?
If you need to adjust it, you can use db.adminCommand({ "setDefaultRWConcern": 1, "defaultWriteConcern": { "w": 1 } });
(In a primary-secondary-arbiter replica set, a write concern of "majority" requires acknowledgement from both data-bearing members, because the arbiter cannot acknowledge writes; once one data-bearing member is down, majority writes block.)
I can reproduce the issue when this is not set correctly. In that case, other applications such as MongoDB Compass will also appear to freeze on write operations.
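As an aside (not part of the original comment): the check-and-fix advice above can be sketched as a small helper. The function below is hypothetical; it only inspects a result document of the shape returned by db.adminCommand({ getDefaultRWConcern: 1 }) (as quoted later in this thread) and reports whether the cluster-wide default still needs lowering.

```javascript
// Hypothetical sketch: decide from a getDefaultRWConcern result whether the
// cluster-wide default write concern still needs to be changed to w: 1.
// The result shape matches the mongosh output quoted in this thread.
function needsWriteConcernFix(result) {
  const wc = result.defaultWriteConcern || {};
  // With a primary-secondary-arbiter set, w: "majority" needs both
  // data-bearing members to acknowledge, so w: 1 is the value we want here.
  return wc.w !== 1;
}

// Against the two states shown in this thread:
console.log(needsWriteConcernFix({ defaultWriteConcern: { w: "majority", wtimeout: 0 } })); // true
console.log(needsWriteConcernFix({ defaultWriteConcern: { w: 1, wtimeout: 0 } }));          // false
```

If the helper returns true, the setDefaultRWConcern command above is the documented way to change the default.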
Hi @moesterheld,
Thank you for your explanation.
Yes, I had the write concern set to majority (see below). After I changed it as you recommended, it seems to work.
May I ask whether it is possible to add this information to the official Graylog deployment documentation? There are probably more people with near-to-zero knowledge of how MongoDB works, and this information would help them.
rs0 [direct: primary] test> db.adminCommand({getDefaultRWConcern:1})
{
  defaultReadConcern: { level: 'local' },
  defaultWriteConcern: { w: 'majority', wtimeout: 0 },
  updateOpTime: Timestamp({ t: 1694067864, i: 1 }),
  updateWallClockTime: ISODate("2023-09-07T06:24:29.549Z"),
  defaultWriteConcernSource: 'global',
  defaultReadConcernSource: 'implicit',
  localUpdateWallClockTime: ISODate("2023-09-08T06:49:24.548Z"),
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1695372883, i: 4 }),
    signature: {
      hash: Binary(Buffer.from("1f9cf21ed677f937cdda1c219a9b3d6e720a25b2", "hex"), 0),
      keyId: Long("7275644989919461383")
    }
  },
  operationTime: Timestamp({ t: 1695372883, i: 4 })
}
rs0 [direct: primary] test>
rs0 [direct: primary] test> db.adminCommand({ "setDefaultRWConcern": 1, "defaultWriteConcern": { "w": 1 } });
{
  defaultReadConcern: { level: 'local' },
  defaultWriteConcern: { w: 1, wtimeout: 0 },
  updateOpTime: Timestamp({ t: 1695372905, i: 2 }),
  updateWallClockTime: ISODate("2023-09-22T08:55:05.503Z"),
  defaultWriteConcernSource: 'global',
  defaultReadConcernSource: 'implicit',
  localUpdateWallClockTime: ISODate("2023-09-22T08:55:05.504Z"),
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1695372905, i: 3 }),
    signature: {
      hash: Binary(Buffer.from("fae4045750fbc04915eade51cf494999ff97a7c5", "hex"), 0),
      keyId: Long("7275644989919461383")
    }
  },
  operationTime: Timestamp({ t: 1695372905, i: 3 })
}
rs0 [direct: primary] test>
rs0 [direct: primary] test> db.adminCommand({getDefaultRWConcern:1})
{
  defaultReadConcern: { level: 'local' },
  defaultWriteConcern: { w: 1, wtimeout: 0 },
  updateOpTime: Timestamp({ t: 1695372905, i: 2 }),
  updateWallClockTime: ISODate("2023-09-22T08:55:05.503Z"),
  defaultWriteConcernSource: 'global',
  defaultReadConcernSource: 'implicit',
  localUpdateWallClockTime: ISODate("2023-09-22T08:55:05.504Z"),
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1695372909, i: 4 }),
    signature: {
      hash: Binary(Buffer.from("f4de4c21ef847ee0d294c1272a3dbe54eccc82f3", "hex"), 0),
      keyId: Long("7275644989919461383")
    }
  },
  operationTime: Timestamp({ t: 1695372909, i: 4 })
}
rs0 [direct: primary] test>
@ErmakovDmitriy Glad to help. I will forward the request to update the documentation to our docs team.
We use Graylog deployed on Kubernetes (Rancher) with the https://artifacthub.io/packages/helm/kong-z/graylog Helm chart.
In order to have the service available, we deployed it as:
Here is the current state:
Here is the MongoDB status under normal operations:
This topology, from my understanding, is supposed to survive the failure of either one Graylog Pod or one of the MongoDB replica Pods.
Expected Behavior
When I delete the primary MongoDB Pod and make it unschedulable so that Kubernetes cannot restart it (to simulate its failure), the MongoDB replica and arbiter complain for a while that the primary is lost, but then the replica is promoted to primary.
I expect Graylog to connect to the new primary and continue operating.
Current Behavior
After the primary MongoDB is killed, the secondary is promoted to be the new primary.
When I connect to the replica with MongoDB Compass or mongosh, I can see that the graylog database is there (MongoDB outputs are provided below in this section).
Graylog also complains for a while that it cannot contact MongoDB; it then discovers that the secondary became primary, but it does not return to operation until the original primary MongoDB is started again. "Does not return to operation" means the Web UI never finishes loading and stays in a "loading" state. This behavior persists for at least an hour (I did not try to wait longer).
MongoDB re-elects the secondary as the primary (status taken from mongo-mongodb-1, since the normal-operations primary mongo-mongodb-0 is dead):
Graylog's logs are attached as a file: graylog.txt
Line 180: Graylog discovers that the primary MongoDB is stopping.
Line 515: Graylog's monitor connects to the new primary and reports that its status is REPLICA_SET_PRIMARY.
Steps to Reproduce (for bugs)
Context
By design, we use Kubernetes nodes with local storage only, so that our logging solution does not depend on external systems (logging should be available for troubleshooting when everything else is broken).
For that reason we rely on application-level high availability: 2 Graylog replicas + Kubernetes load balancing, 2 MongoDB replicas (plus an arbiter), and 3 OpenSearch nodes (each index has 1-2 replicas).
The fact that Graylog fails to "reconnect" to a new primary MongoDB makes this goal somewhat difficult to achieve.
Your Environment
Graylog configuration: graylog.conf.txt
Environment variables:
GRAYLOG_ELASTICSEARCH_HOSTS:
https://admin:password@opensearch-cluster-master-0.opensearch-cluster-master-headless.graylog-test.svc.cluster.local:9200,https://admin:password@opensearch-cluster-master-1.opensearch-cluster-master-headless.graylog-test.svc.cluster.local:9200,https://admin:password@opensearch-cluster-master-2.opensearch-cluster-master-headless.graylog-test.svc.cluster.local:9200
GRAYLOG_MONGODB_URI:
mongodb://graylog:password@mongo-mongodb-0.mongo-mongodb-headless.graylog-test.svc.cluster.local:27017,mongo-mongodb-1.mongo-mongodb-headless.graylog-test.svc.cluster.local:27017/graylog?replicaSet=rs0
MongoDB is configured with replicaSet=rs0.
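As an aside (not from the thread): MongoDB connection strings also accept a "w" query-string option, so the write concern could alternatively be pinned per connection in GRAYLOG_MONGODB_URI instead of (or in addition to) changing the cluster-wide default. The helper below is a hypothetical sketch that appends w=1 to a URI when no w option is present; the host names are placeholders, not the ones from this deployment.

```javascript
// Hypothetical sketch: append a driver-level write concern option (w=1)
// to a MongoDB connection string, unless a "w" option is already present.
function withWriteConcern(uri, w = "1") {
  const [base, query = ""] = uri.split("?");
  const params = query === "" ? [] : query.split("&");
  if (params.some((p) => p.startsWith("w="))) return uri; // already set
  params.push(`w=${w}`);
  return `${base}?${params.join("&")}`;
}

console.log(withWriteConcern("mongodb://host-0:27017,host-1:27017/graylog?replicaSet=rs0"));
// mongodb://host-0:27017,host-1:27017/graylog?replicaSet=rs0&w=1
```

Whether Graylog's bundled driver honours the option for every operation is something to verify against the Graylog documentation; the cluster-wide setDefaultRWConcern change discussed above is the approach confirmed to work in this thread.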