simonli866 opened this issue 2 years ago
First, ensure that the backends (Redis, RabbitMQ, MongoDB) are working. If StackStorm can't reach them, its pods will fail in a crash loop.
Are those backends up and running? Check the logs.
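For example (a minimal sketch, assuming the default namespace and the pod names used later in this thread):
# List all pods and filter for the backend services
kubectl get pods | grep -E 'rabbitmq|redis|mongodb'

# Tail the recent logs of each backend's first pod
kubectl logs stackstorm-rabbitmq-0 --tail=100
kubectl logs stackstorm-mongodb-0 --tail=100
kubectl logs stackstorm-redis-node-0 -c redis --tail=100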
I think RabbitMQ is working. I looked back at the error message and it showed a line about RabbitMQ:
amqp.exceptions.NotFound: Queue.declare: (404) NOT_FOUND - home node 'rabbit@stackstorm-rabbitmq-0.stackstorm-rabbitmq-headless.default.svc.cluster.local' of durable queue 'st2.preinit' in vhost '/' is down or inaccessible
Is the crash loop caused by this problem? How can I fix it?
@armab
I checked the RabbitMQ log and found this:
2022-02-06 08:35:45.319 [info] <0.2547.0> accepting AMQP connection <0.2547.0> (10.1.16.138:44746 -> 10.1.16.181:5672)
2022-02-06 08:35:45.463 [info] <0.2550.0> accepting AMQP connection <0.2550.0> (10.1.16.172:45116 -> 10.1.16.181:5672)
2022-02-06 08:35:45.556 [info] <0.2547.0> connection <0.2547.0> (10.1.16.138:44746 -> 10.1.16.181:5672): user 'admin' authenticated and granted access to vhost '/'
2022-02-06 08:35:45.618 [info] <0.2550.0> connection <0.2550.0> (10.1.16.172:45116 -> 10.1.16.181:5672): user 'admin' authenticated and granted access to vhost '/'
2022-02-06 08:35:46.162 [info] <0.2547.0> closing AMQP connection <0.2547.0> (10.1.16.138:44746 -> 10.1.16.181:5672, vhost: '/', user: 'admin')
I tried to solve the problem by restarting the server. It seems to me that the timeout happened because a service that requires RabbitMQ was started before RabbitMQ itself was ready. I do not know whether my analysis is correct; please help confirm it. In addition, if a server suddenly goes down in the production environment, it would be unacceptable for this problem to occur after the restart. Can I adjust a parameter value to avoid it? I am a StackStorm beginner, thanks for your help.
StackStorm services keep failing and being restarted by K8s until they get a working connection to the backends. So as long as you fix RabbitMQ, the StackStorm cluster auto-heals and recovers.
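For example, to confirm the broker itself is healthy and that the queues exist, you can run rabbitmqctl inside the pod (a sketch; the pod and queue names are the ones from this thread):
# Broker status from inside the RabbitMQ pod
kubectl exec stackstorm-rabbitmq-0 -- rabbitmqctl status

# List queues with durability and message counts
# ('st2.preinit' from the error above should appear here once the broker is healthy)
kubectl exec stackstorm-rabbitmq-0 -- rabbitmqctl list_queues name durable messages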
@armab I fixed the problem this time by restarting again. However, if this problem is encountered in a production environment, restarting the service is not an acceptable fix. Is there another way to handle it?
In your case the Redis backend was failing too:
stackstorm-redis-node-0 1/2 CrashLoopBackOff 24 (2m47s ago) 27h
stackstorm-redis-node-1 1/2 CrashLoopBackOff 24 (111s ago) 27h
As said before, first you need to ensure all the backends (Redis, MongoDB, RabbitMQ) are operable and available. If they work OK, the StackStorm cluster will come up automatically too.
Can you check what the problem with Redis was in the stackstorm-redis-node-0 and stackstorm-redis-node-1 logs?
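Something like this shows both containers of each pod (a sketch based on the pod names above; each pod runs a redis and a sentinel container, and --previous shows the logs of the last crashed instance):
# Logs of both containers for node 0
kubectl logs stackstorm-redis-node-0 -c redis
kubectl logs stackstorm-redis-node-0 -c sentinel

# Same for node 1, including the previously crashed containers
kubectl logs stackstorm-redis-node-1 -c redis --previous
kubectl logs stackstorm-redis-node-1 -c sentinel --previous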
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 10m default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling 10m default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 10m default-scheduler Successfully assigned default/stackstorm-redis-node-0 to centos-master
Normal Pulled 10m kubelet Container image "docker.io/bitnami/redis:6.0.9-debian-10-r66" already present on machine
Normal Created 10m kubelet Created container redis
Normal Started 10m kubelet Started container redis
Normal Created 10m kubelet Created container sentinel
Normal Started 10m kubelet Started container sentinel
Warning Unhealthy 9m53s kubelet Readiness probe failed:
Could not connect to Redis at localhost:6379: Connection refused
Normal Killing 9m37s kubelet Container sentinel failed liveness probe, will be restarted
Warning Unhealthy 9m37s (x5 over 9m56s) kubelet Liveness probe failed:
Could not connect to Redis at localhost:26379: Connection refused
Normal Pulled 9m6s (x2 over 10m) kubelet Container image "docker.io/bitnami/redis-sentinel:6.0.9-debian-10-r66" already present on machine
Warning Unhealthy 4m53s (x56 over 9m58s) kubelet Readiness probe failed:
Could not connect to Redis at localhost:26379: Connection refused
@armab I used the command kubectl describe pod stackstorm-redis-node-0 and got the events shown above. How can I fix this? I think the port itself is working.
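For reference, the unbound PersistentVolumeClaims mentioned in the events can be checked like this (a minimal sketch, assuming the default namespace):
# Check whether the Redis PVCs are Bound or still Pending
kubectl get pvc

# List the persistent volumes and the storage class that backs them
kubectl get pv
kubectl get storageclass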
It also kept displaying the following warning: WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128. I modified the corresponding parameter value on the system, but the change still did not take effect. Is this warning related to Redis not working properly?
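For reference, the kernel parameter can be checked and raised on the host roughly like this (a sketch; 1024 is only an example value, and host-level changes do not always propagate into containers without extra pod configuration):
# Show the current connection backlog limit on the host
sysctl net.core.somaxconn

# Raise it for the running kernel (example value, not persisted across reboots)
sudo sysctl -w net.core.somaxconn=1024

# Persist the setting across reboots
echo 'net.core.somaxconn = 1024' | sudo tee -a /etc/sysctl.conf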
I believe somaxconn is just a warning and shouldn't be the reason for a failing cluster. You can ignore it for now.
You showed the logs for the redis container, which look fine.
Check the sentinel container for any errors, as that one failed the liveness probe.
[root@centos-master ~]# kubectl logs stackstorm-redis-node-0 -c sentinel
Could not connect to Redis at 192.168.170.195:26379: Connection refused
I'm experiencing the exact same issue with Microk8s and StackStorm HA. All the errors ShimingLee is showing are exactly what I see. The only difference is that when I run "kubectl logs stackstorm-redis-node-0 -c sentinel", I don't see anything at all; it doesn't even say it can't connect to Redis.
Now, I've gotten MongoDB, RabbitMQ and Redis working. Here is what they look like:
MongoDB:
Advertised Hostname: stackstorm-mongodb-0.stackstorm-mongodb-headless.stackstorm.svc.cluster.local
Pod name matches initial primary pod name, configuring node as a primary
mongodb 20:55:14.41
mongodb 20:55:14.41 Welcome to the Bitnami mongodb container
mongodb 20:55:14.42 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-mongodb
mongodb 20:55:14.42 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-mongodb/issues
mongodb 20:55:14.42
mongodb 20:55:14.43 INFO ==> ** Starting MongoDB setup **
mongodb 20:55:14.51 INFO ==> Validating settings in MONGODB_* env vars...
mongodb 20:55:14.55 INFO ==> Initializing MongoDB...
mongodb 20:55:14.59 INFO ==> Deploying MongoDB from scratch...
mongodb 20:55:16.08 INFO ==> Creating users...
mongodb 20:55:16.09 INFO ==> Creating root user...
MongoDB shell version v4.0.27
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("bccb9936-51ed-458b-b257-d662c1ea3e94") }
MongoDB server version: 4.0.27
Successfully added user: {
"user" : "root",
"roles" : [
{
"role" : "root",
"db" : "admin"
}
]
}
bye
mongodb 20:55:16.47 INFO ==> Creating user 'st2-admin'...
MongoDB shell version v4.0.27
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("974a97bb-aeb6-48a5-8398-3a9fb5ba809e") }
MongoDB server version: 4.0.27
Successfully added user: {
"user" : "st2-admin",
"roles" : [
{
"role" : "readWrite",
"db" : "st2"
}
]
}
bye
mongodb 20:55:16.83 INFO ==> Users created
mongodb 20:55:16.84 INFO ==> Writing keyfile for replica set authentication...
mongodb 20:55:16.90 INFO ==> Configuring MongoDB replica set...
mongodb 20:55:16.92 INFO ==> Stopping MongoDB...
mongodb 20:55:21.01 INFO ==> Configuring MongoDB primary node
mongodb 20:55:21.49 INFO ==> Stopping MongoDB...
mongodb 20:55:23.56 INFO ==> Enabling authentication...
mongodb 20:55:23.60 INFO ==> ** MongoDB setup finished! **
mongodb 20:55:23.68 INFO ==> ** Starting MongoDB **
2022-02-25T20:55:23.751+0000 I CONTROL [main] ***** SERVER RESTARTED *****
2022-02-25T20:55:23.757+0000 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2022-02-25T20:55:23.799+0000 I CONTROL [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/bitnami/mongodb/data/db 64-bit host=stackstorm-mongodb-0
2022-02-25T20:55:23.799+0000 I CONTROL [initandlisten] db version v4.0.27
2022-02-25T20:55:23.799+0000 I CONTROL [initandlisten] git version: d47b151b55f286546e7c7c98888ae0577856ca20
2022-02-25T20:55:23.799+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.1.0l 10 Sep 2019
2022-02-25T20:55:23.799+0000 I CONTROL [initandlisten] allocator: tcmalloc
2022-02-25T20:55:23.799+0000 I CONTROL [initandlisten] modules: none
2022-02-25T20:55:23.799+0000 I CONTROL [initandlisten] build environment:
2022-02-25T20:55:23.799+0000 I CONTROL [initandlisten] distmod: debian92
2022-02-25T20:55:23.799+0000 I CONTROL [initandlisten] distarch: x86_64
2022-02-25T20:55:23.800+0000 I CONTROL [initandlisten] target_arch: x86_64
2022-02-25T20:55:23.800+0000 I CONTROL [initandlisten] options: { config: "/opt/bitnami/mongodb/conf/mongodb.conf", net: { bindIpAll: true, ipv6: false, port: 27017, unixDomainSocket: { enabled: true, pathPrefix: "/opt/bitnami/mongodb/tmp" } }, processManagement: { fork: false, pidFilePath: "/opt/bitnami/mongodb/tmp/mongodb.pid" }, replication: { enableMajorityReadConcern: true, replSetName: "rs0" }, security: { authorization: "enabled", keyFile: "/opt/bitnami/mongodb/conf/keyfile" }, setParameter: { enableLocalhostAuthBypass: "false" }, storage: { dbPath: "/bitnami/mongodb/data/db", directoryPerDB: false, journal: { enabled: true } }, systemLog: { destination: "file", logAppend: true, logRotate: "reopen", path: "/opt/bitnami/mongodb/logs/mongodb.log", quiet: false, verbosity: 0 } }
2022-02-25T20:55:23.800+0000 I STORAGE [initandlisten] Detected data files in /bitnami/mongodb/data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2022-02-25T20:55:23.801+0000 I STORAGE [initandlisten]
2022-02-25T20:55:23.801+0000 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2022-02-25T20:55:23.801+0000 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
2022-02-25T20:55:23.801+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=7462M,cache_overflow=(file_max=0M),session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2022-02-25T20:55:25.258+0000 I STORAGE [initandlisten] WiredTiger message [1645822525:258863][1:0x7faa506f0080], txn-recover: Main recovery loop: starting at 2/45696 to 3/256
2022-02-25T20:55:25.259+0000 I STORAGE [initandlisten] WiredTiger message [1645822525:259776][1:0x7faa506f0080], txn-recover: Recovering log 2 through 3
2022-02-25T20:55:25.463+0000 I STORAGE [initandlisten] WiredTiger message [1645822525:463697][1:0x7faa506f0080], txn-recover: Recovering log 3 through 3
2022-02-25T20:55:25.646+0000 I STORAGE [initandlisten] WiredTiger message [1645822525:646346][1:0x7faa506f0080], txn-recover: Set global recovery timestamp: 6219423900000008
2022-02-25T20:55:25.686+0000 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(1645822521, 8)
2022-02-25T20:55:25.686+0000 I STORAGE [initandlisten] Triggering the first stable checkpoint. Initial Data: Timestamp(1645822521, 8) PrevStable: Timestamp(0, 0) CurrStable: Timestamp(1645822521, 8)
2022-02-25T20:55:25.688+0000 I STORAGE [initandlisten] Starting to check the table logging settings for existing WiredTiger tables
2022-02-25T20:55:25.696+0000 I STORAGE [initandlisten] Starting OplogTruncaterThread local.oplog.rs
2022-02-25T20:55:25.696+0000 I STORAGE [initandlisten] The size storer reports that the oplog contains 7 records totaling to 1323 bytes
2022-02-25T20:55:25.696+0000 I STORAGE [initandlisten] Scanning the oplog to determine where to place markers for truncation
2022-02-25T20:55:25.697+0000 I STORAGE [initandlisten] WiredTiger record store oplog processing took 1ms
2022-02-25T20:55:25.723+0000 I STORAGE [initandlisten] Finished adjusting the table logging settings for existing WiredTiger tables
2022-02-25T20:55:25.726+0000 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/bitnami/mongodb/data/db/diagnostic.data'
2022-02-25T20:55:25.738+0000 I REPL [initandlisten] Rollback ID is 1
2022-02-25T20:55:25.740+0000 I REPL [initandlisten] Recovering from stable timestamp: Timestamp(1645822521, 8) (top of oplog: { ts: Timestamp(1645822521, 8), t: 1 }, appliedThrough: { ts: Timestamp(0, 0), t: -1 }, TruncateAfter: Timestamp(0, 0))
2022-02-25T20:55:25.741+0000 I REPL [initandlisten] Starting recovery oplog application at the stable timestamp: Timestamp(1645822521, 8)
2022-02-25T20:55:25.741+0000 I REPL [initandlisten] No oplog entries to apply for recovery. Start point is at the top of the oplog.
2022-02-25T20:55:25.745+0000 I REPL [replexec-0] New replica set config in use: { _id: "rs0", version: 1, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 0, host: "stackstorm-mongodb-0.stackstorm-mongodb-headless.stackstorm.svc.cluster.local:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 5.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('62194239ea2a440ffd0bd06b') } }
2022-02-25T20:55:25.745+0000 I REPL [replexec-0] This node is stackstorm-mongodb-0.stackstorm-mongodb-headless.stackstorm.svc.cluster.local:27017 in the config
2022-02-25T20:55:25.745+0000 I REPL [replexec-0] transition to STARTUP2 from STARTUP
2022-02-25T20:55:25.745+0000 I REPL [replexec-0] Starting replication storage threads
2022-02-25T20:55:25.746+0000 I REPL [replexec-0] transition to RECOVERING from STARTUP2
2022-02-25T20:55:25.746+0000 I REPL [replexec-0] Starting replication fetcher thread
2022-02-25T20:55:25.746+0000 I REPL [replexec-0] Starting replication applier thread
2022-02-25T20:55:25.747+0000 I REPL [replexec-0] Starting replication reporter thread
2022-02-25T20:55:25.748+0000 I REPL [rsSync-0] Starting oplog application
2022-02-25T20:55:25.750+0000 I NETWORK [LogicalSessionCacheRefresh] Starting new replica set monitor for rs0/stackstorm-mongodb-0.stackstorm-mongodb-headless.stackstorm.svc.cluster.local:27017
2022-02-25T20:55:25.753+0000 I REPL [rsSync-0] transition to SECONDARY from RECOVERING
2022-02-25T20:55:25.753+0000 I REPL [rsSync-0] conducting a dry run election to see if we could be elected. current term: 1
2022-02-25T20:55:25.753+0000 I REPL [replexec-0] dry election run succeeded, running for election in term 2
2022-02-25T20:55:25.756+0000 I ASIO [ReplicaSetMonitor-TaskExecutor] Connecting to stackstorm-mongodb-0.stackstorm-mongodb-headless.stackstorm.svc.cluster.local:27017
2022-02-25T20:55:25.757+0000 I REPL [replexec-0] election succeeded, assuming primary role in term 2
2022-02-25T20:55:25.757+0000 I REPL [replexec-0] transition to PRIMARY from SECONDARY
2022-02-25T20:55:25.757+0000 I REPL [replexec-0] Resetting sync source to empty, which was :27017
2022-02-25T20:55:25.757+0000 I REPL [replexec-0] Entering primary catch-up mode.
2022-02-25T20:55:25.757+0000 I REPL [replexec-0] Exited primary catch-up mode.
2022-02-25T20:55:25.757+0000 I REPL [replexec-0] Stopping replication producer
2022-02-25T20:55:25.757+0000 I NETWORK [initandlisten] waiting for connections on port 27017
2022-02-25T20:55:25.761+0000 I REPL [ReplBatcher] Oplog buffer has been drained in term 2
2022-02-25T20:55:25.762+0000 I NETWORK [listener] connection accepted from 10.1.24.149:50682 #2 (1 connection now open)
2022-02-25T20:55:25.764+0000 I REPL [rsSync-0] transition to primary complete; database writes are now permitted
2022-02-25T20:55:25.765+0000 I NETWORK [conn2] received client metadata from 10.1.24.149:50682 conn2: { driver: { name: "NetworkInterfaceTL", version: "4.0.27" }, os: { type: "Linux", name: "PRETTY_NAME="Debian GNU/Linux 9 (stretch)"", architecture: "x86_64", version: "Kernel 4.15.0-36-generic" } }
2022-02-25T20:55:25.802+0000 I ACCESS [conn2] Successfully authenticated as principal __system on local from client 10.1.24.149:50682
2022-02-25T20:55:25.803+0000 I NETWORK [LogicalSessionCacheRefresh] stackstorm-mongodb-0.stackstorm-mongodb-headless.stackstorm.svc.cluster.local:27017 detected as new replica set primary for rs0; Old primary was :27017
2022-02-25T20:55:25.804+0000 I NETWORK [listener] connection accepted from 10.1.24.149:50684 #4 (2 connections now open)
2022-02-25T20:55:25.805+0000 I NETWORK [listener] connection accepted from 10.1.24.149:50686 #6 (3 connections now open)
2022-02-25T20:55:25.805+0000 I NETWORK [conn4] received client metadata from 10.1.24.149:50684 conn4: { driver: { name: "MongoDB Internal Client", version: "4.0.27" }, os: { type: "Linux", name: "PRETTY_NAME="Debian GNU/Linux 9 (stretch)"", architecture: "x86_64", version: "Kernel 4.15.0-36-generic" } }
2022-02-25T20:55:25.805+0000 I NETWORK [LogicalSessionCacheRefresh] Successfully connected to stackstorm-mongodb-0.stackstorm-mongodb-headless.stackstorm.svc.cluster.local:27017 (1 connections now open to stackstorm-mongodb-0.stackstorm-mongodb-headless.stackstorm.svc.cluster.local:27017 with a 0 second timeout)
2022-02-25T20:55:25.806+0000 I NETWORK [conn6] received client metadata from 10.1.24.149:50686 conn6: { driver: { name: "MongoDB Internal Client", version: "4.0.27" }, os: { type: "Linux", name: "PRETTY_NAME="Debian GNU/Linux 9 (stretch)"", architecture: "x86_64", version: "Kernel 4.15.0-36-generic" } }
2022-02-25T20:55:25.807+0000 I NETWORK [LogicalSessionCacheReap] Successfully connected to stackstorm-mongodb-0.stackstorm-mongodb-headless.stackstorm.svc.cluster.local:27017 (2 connections now open to stackstorm-mongodb-0.stackstorm-mongodb-headless.stackstorm.svc.cluster.local:27017 with a 0 second timeout)
2022-02-25T20:55:25.807+0000 I ACCESS [conn4] Successfully authenticated as principal __system on local from client 10.1.24.149:50684
2022-02-25T20:55:25.808+0000 I ACCESS [conn6] Successfully authenticated as principal __system on local from client 10.1.24.149:50686
2022-02-25T20:55:29.882+0000 I NETWORK [listener] connection accepted from 127.0.0.1:38994 #7 (4 connections now open)
2022-02-25T20:55:29.887+0000 I NETWORK [conn7] received client metadata from 127.0.0.1:38994 conn7: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.27" }, os: { type: "Linux", name: "PRETTY_NAME="Debian GNU/Linux 9 (stretch)"", architecture: "x86_64", version: "Kernel 4.15.0-36-generic" } }
2022-02-25T20:55:29.902+0000 I NETWORK [conn7] end connection 127.0.0.1:38994 (3 connections now open)
RabbitMQ:
21:01:12.75
21:01:12.75 Welcome to the Bitnami rabbitmq container
21:01:12.76 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-rabbitmq
21:01:12.76 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-rabbitmq/issues
21:01:12.77
21:01:12.77 INFO ==> ** Starting RabbitMQ setup **
21:01:12.81 INFO ==> Validating settings in RABBITMQ_* env vars..
21:01:12.83 INFO ==> Initializing RabbitMQ...
21:01:12.90 INFO ==> Persisted data detected. Restoring...
21:01:12.91 WARN ==> Forcing node to start...
21:01:15.40 INFO ==> ** RabbitMQ setup finished! **
21:01:15.44 INFO ==> ** Starting RabbitMQ **
Configuring logger redirection
2022-02-25 21:01:22.504 [debug] <0.284.0> Lager installed handler error_logger_lager_h into error_logger
2022-02-25 21:01:22.515 [debug] <0.287.0> Lager installed handler lager_forwarder_backend into error_logger_lager_event
2022-02-25 21:01:22.515 [debug] <0.290.0> Lager installed handler lager_forwarder_backend into rabbit_log_lager_event
2022-02-25 21:01:22.516 [debug] <0.293.0> Lager installed handler lager_forwarder_backend into rabbit_log_channel_lager_event
2022-02-25 21:01:22.516 [debug] <0.296.0> Lager installed handler lager_forwarder_backend into rabbit_log_connection_lager_event
2022-02-25 21:01:22.516 [debug] <0.299.0> Lager installed handler lager_forwarder_backend into rabbit_log_feature_flags_lager_event
2022-02-25 21:01:22.516 [debug] <0.302.0> Lager installed handler lager_forwarder_backend into rabbit_log_federation_lager_event
2022-02-25 21:01:22.516 [debug] <0.305.0> Lager installed handler lager_forwarder_backend into rabbit_log_ldap_lager_event
2022-02-25 21:01:22.516 [debug] <0.317.0> Lager installed handler lager_forwarder_backend into rabbit_log_ra_lager_event
2022-02-25 21:01:22.516 [debug] <0.308.0> Lager installed handler lager_forwarder_backend into rabbit_log_mirroring_lager_event
2022-02-25 21:01:22.516 [debug] <0.311.0> Lager installed handler lager_forwarder_backend into rabbit_log_prelaunch_lager_event
2022-02-25 21:01:22.516 [debug] <0.314.0> Lager installed handler lager_forwarder_backend into rabbit_log_queue_lager_event
2022-02-25 21:01:22.516 [debug] <0.320.0> Lager installed handler lager_forwarder_backend into rabbit_log_shovel_lager_event
2022-02-25 21:01:22.516 [debug] <0.323.0> Lager installed handler lager_forwarder_backend into rabbit_log_upgrade_lager_event
2022-02-25 21:01:22.553 [info] <0.44.0> Application lager started on node 'rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local'
2022-02-25 21:01:22.926 [info] <0.44.0> Application mnesia started on node 'rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local'
2022-02-25 21:01:22.927 [info] <0.269.0>
Starting RabbitMQ 3.8.9 on Erlang 22.3
Copyright (c) 2007-2020 VMware, Inc. or its affiliates.
Licensed under the MPL 2.0. Website: https://rabbitmq.com
  ##  ##      RabbitMQ 3.8.9
  ##  ##
  ##########  Copyright (c) 2007-2020 VMware, Inc. or its affiliates.
  ######  ##
  ##########  Licensed under the MPL 2.0. Website: https://rabbitmq.com
Doc guides: https://rabbitmq.com/documentation.html
Support: https://rabbitmq.com/contact.html
Tutorials: https://rabbitmq.com/getstarted.html
Monitoring: https://rabbitmq.com/monitoring.html
Logs: <stdout>
Config file(s): /opt/bitnami/rabbitmq/etc/rabbitmq/rabbitmq.conf
Starting broker...2022-02-25 21:01:22.929 [info] <0.269.0>
node : rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local
home dir : /opt/bitnami/rabbitmq/.rabbitmq
config file(s) : /opt/bitnami/rabbitmq/etc/rabbitmq/rabbitmq.conf
cookie hash : OSQ7aWhTYVAQ+HDqyBFf4w==
log(s) : <stdout>
database dir : /bitnami/rabbitmq/mnesia/rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local
2022-02-25 21:01:23.004 [debug] <0.280.0> Lager installed handler lager_backend_throttle into lager_event
2022-02-25 21:01:27.351 [info] <0.269.0> Running boot step pre_boot defined by app rabbit
2022-02-25 21:01:27.351 [info] <0.269.0> Running boot step rabbit_core_metrics defined by app rabbit
2022-02-25 21:01:27.352 [info] <0.269.0> Running boot step rabbit_alarm defined by app rabbit
2022-02-25 21:01:27.364 [info] <0.349.0> Memory high watermark set to 6379 MiB (6689149747 bytes) of 15948 MiB (16722874368 bytes) total
2022-02-25 21:01:27.375 [info] <0.351.0> Enabling free disk space monitoring
2022-02-25 21:01:27.375 [info] <0.351.0> Disk free limit set to 50MB
2022-02-25 21:01:27.384 [info] <0.269.0> Running boot step code_server_cache defined by app rabbit
2022-02-25 21:01:27.385 [info] <0.269.0> Running boot step file_handle_cache defined by app rabbit
2022-02-25 21:01:27.386 [info] <0.354.0> Limiting to approx 65439 file handles (58893 sockets)
2022-02-25 21:01:27.387 [info] <0.355.0> FHC read buffering: OFF
2022-02-25 21:01:27.387 [info] <0.355.0> FHC write buffering: ON
2022-02-25 21:01:27.389 [info] <0.269.0> Running boot step worker_pool defined by app rabbit
2022-02-25 21:01:27.389 [info] <0.342.0> Will use 4 processes for default worker pool
2022-02-25 21:01:27.389 [info] <0.342.0> Starting worker pool 'worker_pool' with 4 processes in it
2022-02-25 21:01:27.391 [info] <0.269.0> Running boot step database defined by app rabbit
2022-02-25 21:01:27.400 [info] <0.44.0> Application mnesia exited with reason: stopped
2022-02-25 21:01:27.400 [info] <0.44.0> Application mnesia exited with reason: stopped
2022-02-25 21:01:27.440 [info] <0.44.0> Application mnesia started on node 'rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local'
2022-02-25 21:01:27.628 [info] <0.269.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2022-02-25 21:01:27.628 [info] <0.269.0> Successfully synced tables from a peer
2022-02-25 21:01:27.666 [info] <0.269.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2022-02-25 21:01:27.667 [info] <0.269.0> Successfully synced tables from a peer
2022-02-25 21:01:27.667 [info] <0.269.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2022-02-25 21:01:27.667 [info] <0.269.0> Successfully synced tables from a peer
2022-02-25 21:01:27.717 [info] <0.269.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2022-02-25 21:01:27.717 [info] <0.269.0> Successfully synced tables from a peer
2022-02-25 21:01:27.749 [info] <0.269.0> Will register with peer discovery backend rabbit_peer_discovery_k8s
2022-02-25 21:01:35.763 [info] <0.269.0> Running boot step database_sync defined by app rabbit
2022-02-25 21:01:35.763 [info] <0.269.0> Running boot step feature_flags defined by app rabbit
2022-02-25 21:01:35.764 [info] <0.269.0> Running boot step codec_correctness_check defined by app rabbit
2022-02-25 21:01:35.764 [info] <0.269.0> Running boot step external_infrastructure defined by app rabbit
2022-02-25 21:01:35.764 [info] <0.269.0> Running boot step rabbit_registry defined by app rabbit
2022-02-25 21:01:35.765 [info] <0.269.0> Running boot step rabbit_auth_mechanism_cr_demo defined by app rabbit
2022-02-25 21:01:35.765 [info] <0.269.0> Running boot step rabbit_queue_location_random defined by app rabbit
2022-02-25 21:01:35.766 [info] <0.269.0> Running boot step rabbit_event defined by app rabbit
2022-02-25 21:01:35.766 [info] <0.269.0> Running boot step rabbit_auth_mechanism_amqplain defined by app rabbit
2022-02-25 21:01:35.766 [info] <0.269.0> Running boot step rabbit_auth_mechanism_plain defined by app rabbit
2022-02-25 21:01:35.767 [info] <0.269.0> Running boot step rabbit_exchange_type_direct defined by app rabbit
2022-02-25 21:01:35.767 [info] <0.269.0> Running boot step rabbit_exchange_type_fanout defined by app rabbit
2022-02-25 21:01:35.767 [info] <0.269.0> Running boot step rabbit_exchange_type_headers defined by app rabbit
2022-02-25 21:01:35.768 [info] <0.269.0> Running boot step rabbit_exchange_type_topic defined by app rabbit
2022-02-25 21:01:35.768 [info] <0.269.0> Running boot step rabbit_mirror_queue_mode_all defined by app rabbit
2022-02-25 21:01:35.768 [info] <0.269.0> Running boot step rabbit_mirror_queue_mode_exactly defined by app rabbit
2022-02-25 21:01:35.768 [info] <0.269.0> Running boot step rabbit_mirror_queue_mode_nodes defined by app rabbit
2022-02-25 21:01:35.769 [info] <0.269.0> Running boot step rabbit_priority_queue defined by app rabbit
2022-02-25 21:01:35.769 [info] <0.269.0> Priority queues enabled, real BQ is rabbit_variable_queue
2022-02-25 21:01:35.769 [info] <0.269.0> Running boot step rabbit_queue_location_client_local defined by app rabbit
2022-02-25 21:01:35.769 [info] <0.269.0> Running boot step rabbit_queue_location_min_masters defined by app rabbit
2022-02-25 21:01:35.769 [info] <0.269.0> Running boot step kernel_ready defined by app rabbit
2022-02-25 21:01:35.770 [info] <0.269.0> Running boot step rabbit_sysmon_minder defined by app rabbit
2022-02-25 21:01:35.770 [info] <0.269.0> Running boot step rabbit_epmd_monitor defined by app rabbit
2022-02-25 21:01:35.773 [info] <0.525.0> epmd monitor knows us, inter-node communication (distribution) port: 25672
2022-02-25 21:01:35.774 [info] <0.269.0> Running boot step guid_generator defined by app rabbit
2022-02-25 21:01:35.780 [info] <0.269.0> Running boot step rabbit_node_monitor defined by app rabbit
2022-02-25 21:01:35.780 [info] <0.529.0> Starting rabbit_node_monitor
2022-02-25 21:01:35.781 [info] <0.269.0> Running boot step delegate_sup defined by app rabbit
2022-02-25 21:01:35.784 [info] <0.269.0> Running boot step rabbit_memory_monitor defined by app rabbit
2022-02-25 21:01:35.785 [info] <0.269.0> Running boot step core_initialized defined by app rabbit
2022-02-25 21:01:35.785 [info] <0.269.0> Running boot step upgrade_queues defined by app rabbit
2022-02-25 21:01:35.837 [info] <0.269.0> message_store upgrades: 1 to apply
2022-02-25 21:01:35.838 [info] <0.269.0> message_store upgrades: Applying rabbit_variable_queue:move_messages_to_vhost_store
2022-02-25 21:01:35.838 [info] <0.269.0> message_store upgrades: No durable queues found. Skipping message store migration
2022-02-25 21:01:35.838 [info] <0.269.0> message_store upgrades: Removing the old message store data
2022-02-25 21:01:35.844 [info] <0.269.0> message_store upgrades: All upgrades applied successfully
2022-02-25 21:01:35.878 [info] <0.269.0> Running boot step rabbit_connection_tracking defined by app rabbit
2022-02-25 21:01:35.879 [info] <0.269.0> Running boot step rabbit_connection_tracking_handler defined by app rabbit
2022-02-25 21:01:35.879 [info] <0.269.0> Running boot step rabbit_exchange_parameters defined by app rabbit
2022-02-25 21:01:35.880 [info] <0.269.0> Running boot step rabbit_mirror_queue_misc defined by app rabbit
2022-02-25 21:01:35.881 [info] <0.269.0> Running boot step rabbit_policies defined by app rabbit
2022-02-25 21:01:35.883 [info] <0.269.0> Running boot step rabbit_policy defined by app rabbit
2022-02-25 21:01:35.883 [info] <0.269.0> Running boot step rabbit_queue_location_validator defined by app rabbit
2022-02-25 21:01:35.883 [info] <0.269.0> Running boot step rabbit_quorum_memory_manager defined by app rabbit
2022-02-25 21:01:35.884 [info] <0.269.0> Running boot step rabbit_vhost_limit defined by app rabbit
2022-02-25 21:01:35.884 [info] <0.269.0> Running boot step recovery defined by app rabbit
2022-02-25 21:01:35.886 [info] <0.269.0> Running boot step empty_db_check defined by app rabbit
2022-02-25 21:01:35.886 [info] <0.269.0> Will not seed default virtual host and user: have definitions to load...
2022-02-25 21:01:35.887 [info] <0.269.0> Running boot step rabbit_looking_glass defined by app rabbit
2022-02-25 21:01:35.887 [info] <0.269.0> Running boot step rabbit_core_metrics_gc defined by app rabbit
2022-02-25 21:01:35.887 [info] <0.269.0> Running boot step background_gc defined by app rabbit
2022-02-25 21:01:35.888 [info] <0.269.0> Running boot step connection_tracking defined by app rabbit
2022-02-25 21:01:35.896 [info] <0.269.0> Setting up a table for connection tracking on this node: 'tracked_connection_on_node_rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local'
2022-02-25 21:01:35.902 [info] <0.269.0> Setting up a table for per-vhost connection counting on this node: 'tracked_connection_per_vhost_on_node_rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local'
2022-02-25 21:01:35.902 [info] <0.269.0> Running boot step routing_ready defined by app rabbit
2022-02-25 21:01:35.903 [info] <0.269.0> Running boot step pre_flight defined by app rabbit
2022-02-25 21:01:35.903 [info] <0.269.0> Running boot step notify_cluster defined by app rabbit
2022-02-25 21:01:35.903 [info] <0.269.0> Running boot step networking defined by app rabbit
2022-02-25 21:01:35.903 [info] <0.269.0> Running boot step definition_import_worker_pool defined by app rabbit
2022-02-25 21:01:35.903 [info] <0.342.0> Starting worker pool 'definition_import_pool' with 4 processes in it
2022-02-25 21:01:35.905 [info] <0.269.0> Running boot step cluster_name defined by app rabbit
2022-02-25 21:01:35.906 [info] <0.269.0> Initialising internal cluster ID to 'rabbitmq-cluster-id-oTfAFSiNxyzWJOYWvavGkw'
2022-02-25 21:01:35.909 [info] <0.269.0> Running boot step direct_client defined by app rabbit
2022-02-25 21:01:35.910 [info] <0.44.0> Application rabbit started on node 'rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local'
2022-02-25 21:01:37.138 [info] <0.576.0> Feature flags: list of feature flags found:
2022-02-25 21:01:37.138 [info] <0.576.0> Feature flags: [ ] drop_unroutable_metric
2022-02-25 21:01:37.138 [info] <0.576.0> Feature flags: [ ] empty_basic_get_metric
2022-02-25 21:01:37.138 [info] <0.576.0> Feature flags: [ ] implicit_default_bindings
2022-02-25 21:01:37.138 [info] <0.576.0> Feature flags: [ ] maintenance_mode_status
2022-02-25 21:01:37.139 [info] <0.576.0> Feature flags: [ ] quorum_queue
2022-02-25 21:01:37.139 [info] <0.576.0> Feature flags: [ ] virtual_host_metadata
2022-02-25 21:01:37.139 [info] <0.576.0> Feature flags: feature flag states written to disk: yes
2022-02-25 21:01:37.624 [info] <0.576.0> Running boot step rabbit_mgmt_db_handler defined by app rabbitmq_management_agent
2022-02-25 21:01:37.624 [info] <0.576.0> Management plugin: using rates mode 'basic'
2022-02-25 21:01:37.680 [info] <0.44.0> Application rabbitmq_management_agent started on node 'rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local'
2022-02-25 21:01:37.712 [info] <0.44.0> Application cowlib started on node 'rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local'
2022-02-25 21:01:37.752 [info] <0.44.0> Application cowboy started on node 'rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local'
2022-02-25 21:01:37.782 [info] <0.44.0> Application rabbitmq_web_dispatch started on node 'rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local'
2022-02-25 21:01:37.814 [info] <0.44.0> Application amqp_client started on node 'rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local'
2022-02-25 21:01:37.844 [info] <0.576.0> Running boot step rabbit_mgmt_reset_handler defined by app rabbitmq_management
2022-02-25 21:01:37.850 [info] <0.576.0> Running boot step rabbit_management_load_definitions defined by app rabbitmq_management
2022-02-25 21:01:37.945 [info] <0.645.0> Management plugin: HTTP (non-TLS) listener started on port 15672
2022-02-25 21:01:37.946 [info] <0.751.0> Statistics database started.
2022-02-25 21:01:37.946 [info] <0.750.0> Starting worker pool 'management_worker_pool' with 3 processes in it
2022-02-25 21:01:37.949 [info] <0.44.0> Application rabbitmq_management started on node 'rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local'
2022-02-25 21:01:37.980 [info] <0.576.0> Running boot step ldap_pool defined by app rabbitmq_auth_backend_ldap
2022-02-25 21:01:37.980 [info] <0.342.0> Starting worker pool 'ldap_pool' with 64 processes in it
2022-02-25 21:01:37.991 [info] <0.44.0> Application eldap started on node 'rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local'
2022-02-25 21:01:37.992 [warning] <0.827.0> LDAP plugin loaded, but rabbit_auth_backend_ldap is not in the list of auth_backends. LDAP auth will not work.
2022-02-25 21:01:37.992 [info] <0.44.0> Application rabbitmq_auth_backend_ldap started on node 'rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local'
2022-02-25 21:01:38.025 [info] <0.833.0> Peer discovery: enabling node cleanup (will only log warnings). Check interval: 10 seconds.
2022-02-25 21:01:38.025 [info] <0.44.0> Application rabbitmq_peer_discovery_common started on node 'rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local'
2022-02-25 21:01:38.065 [info] <0.44.0> Application rabbitmq_peer_discovery_k8s started on node 'rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local'
2022-02-25 21:01:38.066 [info] <0.576.0> Applying definitions from file at '/app/rabbitmq-definitions.json'
2022-02-25 21:01:38.066 [info] <0.576.0> Asked to import definitions. Acting user: rmq-internal
2022-02-25 21:01:38.066 [info] <0.576.0> Importing concurrently 1 users...
2022-02-25 21:01:38.073 [info] <0.570.0> Created user 'admin'
2022-02-25 21:01:38.081 [info] <0.570.0> Successfully set user tags for user 'admin' to [administrator]
2022-02-25 21:01:38.081 [info] <0.576.0> Importing concurrently 1 vhosts...
2022-02-25 21:01:38.082 [info] <0.570.0> Adding vhost '/' without a description
2022-02-25 21:01:38.158 [info] <0.843.0> Making sure data directory '/bitnami/rabbitmq/mnesia/rabbit@stackstorm-rabbitmq-1.stackstorm-rabbitmq-headless.stackstorm.svc.cluster.local/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L' for vhost '/' exists
2022-02-25 21:01:38.167 [info] <0.843.0> Starting message stores for vhost '/'
2022-02-25 21:01:38.167 [info] <0.847.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_transient": using rabbit_msg_store_ets_index to provide index
2022-02-25 21:01:38.170 [info] <0.843.0> Started message store of type transient for vhost '/'
2022-02-25 21:01:38.170 [info] <0.851.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": using rabbit_msg_store_ets_index to provide index
2022-02-25 21:01:38.172 [warning] <0.851.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": rebuilding indices from scratch
2022-02-25 21:01:38.174 [info] <0.843.0> Started message store of type persistent for vhost '/'
2022-02-25 21:01:38.179 [info] <0.576.0> Importing concurrently 1 permissions...
2022-02-25 21:01:38.182 [info] <0.570.0> Successfully set permissions for 'admin' in virtual host '/' to '.*', '.*', '.*'
2022-02-25 21:01:38.183 [info] <0.576.0> Importing sequentially 1 policies...
2022-02-25 21:01:38.199 [info] <0.576.0> Ready to start client connection listeners
2022-02-25 21:01:38.205 [info] <0.886.0> started TCP listener on [::]:5672
2022-02-25 21:01:38.827 [info] <0.576.0> Server startup complete; 6 plugins started.
* rabbitmq_peer_discovery_k8s
* rabbitmq_peer_discovery_common
* rabbitmq_auth_backend_ldap
* rabbitmq_management
* rabbitmq_web_dispatch
* rabbitmq_management_agent
completed with 6 plugins.
2022-02-25 21:01:38.828 [info] <0.576.0> Resetting node maintenance status
Redis:
I am master
redis 20:55:54.61 INFO ==> ** Starting Redis **
1:C 25 Feb 2022 20:55:54.639 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 25 Feb 2022 20:55:54.639 # Redis version=6.0.9, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 25 Feb 2022 20:55:54.640 # Configuration loaded
1:M 25 Feb 2022 20:55:54.642 * Running mode=standalone, port=6379.
1:M 25 Feb 2022 20:55:54.642 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 25 Feb 2022 20:55:54.642 # Server initialized
1:M 25 Feb 2022 20:55:54.642 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
1:M 25 Feb 2022 20:55:54.643 * Ready to accept connections
Yet the waiting-for-queue container looks like this:
Waiting for RabbitMQ Connection...
Waiting for RabbitMQ Connection...
Waiting for RabbitMQ Connection...
(the same line repeats indefinitely)
The container runs this weird command to determine if RabbitMQ is active:
sh -c "until nc -z -w 2 stackstorm-rabbitmq 5672 && echo rabbitmq ok; do echo 'Waiting for RabbitMQ Connection...' sleep 2;done"
I tried changing the stackstorm-rabbitmq to stackstorm-rabbitmq-headless and it didn't work. My guess is some sort of DNS issue, because sometimes the wait-for-db container also gets stuck on waiting for MongoDB connection.
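One way to test the DNS theory is to run the same check from a throwaway pod inside the cluster (a sketch; busybox is used only because it ships nslookup and nc):
# Start a temporary pod with an interactive shell (removed on exit)
kubectl run -it --rm dns-test --image=busybox --restart=Never -- sh

# Inside the pod: resolve the service name and repeat the port check
nslookup stackstorm-rabbitmq
nc -z -w 2 stackstorm-rabbitmq 5672 && echo rabbitmq ok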
OK, I made some progress. Come to find out, the stackstorm-ha helm chart is just poorly written (no surprise).
So, for those of you who have Microk8s, this information may be helpful.
1) mkdir /opt/bitnami permission denied errors.
a) First, try running the helm chart with this parameter: --set volumePermissions.enabled=true
b) If the above doesn't work and you're using the Microk8s storage, try running this in a terminal:
sudo chmod -R 777 /var/snap/microk8s/common/default-storage/
Why do this? Because Bitnami helm charts always seem to run scripts that lack the permissions to do anything. Either the storage they're working with is owned by root while the script runs as a basic user, or the storage is owned by another user (due to a change they made in their helm chart).
2) After you get that worked out, you may have issues with the "sentinel" container, which is part of the Redis pod the chart installs. The bash script it runs is garbage: it assumes your Kubernetes setup will bring up multiple sentinel nodes all at once, so it hangs waiting for the other nodes to initialize. Of course, they never initialize, because they are all waiting for the first sentinel instance to complete. So it just sits there and does nothing. Here is a fix I made for the sentinel bash script; you can find the original in your configmaps under "stackstorm-redis-scripts" (see the note after the script on how to apply the change).
#!/bin/bash
replace_in_file() {
local filename="${1:?filename is required}"
local match_regex="${2:?match regex is required}"
local substitute_regex="${3:?substitute regex is required}"
local posix_regex=${4:-true}
local result
# We should avoid using 'sed in-place' substitutions
# 1) They are not compatible with files mounted from ConfigMap(s)
# 2) We found incompatibility issues with Debian10 and "in-place" substitutions
del=$'\001' # Use a non-printable character as a 'sed' delimiter to avoid issues
if [[ $posix_regex = true ]]; then
result="$(sed -E "s${del}${match_regex}${del}${substitute_regex}${del}g" "$filename")"
else
result="$(sed "s${del}${match_regex}${del}${substitute_regex}${del}g" "$filename")"
fi
echo "$result" > "$filename"
}
sentinel_conf_set() {
local -r key="${1:?missing key}"
local value="${2:-}"
# Sanitize inputs
value="${value//\\/\\\\}"
value="${value//&/\\&}"
value="${value//\?/\\?}"
[[ "$value" = "" ]] && value="\"$value\""
replace_in_file "/opt/bitnami/redis-sentinel/etc/sentinel.conf" "^#*\s*${key} .*" "${key} ${value}" false
}
sentinel_conf_add() {
echo $'\n'"$@" >> "/opt/bitnami/redis-sentinel/etc/sentinel.conf"
}
is_boolean_yes() {
local -r bool="${1:-}"
# comparison is performed without regard to the case of alphabetic characters
shopt -s nocasematch
if [[ "$bool" = 1 || "$bool" =~ ^(yes|true)$ ]]; then
true
else
false
fi
}
host_id() {
echo "$1" | openssl sha1 | awk '{print $2}'
}
HEADLESS_SERVICE="stackstorm-redis-headless.stackstorm.svc.cluster.local"
REDIS_SERVICE="stackstorm-redis.stackstorm.svc.cluster.local"
if [[ -n $REDIS_PASSWORD_FILE ]]; then
echo "Setting Password File..."
password_aux=`cat ${REDIS_PASSWORD_FILE}`
export REDIS_PASSWORD=$password_aux
fi
if [[ ! -f /opt/bitnami/redis-sentinel/etc/sentinel.conf ]]; then
echo "Initializing sentinel configuration..."
cp /opt/bitnami/redis-sentinel/mounted-etc/sentinel.conf /opt/bitnami/redis-sentinel/etc/sentinel.conf
printf "\nsentinel myid %s" "$(host_id "$HOSTNAME")" >> /opt/bitnami/redis-sentinel/etc/sentinel.conf
fi
export REDIS_REPLICATION_MODE="slave"
if [[ -z "$(getent ahosts "$HEADLESS_SERVICE" | grep -v "^$(hostname -i) ")" ]]; then
echo "Setting master and slave modes..."
export REDIS_REPLICATION_MODE="master"
fi
if [[ "$REDIS_REPLICATION_MODE" == "master" ]]; then
REDIS_MASTER_HOST="$(hostname -i)"
REDIS_MASTER_PORT_NUMBER="6379"
else
if is_boolean_yes "$REDIS_SENTINEL_TLS_ENABLED"; then
sentinel_info_command="redis-cli -h $REDIS_SERVICE -p 26379 --tls --cert ${REDIS_SENTINEL_TLS_CERT_FILE} --key ${REDIS_SENTINEL_TLS_KEY_FILE} --cacert ${REDIS_SENTINEL_TLS_CA_FILE} sentinel get-master-addr-by-name mymaster"
else
sentinel_info_command="redis-cli -h $REDIS_SERVICE -p 26379 sentinel get-master-addr-by-name mymaster"
fi
REDIS_SENTINEL_INFO=($($sentinel_info_command))
REDIS_MASTER_HOST=${REDIS_SENTINEL_INFO[0]}
REDIS_MASTER_PORT_NUMBER=${REDIS_SENTINEL_INFO[1]}
# Immediately attempt to connect to the reported master. If it doesn't exist the connection attempt will either hang
# or fail with "port unreachable" and give no data. The liveness check will then timeout waiting for the sentinel
# container to be ready and restart it. By then the new master will likely have been elected
if is_boolean_yes "$REDIS_SENTINEL_TLS_ENABLED"; then
echo "Communicating with master redis node..."
sentinel_info_command="redis-cli -h $REDIS_MASTER_HOST -p 26379 --tls --cert ${REDIS_SENTINEL_TLS_CERT_FILE} --key ${REDIS_SENTINEL_TLS_KEY_FILE} --cacert ${REDIS_SENTINEL_TLS_CA_FILE} sentinel get-master-addr-by-name mymaster"
else
sentinel_info_command="redis-cli -h $REDIS_MASTER_HOST -p 26379 sentinel get-master-addr-by-name mymaster"
fi
if [[ ! ($($sentinel_info_command)) ]]; then
echo "ERROR: Master doesn't exist!"
# master doesn't actually exist, this probably means the remaining pods haven't elected a new one yet
# and are reporting the old one still. Once this happens the container will get stuck and never see the new
# master. We stop here to allow the container to not pass the liveness check and be restarted.
exit 1
fi
fi
echo "Master node found!"
sentinel_conf_set "sentinel monitor" "mymaster "$REDIS_MASTER_HOST" "$REDIS_MASTER_PORT_NUMBER" 2"
add_replica() {
if [[ "$1" != "$REDIS_MASTER_HOST" ]]; then
sentinel_conf_add "sentinel known-replica mymaster $1 6379"
fi
}
# remove generated known sentinels and replicas
tmp="$(sed -e '/^sentinel known-/d' -e '/^$/d' /opt/bitnami/redis-sentinel/etc/sentinel.conf)"
echo "$tmp" > /opt/bitnami/redis-sentinel/etc/sentinel.conf
echo "Setting up headless service..."
# Best to let this FOR loop run in the background, as not every Kubernetes setup loads 3 or 4 nodes simultaneously!!
{
for node in $(seq 0 3); do
NAME="stackstorm-redis-node-$node"
IP="$(getent hosts "$NAME.$HEADLESS_SERVICE" | awk ' {print $1 }')"
echo "Assigning headless service IP -> $IP..."
if [[ "$NAME" != "$HOSTNAME" && -n "$IP" ]]; then
echo "Adding master node configuration to sentinel config..."
sentinel_conf_add "sentinel known-sentinel mymaster $IP 26379 $(host_id "$NAME")"
echo "Adding replica IP..."
add_replica "$IP"
fi
done
} &
echo "Adding hostname to replica data..."
add_replica "$(hostname -i)"
echo "Executing redis-server..."
exec redis-server /opt/bitnami/redis-sentinel/etc/sentinel.conf --sentinel
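To actually apply a change like this, one approach is to edit the script inside the configmap and restart the Redis pods (a sketch; the configmap name comes from above, while the statefulset name and the exact key inside the configmap are assumptions that may differ in your chart version):
# Open the configmap and edit the sentinel start script key in place
kubectl edit configmap stackstorm-redis-scripts

# Restart the Redis pods so they re-mount the updated script
# ("stackstorm-redis-node" is an assumed statefulset name based on the pod names above)
kubectl rollout restart statefulset stackstorm-redis-node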
@QuintonMcLeod - Are you suggesting that the way helm deploys StackStorm-HA causes the issue with redis, or that the sentinel implementation is faulty in the redis chart?
@arms11
I'm suggesting both.
1) StackStorm is using Bitnami charts rather than its own. This causes a bit of a conflict, as Bitnami has its own issues with its charts, and StackStorm-HA has issues of its own on top of that.
2) The Redis chart assumes for no good reason that 3 or 4 instances of itself will be loading simultaneously, which is not always the case (or ever the case). Therefore, it'll hang waiting for the other instances while they never load, because they're all waiting for the first instance to finish.
Thanks @QuintonMcLeod. Definitely an interesting find. The reason for my ask was that several of us have not faced this, and I was wondering if there is something different in the way this works in Microk8s. I will let @cognifloyd and @armab chime in, but typically third-party helm chart shortcomings are not maintained this way unless it is a general issue faced by multiple k8s deployments.
Btw, if the script is a POC, that's fine. Otherwise, note that you have hardwired the service name variables with the stackstorm prefix. Since that depends on the ST2 release deployment name, it may have to be dynamic, e.g. {{ .Release.Name }}. This is also true for the ports.
@arms11
Changing the name wasn't anything I did "directly", as the name is totally based on the name you provide during the helm chart install.
As for your experience using other Kubernetes deployments: Microk8s always seems to be the underdog. No one sees issues with it because they don't use it. Kubernetes deployments differ, and unique problems occur.
Anyway, I just ended up using the docker-compose version on a spare server and abandoned the Stackstorm-HA helm chart. It just plain doesn't work on Microk8s.
We've seen issues in the past with K8s OpenShift related to its security approach, and with KIND related to its default settings for persistent volumes. MicroK8s is the new one.
The e2e tests in CircleCI are running with minikube at this moment, and things are confirmed to work with real production K8s clusters.
So yes, the probability of hitting an edge case does increase with the different K8s variants.
The chart is supported by the community who's using it and contributing back fixes and improvements.
I'd recommend contributing the fixes to stackstorm-ha or to its upstream helm chart dependencies (MongoDB, RabbitMQ, Redis). It makes more sense for st2 to use those upstreams because the community is bigger there, as is the maintenance, updates, and support.
Hi, can you help me with st2 on MicroK8s? I have a problem.
I ended up abandoning StackStorm on MicroK8s, because it's not worth the headache. Of course, this was 2 years ago, so maybe they fixed those issues? No idea.
I went with the docker-compose version because it was better maintained, plus it was self-contained; I could easily delete the folder if I didn't like it anymore.
Thanks for the reply. But now I am not sure that StackStorm is a helpful tool. I think a plain Python script would be an easier solution; StackStorm is just Python blocks of if/then.
This is a strange way of looking at things. StackStorm lets you set up users and roles for those users, and lets you set up sensors, rules, schedules, and datastore configuration for different processes, among other things. If you just need to execute a Python script, then ST2 is certainly overkill for the task. However, if you need an environment where different users can do various things in a relatively user-friendly way, raw Python just doesn't do it.
You are right. In the future the tool would scale well for the project. I want to use it, but I keep running into installation problems. Maybe you can help me: https://github.com/StackStorm/st2/discussions/6198
If anyone wants to work on fixing support for micro-k8s, please start with submitting a PR that adds CI testing with micro-k8s. Then you can work on identifying which part(s) of the helm chart are not compatible with micro-k8s to see if fixing that is feasible.
I don't use micro-k8s, and I don't have the bandwidth to investigate anything with it. Volunteers welcome!
Maybe you can help me get messages from a RabbitMQ queue with st2? I have st2 running on docker-compose and RabbitMQ on my desktop, and I want to pick up new messages from a queue. I installed the rabbitmq pack, I can publish messages to the queue, and I use a rule on the rabbitmq.new_message trigger. St2 connects to RabbitMQ, but after about a minute the connection disappears, and st2 doesn't see new messages. What am I doing wrong?
Hi @cognifloyd, thanks for your interest in my question. Maybe you can help me? I described my st2/RabbitMQ problem in the previous message.
If you want to use your locally installed RabbitMQ, you'll need to allow StackStorm (or whatever pods you're running) to see outside of its cluster. There are several different ways to do this. The way I do it is by letting the deployment use my host's networking.
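For the Kubernetes case, a minimal sketch of the host-networking approach (the deployment name is only an example; hostNetwork also has security and port-conflict implications):
# Patch a deployment so its pods share the host's network namespace
# (the name "st2sensorcontainer" is a hypothetical example)
kubectl patch deployment st2sensorcontainer \
  -p '{"spec":{"template":{"spec":{"hostNetwork":true}}}}'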
If you're using Docker-Compose, I don't recommend altering it to use your own RabbitMQ, because the Docker-Compose instance is designed to be used and then thrown away like used toilet paper.
@cognifloyd How about NOT using Bitnami as your template? Bitnami always has permission issues in their releases (especially within Microk8s), because their releases always want full, unmolested access to the storage. Microk8s HATES this - as it should, because from a security standpoint, a deployment has no business having the kind of permissions Bitnami always demands.
When I worked at a university on our Kubernetes cluster, we ALWAYS avoided Bitnami releases for this very reason!
Several pods failed to start when the server was restarted.
Viewing the pods shows the error log:
How can I solve this problem?