ansible-middleware / amq

A collection to manage AMQ brokers
Apache License 2.0

master/slave - systemctl restart fails #185

Closed garethahealy closed 1 month ago

garethahealy commented 1 month ago
SUMMARY

I think I've hit a bug in how the systemd service is configured.

  1. Create a master/slave setup using shared storage
  2. Stop the slave via systemctl
  3. Restart the master via systemctl - works as expected
  4. Start the slave via systemctl
  5. Restart the master via systemctl - fails to start

Looking in the logs for master, I can see it stopping and then trying to start:

2024-09-27 10:12:51,667 INFO  [org.apache.activemq.artemis] AMQ241005: Stopping embedded web server
2024-09-27 10:12:51,671 INFO  [io.hawt.web.auth.AuthenticationFilter] Destroying hawtio authentication filter
2024-09-27 10:12:51,672 INFO  [io.hawt.HawtioContextListener] Destroying hawtio services
2024-09-27 10:12:51,685 INFO  [org.apache.activemq.hawtio.plugin.PluginContextListener] Destroyed artemis-plugin plugin
2024-09-27 10:12:51,688 INFO  [org.apache.amq.hawtio.branding.PluginContextListener] Destroyed redhat-branding plugin
2024-09-27 10:12:51,702 INFO  [org.apache.activemq.artemis] AMQ241006: Stopped embedded web server
2024-09-27 10:12:51,703 INFO  [org.apache.activemq.artemis.core.server] AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.33.0.redhat-00010 [4ad507d3-7cb3-11ef-975a-0694ffaf8fcb] stopped, uptime 1 minute
2024-09-27 10:12:55,176 INFO  [org.apache.activemq.artemis.integration.bootstrap] AMQ101000: Starting ActiveMQ Artemis Server version 2.33.0.redhat-00010
2024-09-27 10:12:55,234 INFO  [org.apache.activemq.artemis.core.server] AMQ221000: Primary message broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=/opt/amq-data/journal,bindingsDirectory=/opt/amq-data/bindings,largeMessagesDirectory=/opt/amq-data/largemessages,pagingDirectory=/opt/amq-data/paging)
2024-09-27 10:12:55,236 INFO  [org.apache.activemq.artemis.core.server] AMQ221006: Waiting to obtain primary lock
2024-09-27 10:12:55,288 INFO  [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
2024-09-27 10:12:55,362 INFO  [org.apache.activemq.artemis.core.server] AMQ221057: Global Max Size is being adjusted to 1/2 of the JVM max size (-Xmx). being defined as 1073741824
2024-09-27 10:12:55,422 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
2024-09-27 10:12:55,422 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
2024-09-27 10:12:55,423 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
2024-09-27 10:12:55,423 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
2024-09-27 10:12:55,424 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
2024-09-27 10:12:55,424 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
2024-09-27 10:12:55,564 INFO  [org.apache.activemq.artemis.core.server] AMQ221034: Waiting indefinitely to obtain primary lock
2024-09-27 10:12:55,696 INFO  [org.apache.activemq.artemis.core.server] AMQ221031: backup announced

I believe it's because of the ExecStartPost.

Running the tail commands from ExecStartPost returns nothing, so systemctl fails.

I think this scenario is valid: restarting the master via systemctl so that it becomes the slave.

guidograzioli commented 1 month ago

Can you show the values of activemq_systemd_wait_for_port and activemq_systemd_wait_for_log? At that point the master systemd unit should indeed wait for "AMQ221031: backup announced" (which is in the logs).

guidograzioli commented 1 month ago

Also, can you try with:

activemq_systemd_wait_for_log_ha_string: 'AMQ221109\\\\|AMQ221001\\\\|AMQ221034'

garethahealy commented 1 month ago

amq_broker_systemd_wait_for_port == False | amq_broker_systemd_wait_for_log == True

/etc/systemd/system/amq1.service

ExecStartPost=/usr/bin/timeout 60 sh -c 'tail -n 15 -f /opt/amq/amq1/log/artemis.log | sed "/AMQ221109\\\\|AMQ221001/ q" && /bin/sleep 10'

/etc/systemd/system/amq2.service

ExecStartPost=/usr/bin/timeout 60 sh -c 'tail -n 15 -f /opt/amq/amq2/log/artemis.log | sed "/AMQ221109\\\\|AMQ221001/ q" && /bin/sleep 10'

garethahealy commented 1 month ago

This fixes the issue:

amq_broker_systemd_wait_for_log_ha_string: 'AMQ221109\\\\|AMQ221001\\\\|AMQ221034'

guidograzioli commented 1 month ago

Thanks, I'll update the default!
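
For anyone landing here later, a minimal sketch of why the extra code helps. It assumes the ExecStartPost check comes down to "does one of these message IDs appear in the log". The sample lines are copied from the master's restart log above. The helper `log_matches` is hypothetical, and `grep -E` stands in for the `tail | sed` pipeline in the unit file, so this only illustrates the pattern matching, not the timeout behaviour:

```shell
#!/bin/sh
# Message IDs the restarted master logged while the slave was running
# (copied from the log excerpt in this issue).
log='AMQ221006: Waiting to obtain primary lock
AMQ221034: Waiting indefinitely to obtain primary lock
AMQ221031: backup announced'

# Hypothetical helper: succeeds if any log line matches the extended regex.
log_matches() {
    printf '%s\n' "$log" | grep -E -q "$1"
}

old_pattern='AMQ221109|AMQ221001'            # IDs waited for in the generated units
new_pattern='AMQ221109|AMQ221001|AMQ221034'  # IDs in the setting that fixed the issue

if log_matches "$old_pattern"; then old_result=match; else old_result=no-match; fi
if log_matches "$new_pattern"; then new_result=match; else new_result=no-match; fi

echo "old pattern: $old_result"
echo "new pattern: $new_result"
```

Neither AMQ221109 nor AMQ221001 appears in the restart log, so the old pattern never matches and the 60 second timeout in ExecStartPost expires. AMQ221034 is logged, so the extended pattern matches.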