RobertFloor closed this issue 2 months ago
Hello Robert; thanks for reporting. It will be made available for both shared store and replication.
Please also review the systemd unit template https://github.com/ansible-middleware/amq/blob/main/roles/activemq/templates/amq_broker.service.j2 for the scenario where you restart the master broker but it does not become active.
Sure, that will be addressed in a different PR.
Hi. So this works, but systemd does not recognize the broker as correctly started, since AMQ221001 is never logged when the master does not become active.
I need to write the tests and also have many changes open; can you wait for 2.2.0 to be released? Currently HEAD is in a bit of a rough state.
Yes, although we would need this feature relatively soon, since we need to upgrade our production broker setup.
The current issue occurs if we do a systemctl restart of the master with the flag:
activemq_ha_allow_failback: false
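For reference, the HA-related variables on both nodes look roughly like this (a sketch with only the HA variables shown; 'slave' assumed as the backup role name):

# group_vars for the master node
activemq_ha_enabled: true
activemq_ha_role: master
activemq_ha_allow_failback: false

# group_vars for the backup node
activemq_ha_enabled: true
activemq_ha_role: slave
activemq_ha_allow_failback: false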
Master
sudo systemctl restart amq-broker.service
Job for amq-broker.service failed because the control process exited with error code.
See "systemctl status amq-broker.service" and "journalctl -xe" for details.
Logs
2024-08-26 11:27:43,967 INFO [org.apache.activemq.artemis.integration.bootstrap] AMQ101000: Starting ActiveMQ Artemis Server version 2.28.0.redhat-00012
2024-08-26 11:27:43,991 INFO [org.apache.activemq.artemis.core.server] AMQ221000: live Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=/mnt/asbf-tst-internal/journal,bindingsDirectory=/mnt/asbf-tst-internal/bindings,largeMessagesDirectory=/mnt/asbf-tst-internal/largemessages,pagingDirectory=/mnt/asbf-tst-internal/paging)
2024-08-26 11:27:43,992 INFO [org.apache.activemq.artemis.core.server] AMQ221006: Waiting to obtain live lock
2024-08-26 11:27:44,025 INFO [org.apache.activemq.artemis.core.server] AMQ221013: Using NIO Journal
2024-08-26 11:27:44,065 INFO [org.apache.activemq.artemis.core.server] AMQ221057: Global Max Size is being adjusted to 1/2 of the JVM max size (-Xmx). being defined as 12884901888
2024-08-26 11:27:44,134 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
2024-08-26 11:27:44,135 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
2024-08-26 11:27:44,136 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
2024-08-26 11:27:44,136 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
2024-08-26 11:27:44,137 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
2024-08-26 11:27:44,137 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
2024-08-26 11:27:44,283 INFO [org.apache.activemq.artemis.core.server] AMQ221034: Waiting indefinitely to obtain live lock
2024-08-26 11:27:44,700 INFO [org.apache.activemq.artemis.core.server] AMQ221031: backup announced
The Backup logs
2024-08-26 11:22:43,922 INFO [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 2.28.0.redhat-00012 [562eba20-1741-11ee-a3cf-005056354120] started, waiting live to fail before it gets active
2024-08-26 11:22:44,497 INFO [org.apache.activemq.artemis] AMQ241001: HTTP Server started at https://localhost:8161
2024-08-26 11:22:44,499 INFO [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at https://localhost:8161/console/jolokia
2024-08-26 11:22:44,499 INFO [org.apache.activemq.artemis] AMQ241004: Artemis Console available at https://localhost:8161/console
2024-08-26 11:22:45,926 INFO [org.apache.activemq.artemis.core.server] AMQ221031: backup announced
2024-08-26 11:24:41,689 INFO [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:61616 for protocols [CORE,AMQP,OPENWIRE]
2024-08-26 11:24:41,689 INFO [org.apache.activemq.artemis.core.server] AMQ221010: Backup Server is now live
Restarting the backup via systemctl works fine.
Hello, the logs seem consistent with allow_failback being false; what is your expected behaviour?
So the problem is
sudo systemctl restart amq-broker.service
Job for amq-broker.service failed because the control process exited with error code.
See "systemctl status amq-broker.service" and "journalctl -xe" for details.
This causes our pipeline to fail. It fails because the systemd template contains:
{% elif activemq_ha_enabled and (activemq_ha_role == 'master' or activemq_ha_role == 'primary') %}
ExecStartPost=/usr/bin/timeout {{ activemq_systemd_wait_for_timeout }} sh -c 'tail -n 15 -f {{ activemq.instance_home }}/log/artemis.log | sed "/AMQ221001/ q" && /bin/sleep {{ activemq_systemd_wait_for_delay }}'
{% else %}
But AMQ221001 is never logged when you restart the master and it does not become active.
Ah right; so as a quick workaround we could make both strings (master and slave) acceptable as the started condition in the logs. That would be quick to do, and would also let me make the string itself configurable.
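Roughly along these lines in the unit template (just a sketch, not the final change; the hardcoded pattern would eventually come from a configurable variable, and the \| alternation is a GNU sed extension):

{% if activemq_ha_enabled %}
ExecStartPost=/usr/bin/timeout {{ activemq_systemd_wait_for_timeout }} sh -c 'tail -n 15 -f {{ activemq.instance_home }}/log/artemis.log | sed "/AMQ221001\|AMQ221109/ q" && /bin/sleep {{ activemq_systemd_wait_for_delay }}'
{% endif %}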
Sorry, it seems a little more complex than that. These are the master logs when it starts as the active server:
2024-08-26 13:52:32,621 INFO [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:61616 for protocols [CORE,AMQP,OPENWIRE]
2024-08-26 13:52:32,641 INFO [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
2024-08-26 13:52:32,642 INFO [org.apache.activemq.artemis.core.server] AMQ221001: Apache ActiveMQ Artemis Message Broker version 2.28.0.redhat-00012 [asbf-test-internal, nodeID=562eba20-1741-11ee-a3cf-005056354120]
2024-08-26 13:52:32,647 INFO [org.apache.activemq.artemis] AMQ241003: Starting embedded web server
2024-08-26 13:52:32,886 INFO [org.apache.amq.hawtio.branding.PluginContextListener] Initialized redhat-branding plugin
2024-08-26 13:52:32,924 INFO [org.apache.activemq.hawtio.plugin.PluginContextListener] Initialized artemis-plugin plugin
2024-08-26 13:52:33,760 INFO [org.apache.activemq.artemis] AMQ241001: HTTP Server started at https://localhost:8161
2024-08-26 13:52:33,762 INFO [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at https://localhost:8161/console/jolokia
2024-08-26 13:52:33,762 INFO [org.apache.activemq.artemis] AMQ241004: Artemis Console available at https://localhost:8161/console
So in this case code AMQ221034 does not match, particularly since we have a complex address configuration: there are a lot of log lines, and the command only watches the last 15 (tail -n 15). I am afraid it is not possible to use one code for both an active and a non-active master.
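A quick way to see which start-up codes a given restart actually produced is to grep the whole log instead of tailing the last lines (instance path abbreviated):

grep -E 'AMQ221001|AMQ221007|AMQ221031|AMQ221034|AMQ221109' <instance_home>/log/artemis.log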
AMQ221034 is for primary-only; if you have activemq_ha_enabled=true, the match string is AMQ221109|AMQ221001 (one or the other).
Any luck with this?
I went for AMQ221010, AMQ221031, and AMQ221007.
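So the wait condition quits as soon as any of those appears; sketched against the logs above (not the literal template line):

# AMQ221007: Server is now live          (master starting active)
# AMQ221010: Backup Server is now live   (backup taking over)
# AMQ221031: backup announced            (master restarting without becoming active)
tail -n 15 -f <instance_home>/log/artemis.log | sed "/AMQ221007\|AMQ221010\|AMQ221031/ q"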
WDYT, anything else needed or missing? Otherwise I'll add a final feature (broker-plugins) and release 2.2.0.
Hi. The changes after commit 0457e150b545f3ef229d6d9831b23c09eda0831c break our SSL settings: somehow the expected name is localhost instead of the real hostname. Our setup works at that commit but not with later commits.
2024-08-27 15:55:35,493 ERROR [org.apache.activemq.artemis.core.client] AMQ214016: Failed to create netty connection
javax.net.ssl.SSLHandshakeException: No subject alternative DNS name matching localhost found.
at sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[?:?]
at sun.security.ssl.TransportContext.fatal(TransportContext.java:360) ~[?:?]
at sun.security.ssl.TransportContext.fatal(TransportContext.java:303) ~[?:?]
at sun.security.ssl.TransportContext.fatal(TransportContext.java:298) ~[?:?]
at sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1357) ~[?:?]
at sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1232) ~[?:?]
at sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1175) ~[?:?]
at sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392) ~[?:?]
at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443) ~[?:?]
at sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1076) ~[?:?]
at sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1063) ~[?:?]
at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
at sun.security.ssl.SSLEngineImpl$DelegatedTask.run(SSLEngineImpl.java:1010) ~[?:?]
at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1647) ~[netty-handler-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1493) ~[netty-handler-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1334) ~[netty-handler-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1383) ~[netty-handler-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529) ~[netty-codec-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468) ~[netty-codec-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) ~[netty-codec-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[netty-transport-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[netty-transport-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800) ~[netty-transport-classes-epoll-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:509) ~[netty-transport-classes-epoll-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:407) ~[netty-transport-classes-epoll-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.100.Final-redhat-00001.jar:4.1.100.Final-redhat-00001]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) ~[artemis-commons-2.28.0.redhat-00012.jar:?]
Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching localhost found.
at sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:212) ~[?:?]
at sun.security.util.HostnameChecker.match(HostnameChecker.java:103) ~[?:?]
at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:461) ~[?:?]
at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:435) ~[?:?]
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:283) ~[?:?]
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:141) ~[?:?]
at sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1335) ~[?:?]
Is it an acceptor or a connector? Can you provide the relevant parameters or the generated broker.xml for acceptors/connectors?
My guess is that you are defining your acceptor with bind_address: "{{ activemq_host }}" (the default); you should define activemq_host to match the hostname in the certificate (or set bind_address in the acceptor directly to the hostname).
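In inventory/group_vars terms, something like this (a sketch; the hostname is an example and must match a SAN in the server certificate):

activemq_host: broker-01.internal.example.com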
Yes, in the end it was a problem on our side. However, I still wonder why you are selecting localhost as the advertised connector. The documentation (https://activemq.apache.org/components/artemis/documentation/latest/clusters) states:
connector-ref
This is the connector which will be sent to other nodes in the cluster so they have the correct cluster topology.
This parameter is mandatory.
So localhost does not make sense for us, as it cannot be resolved by the other hosts.
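For illustration, the generated connector in broker.xml advertises localhost, while the cluster needs a name the other nodes can resolve (hostname below is an example):

<connectors>
   <!-- generated default: other cluster nodes cannot resolve this
   <connector name="artemis">tcp://localhost:61616</connector>
   -->
   <!-- a resolvable name instead -->
   <connector name="artemis">tcp://broker-01.internal.example.com:61616</connector>
</connectors>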
I know, and I agree it makes little sense. Unfortunately it has been the default value generated in broker.xml when running artemis create with a static list of nodes, and since then I have kept the default in the collection for the sake of backwards compatibility. What I can do is document this mess better, promised.
And likely 'sanitize' the situation in 3.0 :)
SUMMARY
We would like a shared-storage setup with one master and one slave broker, without failback.
Currently this is not possible with the playbook, and it is important for us to get this setup. Could you add support for it in the playbook?
ISSUE TYPE