ansible-middleware / amq_streams

Apache License 2.0

Zookeeper ids are not assigned in the zookeeper ensemble list - Zookeeper does not start #100

Closed: rmarting closed this issue 11 months ago

rmarting commented 1 year ago
SUMMARY

The new feature that identifies the zookeeper id works if the ids follow the default sequence (based on each host's index in the zookeeper group). If you use a different sequence, zookeeper does not start and fails with the following exception:

[2023-10-12 17:51:33,862] ERROR Unexpected exception, exiting abnormally (org.apache.zookeeper.server.quorum.QuorumPeerMain)
java.lang.RuntimeException: My id 30 not in the peer list
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1077)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
[2023-10-12 17:51:33,868] INFO ZooKeeper audit is disabled. (org.apache.zookeeper.audit.ZKAuditProvider)

This happens because the server list of the zookeeper cluster does not use the same ids: it is based on the zookeeper group index. This is the generated list, which is not linked to the id declared in the myid file:

# List of servers which should be members of the Zookeeper cluster.
server.1=f38mw01:2888:3888:participant;f38mw01:2181
server.2=f38mw02:2888:3888:participant;f38mw02:2181
server.3=f38mw03:2888:3888:participant;f38mw03:2181
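A minimal Python sketch of the mismatch, assuming the myid files hold 10, 20 and 30 (the exception above reports id 30 on one node). It parses the generated `server.N` lines and reports every host whose myid is not one of the `N` keys. This is a simplified stand-in for the check that fails in `QuorumPeer.start`, not ZooKeeper's code, and the helper names are hypothetical.

```python
import re

# Generated server list, copied from the zookeeper.properties shown above.
GENERATED = """\
server.1=f38mw01:2888:3888:participant;f38mw01:2181
server.2=f38mw02:2888:3888:participant;f38mw02:2181
server.3=f38mw03:2888:3888:participant;f38mw03:2181
"""

# Assumption: the myid file on each host holds the id declared in the inventory.
MYID = {"f38mw01": 10, "f38mw02": 20, "f38mw03": 30}

# Captures the numeric id after "server." and the hostname before the first ':'.
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:]+):")


def peer_ids(properties: str) -> dict[int, str]:
    """Map each server.N id in the properties text to its hostname."""
    peers = {}
    for line in properties.splitlines():
        match = SERVER_LINE.match(line)
        if match:
            peers[int(match.group(1))] = match.group(2)
    return peers


def missing_from_peer_list(properties: str, myids: dict[str, int]) -> list[str]:
    """Return the hosts whose myid does not appear as a server.N id."""
    peers = peer_ids(properties)
    return [host for host, myid in myids.items() if myid not in peers]


for host in missing_from_peer_list(GENERATED, MYID):
    print(f"{host}: My id {MYID[host]} not in the peer list")
```

With these ids none of the three hosts finds its own id in the list, which is consistent with all three services failing in the actual results.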

This also affects the authProvider entries when client authentication is enabled:

# Client-to-Server Authentication
requireClientAuthScheme=sasl
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
authProvider.2=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
authProvider.3=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
ISSUE TYPE
ANSIBLE VERSION
ansible [core 2.14.10]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/rmarting/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.11/site-packages/ansible
  ansible collection location = /home/rmarting/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.11.5 (main, Aug 28 2023, 00:00:00) [GCC 13.2.1 20230728 (Red Hat 13.2.1-1)] (/usr/bin/python3)
  jinja version = 3.0.3
  libyaml = True
COLLECTION VERSION

Current content of the main branch

STEPS TO REPRODUCE

The following inventory fails with this issue because the ids are not declared sequentially:

zookeepers:
  hosts:
    f38mw01:
      amq_streams_zookeeper_zookeeper_id: 10
    f38mw02:
      amq_streams_zookeeper_zookeeper_id: 20
    f38mw03:
      amq_streams_zookeeper_zookeeper_id: 30

This can happen with any playbook that installs multiple zookeepers.

EXPECTED RESULTS

Use each host's zookeeper id in the ensemble's server list (and in the authProvider entries).
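A minimal Python sketch of the expected rendering for the inventory above. The dict stands in for the hosts' `amq_streams_zookeeper_zookeeper_id` values, and the port constants are copied from the generated list in the summary rather than read from the role's variables. It only illustrates what the role's Jinja2 template would need to emit; `render_ensemble` and `render_auth_providers` are hypothetical names.

```python
# Inventory from STEPS TO REPRODUCE, expressed as host -> zookeeper id.
INVENTORY = {
    "f38mw01": 10,
    "f38mw02": 20,
    "f38mw03": 30,
}

# Ports copied from the generated list in the summary (assumed role defaults).
QUORUM_PORT = 2888
ELECTION_PORT = 3888
CLIENT_PORT = 2181

SASL_PROVIDER = "org.apache.zookeeper.server.auth.SASLAuthenticationProvider"


def render_ensemble(inventory: dict[str, int]) -> list[str]:
    """Render one server.<id> line per host, keyed by the host's own id."""
    return [
        f"server.{zk_id}={host}:{QUORUM_PORT}:{ELECTION_PORT}:participant;"
        f"{host}:{CLIENT_PORT}"
        for host, zk_id in inventory.items()
    ]


def render_auth_providers(inventory: dict[str, int]) -> list[str]:
    """Render authProvider.<id> lines using the same ids as the ensemble."""
    return [f"authProvider.{zk_id}={SASL_PROVIDER}" for zk_id in inventory.values()]


print("\n".join(render_ensemble(INVENTORY)))
print("\n".join(render_auth_providers(INVENTORY)))
```

The first line printed is `server.10=f38mw01:2888:3888:participant;f38mw01:2181`, so the myid 10 on `f38mw01` now has a matching entry.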

ACTUAL RESULTS
TASK [amq_streams_common : Check if service is started] ********************************************************************************************************************************************************************************************************
fatal: [f38mw01]: FAILED! => {"assertion": "ansible_facts.services[\"amq_streams_zookeeper.service\"]['state'] == 'running'", "changed": false, "evaluated_to": false, "msg": "Service is not started."}
fatal: [f38mw02]: FAILED! => {"assertion": "ansible_facts.services[\"amq_streams_zookeeper.service\"]['state'] == 'running'", "changed": false, "evaluated_to": false, "msg": "Service is not started."}
fatal: [f38mw03]: FAILED! => {"assertion": "ansible_facts.services[\"amq_streams_zookeeper.service\"]['state'] == 'running'", "changed": false, "evaluated_to": false, "msg": "Service is not started."}
rmarting commented 1 year ago

@gbaufake Could you double-check it? I think we need to review the implementation of the zookeeper id to avoid this kind of issue.

This is something we missed when reviewing the #99 implementation. :see_no_evil:

gbaufake commented 1 year ago

We need to update the following:

# List of servers which should be members of the Zookeeper cluster.
server.1=f38mw01:2888:3888:participant;f38mw01:2181
server.2=f38mw02:2888:3888:participant;f38mw02:2181
server.3=f38mw03:2888:3888:participant;f38mw03:2181

to match the ids when defined.

These lines need to be modified.

https://github.com/ansible-middleware/amq_streams/blob/4dd4f179f074c188e981b740a1f82381ef6911bc/roles/amq_streams_zookeeper/templates/zookeeper.properties.j2#L47

https://github.com/ansible-middleware/amq_streams/blob/4dd4f179f074c188e981b740a1f82381ef6911bc/roles/amq_streams_zookeeper/templates/zookeeper.properties.j2#L63
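Matching the ids "when defined" implies a fallback for hosts without an explicit id. The following is a minimal Python sketch of that resolution logic. It assumes the fallback is the host's 1-based position in the zookeepers group, which is consistent with the `server.1` to `server.3` list generated above. `resolve_ids` and `validate_ids` are hypothetical helpers, not part of the collection. The 1 to 255 range follows the ZooKeeper administrator's guide for the myid value.

```python
from typing import Optional

# Hypothetical input: (hostname, declared zookeeper id or None),
# in the order the hosts appear in the zookeepers group.
Host = tuple[str, Optional[int]]


def resolve_ids(hosts: list[Host]) -> dict[str, int]:
    """Use each host's declared id; otherwise fall back to its 1-based position."""
    resolved = {}
    for position, (host, declared_id) in enumerate(hosts, start=1):
        resolved[host] = declared_id if declared_id is not None else position
    return resolved


def validate_ids(resolved: dict[str, int]) -> None:
    """Raise ValueError on out-of-range or duplicate ids before rendering."""
    seen: dict[int, str] = {}
    for host, zk_id in resolved.items():
        if not 1 <= zk_id <= 255:
            raise ValueError(f"{host}: id {zk_id} is outside 1 to 255")
        if zk_id in seen:
            raise ValueError(f"{host}: id {zk_id} already used by {seen[zk_id]}")
        seen[zk_id] = host


# Usage with the inventory from STEPS TO REPRODUCE.
ids = resolve_ids([("f38mw01", 10), ("f38mw02", 20), ("f38mw03", 30)])
validate_ids(ids)
print(ids)  # {'f38mw01': 10, 'f38mw02': 20, 'f38mw03': 30}
```

The validation step matters when declared and fallback ids are mixed. For example, `[("f38mw01", 2), ("f38mw02", None)]` resolves both hosts to id 2, and `validate_ids` rejects it instead of rendering a list with a duplicate `server.2` entry.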