ansible-middleware / amq_streams

Apache License 2.0

Server Log is not being created until the check is done #83

Closed gbaufake closed 6 months ago

gbaufake commented 1 year ago
SUMMARY

When server_log_validation: true is set, the collection fails because server.log has not yet been created at the time the check is performed.

ISSUE TYPE
ANSIBLE VERSION
  python version = 3.11.5 (main, Aug 24 2023, 15:09:45) [Clang 14.0.3 (clang-1403.0.22.14.1)] (/opt/homebrew/Cellar/ansible/8.4.0/libexec/bin/python)
  jinja version = 3.1.2
  libyaml = True
COLLECTION VERSION
middleware_automation.amq_streams 0.0.5
STEPS TO REPRODUCE
Install the Zookeeper and Kafka cluster with server_log_validation: true
EXPECTED RESULTS
ACTUAL RESULTS
TASK [middleware_automation.amq_streams.amq_streams_common : Check if service is started] ***********************************************************************************************************************************
ok: [hostname2]
ok: [hostname1]
ok: [hostname3]

TASK [Verify that logfile /zookeeper_datastore/logs/server.log and it contains no errors.] **********************************************************************************************************************************

TASK [middleware_automation.amq_streams.amq_streams_common : Ensure required parameter(s) are provided.] ********************************************************************************************************************
fatal: [hostname2]: FAILED! => {"assertion": "server_log_dir is exists", "changed": false, "evaluated_to": false, "msg": "/zookeeper_datastore/logs/server.log is invalid"}
fatal: [hostname1]: FAILED! => {"assertion": "server_log_dir is exists", "changed": false, "evaluated_to": false, "msg": "/zookeeper_datastore/logs/server.log is invalid"}
fatal: [hostname3]: FAILED! => {"assertion": "server_log_dir is exists", "changed": false, "evaluated_to": false, "msg": "/zookeeper_datastore/logs/server.log is invalid"}

PLAY RECAP ******************************************************************************************************************************************************************************************************************
hostname1 : ok=40   changed=0    unreachable=0    failed=1    skipped=7    rescued=0    ignored=0
hostname2 : ok=40   changed=0    unreachable=0    failed=1    skipped=7    rescued=0    ignored=0
hostname3 : ok=40   changed=0    unreachable=0    failed=1    skipped=7    rescued=0    ignored=0

[ec2-user@hostname1 ~]$ cat /zookeeper_datastore/logs/server.log
[2023-09-22 00:16:55,026] INFO Reading configuration from: /etc/amq_streams_zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2023-09-22 00:16:55,042] INFO clientPortAddress is 0.0.0.0:2181 (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2023-09-22 00:16:55,042] INFO secureClientPort is not set (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2023-09-22 00:16:55,042] INFO observerMasterPort is not set (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2023-09-22 00:16:55,042] INFO metricsProvider.className is org.apache.zookeeper.metrics.impl.DefaultMetricsProvider (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2023-09-22 00:16:55,070] INFO autopurge.snapRetainCount set to 3 (org.apache.zookeeper.server.DatadirCleanupManager)
[2023-09-22 00:16:55,071] INFO autopurge.purgeInterval set to 0 (org.apache.zookeeper.server.DatadirCleanupManager)
[2023-09-22 00:16:55,071] INFO Purge task is not scheduled. (org.apache.zookeeper.server.DatadirCleanupManager)
[2023-09-22 00:16:55,072] INFO Log4j 1.2 jmx support not found; jmx disabled. (org.apache.zookeeper.jmx.ManagedUtil)
[2023-09-22 00:16:55,072] INFO Starting quorum peer, myid=2 (org.apache.zookeeper.server.quorum.QuorumPeerMain)
[2023-09-22 00:16:55,117] INFO ServerMetrics initialized with provider org.apache.zookeeper.metrics.impl.DefaultMetricsProvider@12e61fe6 (org.apache.zookeeper.server.ServerMetrics)
[2023-09-22 00:16:55,154] INFO Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory (org.apache.zookeeper.server.ServerCnxnFactory)
[2023-09-22 00:16:55,169] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
[2023-09-22 00:16:55,186] INFO Server successfully logged in. (org.apache.zookeeper.Login)
[2023-09-22 00:16:55,200] WARN maxCnxns is not configured, using default value 0. (org.apache.zookeeper.server.ServerCnxnFactory)
[2023-09-22 00:16:55,203] INFO Configuring NIO connection handler with 10s sessionless connection timeout, 1 selector thread(s), 4 worker threads, and 64 kB direct buffers. (org.apache.zookeeper.server.NIOServerCnxnFactory)

rpelisse commented 11 months ago

Thanks for the report!

Depending on the infrastructure and how fast the Ansible controller is, there are indeed cases where the validation will run before the service has fully started. You can add wait_for: to check that Zookeeper is indeed up (or simply delay the validation step):

roles:
  - role: amq_streams_zookeeper
 ...
post_tasks:
  - name: "Wait for Zookeeper to be up"
    ansible.builtin.wait_for:
      host: <zk_host>
      port: <zk_port>
  - ansible.builtin.include_role:
      name: amq_streams_zookeeper
      tasks_from: validation.yml
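
Since the assertion in the report fails on the file not existing yet ("server_log_dir is exists"), another option might be to wait for the log file itself rather than for the port. A minimal sketch, assuming the log path shown in the output above; the timeout is an arbitrary value, not a collection default:

```yaml
post_tasks:
  # Block until the log file exists; the path is taken from the failing assertion above.
  - name: "Wait for the Zookeeper server.log to be created"
    ansible.builtin.wait_for:
      path: /zookeeper_datastore/logs/server.log
      state: present
      timeout: 120  # assumed value, tune for your environment
  # Run the validation only once the file is there.
  - name: "Validate the Zookeeper server.log"
    ansible.builtin.include_role:
      name: amq_streams_zookeeper
      tasks_from: validation.yml
```

Because post_tasks run after the roles, this presumably only helps if the role's own validation is not triggered first (for example with server_log_validation left off in the role run), so that the check happens only after the wait.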

Otherwise, I've added such a wait in the amq_streams_broker role, which can be activated (see https://github.com/ansible-middleware/amq_streams/pull/94). Maybe you can try it in your environment and see if it helps?

rpelisse commented 10 months ago

@gbaufake can we close this issue?

gbaufake commented 10 months ago

I'm still facing this issue.

rpelisse commented 10 months ago

@gbaufake Oh, sorry to hear that! I assumed, wrongly, that adding the timeout/delay had worked.

Can you share your playbook and some details on the target system? We need to be able to reproduce the issue in order to investigate.

Thanks!