canonical / kafka-operator

Kafka VM operator
Apache License 2.0

Kafka in error state #214

Open dstathis opened 1 month ago

dstathis commented 1 month ago

Steps to reproduce

  1. Deploy Kafka
  2. Relate to grafana-agent, hardware-observer, and zookeeper

Expected behavior

Kafka units are not in an error state

Actual behavior

Kafka in error state

Model    Controller  Cloud/Region         Version  SLA          Timestamp
machine  machine     localhost/localhost  3.5.1    unsupported  20:03:26Z

SAAS        Status  Store  URL
grafana     active  k8s    admin/lma.grafana-dashboards
loki        active  k8s    admin/lma.loki-logging
prometheus  active  k8s    admin/lma.prometheus-receive-remote-write

App        Version  Status  Scale  Charm              Channel      Rev  Exposed  Message
agent               active      4  grafana-agent      latest/edge  216  no       
hob                 active      4  hardware-observer  latest/edge   81  no       Unit is ready
kafka      3.6.1    error       3  kafka              3/edge       173  no       hook failed: "start"
zookeeper           active      1  zookeeper          3/edge       134  no     

Unit          Workload  Agent      Machine  Public address                          Ports  Message
kafka/0*      error     idle       1        10.33.218.51                                   hook failed: "start"
  agent/1*    active    executing           10.33.218.51                               
  hob/1*      active    idle                10.33.218.51                                   Unit is ready
kafka/1       error     idle       2        10.33.218.147                                  hook failed: "zookeeper-relation-changed" for zookeeper:zookeeper
  agent/2     active    idle                10.33.218.147                              
  hob/2       active    idle                10.33.218.147                                  Unit is ready
kafka/2       error     idle       3        fd42:fcd8:e66f:7680:216:3eff:fe5a:594e         hook failed: "zookeeper-relation-changed" for zookeeper:zookeeper
  agent/3     active    idle                fd42:fcd8:e66f:7680:216:3eff:fe5a:594e     
  hob/3       active    idle                fd42:fcd8:e66f:7680:216:3eff:fe5a:594e         Unit is ready
zookeeper/0*  blocked   executing  0        10.33.218.189                                  zookeeper service is unreachable or not serving requests
  agent/0     active    idle                10.33.218.189                              
  hob/0       active    idle                10.33.218.189                                  Unit is ready

Machine  State    Address                                 Inst id        Base          AZ  Message
0        started  10.33.218.189                           juju-973497-0  ubuntu@22.04      Running
1        started  10.33.218.51                            juju-973497-1  ubuntu@22.04      Running
2        started  10.33.218.147                           juju-973497-2  ubuntu@22.04      Running
3        started  fd42:fcd8:e66f:7680:216:3eff:fe5a:594e  juju-973497-3  ubuntu@22.04      Running

Integration provider               Requirer                   Interface                Type         Message
agent:grafana-dashboards-provider  grafana:grafana-dashboard  grafana_dashboard        regular
agent:peers                        agent:peers                grafana_agent_replica    peer
hob:cos-agent                      agent:cos-agent            cos_agent                subordinate
kafka:cluster                      kafka:cluster              cluster                  peer
kafka:cos-agent                    agent:cos-agent            cos_agent                subordinate
kafka:juju-info                    hob:general-info           juju-info                subordinate
kafka:restart                      kafka:restart              rolling_op               peer
kafka:upgrade                      kafka:upgrade              upgrade                  peer
loki:logging                       agent:logging-consumer     loki_push_api            regular
prometheus:receive-remote-write    agent:send-remote-write    prometheus_remote_write  regular
zookeeper:cluster                  zookeeper:cluster          cluster                  peer
zookeeper:cos-agent                agent:cos-agent            cos_agent                subordinate
zookeeper:juju-info                hob:general-info           juju-info                subordinate
zookeeper:restart                  zookeeper:restart          rolling_op               peer
zookeeper:upgrade                  zookeeper:upgrade          upgrade                  peer
zookeeper:zookeeper                kafka:zookeeper            zookeeper                regular

Versions

                           juju info v0.1                           
┌──────────────┬───────────────────────────────────────────────────┐
│ jhack        │ 0.4.2.3                                           │
│ python       │ 3.12.3 (/home/dylan/repos/jhack/venv/bin/python3) │
│ juju-* snaps │  juju │ 3.5.2 - 27745 (3/stable)                  │
│ microk8s     │ MicroK8s v1.29.4 revision 6809                    │
│ lxd          │ 5.21.1 LTS                                        │
│ multipass    │ Not Installed.                                    │
│ multipassd   │ Not Installed.                                    │
│ os           │ Ubuntu 24.04 LTS                                  │
│ kernel       │ Linux 6.8.0-35-generic x86_64                     │
└──────────────┴───────────────────────────────────────────────────┘

Log output

Juju debug log:

unit-kafka-0: 20:02:00 ERROR unit.kafka/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kafka-0/charm/lib/charms/zookeeper/v0/client.py", line 129, in __init__
    self.leader = self.get_leader()
  File "/var/lib/juju/agents/unit-kafka-0/charm/venv/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/var/lib/juju/agents/unit-kafka-0/charm/venv/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/var/lib/juju/agents/unit-kafka-0/charm/venv/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x748b11c6bbb0 state=finished returned str>]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kafka-0/charm/./src/charm.py", line 384, in <module>
    main(KafkaCharm)
  File "/var/lib/juju/agents/unit-kafka-0/charm/venv/ops/main.py", line 544, in main
    manager.run()
  File "/var/lib/juju/agents/unit-kafka-0/charm/venv/ops/main.py", line 520, in run
    self._emit()
  File "/var/lib/juju/agents/unit-kafka-0/charm/venv/ops/main.py", line 509, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-kafka-0/charm/venv/ops/main.py", line 143, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-kafka-0/charm/venv/ops/framework.py", line 350, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-kafka-0/charm/venv/ops/framework.py", line 849, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-kafka-0/charm/venv/ops/framework.py", line 939, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-kafka-0/charm/./src/charm.py", line 156, in _on_start
    self._on_update_status(event)
  File "/var/lib/juju/agents/unit-kafka-0/charm/./src/charm.py", line 229, in _on_update_status
    if not self.state.zookeeper.broker_active():
  File "/var/lib/juju/agents/unit-kafka-0/charm/venv/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/var/lib/juju/agents/unit-kafka-0/charm/venv/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/var/lib/juju/agents/unit-kafka-0/charm/venv/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/var/lib/juju/agents/unit-kafka-0/charm/venv/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/var/lib/juju/agents/unit-kafka-0/charm/src/core/models.py", line 374, in broker_active
    zk = ZooKeeperManager(hosts=hosts, username=self.username, password=self.password)
  File "/var/lib/juju/agents/unit-kafka-0/charm/lib/charms/zookeeper/v0/client.py", line 131, in __init__
    raise QuorumLeaderNotFoundError("quorum leader not found")
charms.zookeeper.v0.client.QuorumLeaderNotFoundError: quorum leader not found
unit-kafka-0: 20:02:00 ERROR juju.worker.uniter.operation hook "start" (via hook dispatching script: dispatch) failed: exit status 1

Additional context

github-actions[bot] commented 1 month ago

https://warthogs.atlassian.net/browse/DPE-4888

marcoppenheimer commented 1 month ago

I believe it's due to the IPv6 addresses. I'm not sure which cloud you're on, but right now the charm only supports IPv4. Could you try disabling IPv6 and see if that resolves the issue?
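
To see which units ended up with an IPv6 address before changing anything, a minimal sketch along these lines may help. It only uses Python's standard `ipaddress` module; the `UNIT_ADDRESSES` mapping and the `ipv6_units` helper are illustrative names, not part of the charm, and the addresses are copied by hand from the `juju status` output in the report.

```python
import ipaddress

# Unit addresses copied by hand from the `juju status` output in this issue.
# (Illustrative data only; this mapping is not produced by the charm.)
UNIT_ADDRESSES = {
    "kafka/0": "10.33.218.51",
    "kafka/1": "10.33.218.147",
    "kafka/2": "fd42:fcd8:e66f:7680:216:3eff:fe5a:594e",
    "zookeeper/0": "10.33.218.189",
}


def ipv6_units(addresses):
    """Return the sorted names of units whose address parses as IPv6."""
    return sorted(
        unit
        for unit, addr in addresses.items()
        if ipaddress.ip_address(addr).version == 6
    )


if __name__ == "__main__":
    # Hypothetical helper for triage; prints the units reporting IPv6.
    print(ipv6_units(UNIT_ADDRESSES))
```

This only lists the addresses Juju reports; it does not by itself confirm that IPv6 is the cause. The model here is on `localhost/localhost`, so assuming an LXD-managed bridge, IPv6 can usually be turned off with `lxc network set <bridge> ipv6.address none`, where `<bridge>` is whatever bridge the model uses, followed by a redeploy.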