canonical / cos-proxy-operator

https://charmhub.io/cos-proxy
Apache License 2.0
2 stars 12 forks source link

cos-proxy fails on hook "filebeat-relation-changed" #85

Closed jeffreychang911 closed 9 months ago

jeffreychang911 commented 1 year ago

Bug Description

In SQA testrun 5703ae6a-aea7-4a77-947b-aa86d70b55b9, cos-proxy fails to install in hook "monitors-relation-changed".

To Reproduce

To reproduce, deploy cos and then the openstack bundle, which includes cos-proxy. This issue is not necessarily reproducible, we have seen this bundle deploy without issues before.

Environment

The environment is a juju maas controller hosting a charmed openstack deployment. This deployment is connected to cos, which is hosted on a microk8s, hosted on the same juju maas controller.

Relevant log output

2023-10-04 08:33:49 DEBUG juju.worker.uniter.operation executor.go:132 preparing operation "run relation-changed (167; unit: filebeat/53) hook" for cos-proxy/0
2023-10-04 08:33:49 DEBUG juju.worker.uniter.operation executor.go:132 executing operation "run relation-changed (167; unit: filebeat/53) hook" for cos-proxy/0
2023-10-04 08:33:49 DEBUG juju.worker.uniter agent.go:22 [AGENT-STATUS] executing: running filebeat-relation-changed hook for filebeat/53
2023-10-04 08:33:49 DEBUG juju.worker.uniter.runner runner.go:728 starting jujuc server  {unix @/var/lib/juju/agents/unit-cos-proxy-0/agent.socket <nil>}
2023-10-04 08:33:49 DEBUG unit.cos-proxy/0.juju-log server.go:316 filebeat:167: Operator Framework 2.1.1+2.geb8e25a up and running.
2023-10-04 08:33:50 DEBUG unit.cos-proxy/0.juju-log server.go:316 filebeat:167: Emitting Juju event filebeat_relation_changed.
2023-10-04 08:33:50 DEBUG unit.cos-proxy/0.juju-log server.go:316 filebeat:167: Emitting custom event <VectorConfigChangedEvent via COSProxyCharm/VectorProvider[filebeat_downstream-logging]/on/config_changed[3235]>.
2023-10-04 08:33:50 ERROR unit.cos-proxy/0.juju-log server.go:316 filebeat:167: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/usr/lib/python3.8/urllib/request.py", line 1354, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib/python3.8/http/client.py", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1302, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1011, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 951, in send
    self.connect()
  File "/usr/lib/python3.8/http/client.py", line 922, in connect
    self.sock = self._create_connection(
  File "/usr/lib/python3.8/socket.py", line 808, in create_connection
    raise err
  File "/usr/lib/python3.8/socket.py", line 796, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./src/charm.py", line 475, in <module>
    main(COSProxyCharm)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/venv/ops/main.py", line 436, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/venv/ops/main.py", line 144, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/venv/ops/framework.py", line 354, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/venv/ops/framework.py", line 830, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/venv/ops/framework.py", line 919, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/lib/charms/vector/v0/vector.py", line 225, in _on_log_relation_changed
    self.on.config_changed.emit(config=self.config)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/venv/ops/framework.py", line 354, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/venv/ops/framework.py", line 830, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/venv/ops/framework.py", line 919, in _reemit
    custom_handler(event)
  File "./src/charm.py", line 359, in _write_vector_config
    r = request.urlopen(dest)
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 1383, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.8/urllib/request.py", line 1357, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 111] Connection refused>
2023-10-04 08:33:51 ERROR juju.worker.uniter.operation runhook.go:153 hook "filebeat-relation-changed" (via hook dispatching script: dispatch) failed: exit status 1

Additional context

No response

sed-i commented 1 year ago

Seems like a loki connection fails. At the very least, charm code should catch urllib.error.URLError: <urlopen error [Errno 111] Connection refused>.

https://github.com/canonical/cos-proxy-operator/blob/9bb0f2f1fbd48e6b9fcc8c6e3956ace6f205bd1c/src/charm.py#L375

motjuste commented 1 year ago

A similar error occurs in this run: https://solutions.qa.canonical.com/testruns/3d640f4a-a151-4020-ae60-2777e1fe7a78/, but the error log shows a different cause:

2023-10-16 10:15:05 ERROR unit.cos-proxy/0.juju-log server.go:325 filebeat:173: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/usr/lib/python3.10/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib/python3.10/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/usr/lib/python3.10/http/client.py", line 942, in connect
    self.sock = self._create_connection(
  File "/usr/lib/python3.10/socket.py", line 845, in create_connection
    raise err
  File "/usr/lib/python3.10/socket.py", line 833, in create_connection
    sock.connect(sa)
OSError: [Errno 113] No route to host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/./src/charm.py", line 475, in <module>
    main(COSProxyCharm)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/venv/ops/main.py", line 436, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/venv/ops/main.py", line 144, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/venv/ops/framework.py", line 354, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/venv/ops/framework.py", line 830, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/venv/ops/framework.py", line 919, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/lib/charms/vector/v0/vector.py", line 225, in _on_log_relation_changed
    self.on.config_changed.emit(config=self.config)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/venv/ops/framework.py", line 354, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/venv/ops/framework.py", line 830, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/venv/ops/framework.py", line 919, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-cos-proxy-0/charm/./src/charm.py", line 359, in _write_vector_config
    r = request.urlopen(dest)
  File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.10/urllib/request.py", line 519, in open
    response = self._open(req, data)
  File "/usr/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 1377, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.10/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 113] No route to host>
2023-10-16 10:15:05 ERROR juju.worker.uniter.operation runhook.go:180 hook "filebeat-relation-changed" (via hook dispatching script: dispatch) failed: exit status 1
2023-10-16 10:15:05 DEBUG juju.machinelock machinelock.go:206 created rotating log file "/var/log/juju/machine-lock.log" with max size 10 MB and max backups 5
2023-10-16 10:15:05 DEBUG juju.machinelock machinelock.go:190 machine lock "machine-lock" released for cos-proxy/0 uniter (run relation-changed (173; unit: filebeat/0) hook)
2023-10-16 10:15:05 DEBUG juju.worker.uniter.operation executor.go:118 lock released for cos-proxy/0
2023-10-16 10:15:05 INFO juju.worker.uniter resolver.go:161 awaiting error resolution for "relation-changed" hook
2023-10-16 10:15:05 DEBUG juju.worker.uniter agent.go:22 [AGENT-STATUS] error: hook failed: "filebeat-relation-changed"
2023-10-16 10:16:08 DEBUG juju.worker.uniter.remotestate watcher.go:808 update status timer triggered for cos-proxy/0
2023-10-16 10:16:08 INFO juju.worker.uniter resolver.go:161 awaiting error resolution for "relation-changed" hook
dstathis commented 9 months ago

Looks like this was fixed in this PR by Samuel Allan.

I have a couple issues with that change though. Firstly, it uses an already defined variable to catch the exception. I'll submit a change to fix that.

Secondly, the solution puts us in to BlockedStatus when we hit this which is an even worse situation because juju will not retry the hook.