canonical / traefik-k8s-operator

This charmed operator automates the operational procedures of running Traefik, an open-source application proxy.
https://charmhub.io/traefik-k8s
Apache License 2.0

Uncaught exception `ops.pebble.ConnectionError` #248

Open Abuelodelanada opened 10 months ago

Abuelodelanada commented 10 months ago

Bug Description

While deploying cos-lite (edge) using the TLS overlay I get the following error.

To Reproduce

  1. juju deploy cos-lite --channel=edge --trust --overlay ./tls-overlay.yaml
  2. check juju debug-log

Environment

Model  Controller  Cloud/Region        Version  SLA          Timestamp
cos    microk8s    microk8s/localhost  3.1.5    unsupported  17:53:53-03:00

App           Version  Status  Scale  Charm                     Channel  Rev  Address         Exposed  Message
alertmanager  0.25.0   active      1  alertmanager-k8s          edge      88  10.152.183.223  no       
ca                     active      1  self-signed-certificates  edge      32  10.152.183.227  no       
catalogue              active      1  catalogue-k8s             edge      25  10.152.183.160  no       
external-ca            active      1  self-signed-certificates  edge      32  10.152.183.70   no       
grafana       9.2.1    active      1  grafana-k8s               edge      92  10.152.183.120  no       
loki          2.7.4    active      1  loki-k8s                  edge      97  10.152.183.162  no       
prometheus    2.46.0   active      1  prometheus-k8s            edge     148  10.152.183.218  no       
traefik       2.10.4   active      1  traefik-k8s               edge     151  192.168.1.250   no       

Relevant log output

unit-traefik-0: 17:11:12.889 ERROR unit.traefik/0.juju-log receive-ca-cert:28: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/usr/lib/python3.8/urllib/request.py", line 1354, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib/python3.8/http/client.py", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1302, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1011, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 951, in send
    self.connect()
  File "/var/lib/juju/agents/unit-traefik-0/charm/venv/ops/pebble.py", line 272, in connect
    self.sock.connect(self.socket_path)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-traefik-0/charm/venv/ops/pebble.py", line 1599, in _request_raw
    response = self.opener.open(request, timeout=self.timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/var/lib/juju/agents/unit-traefik-0/charm/venv/ops/pebble.py", line 286, in http_open
    return self.do_open(_UnixSocketConnection, req,  # type:ignore
  File "/usr/lib/python3.8/urllib/request.py", line 1357, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 2] No such file or directory>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./src/charm.py", line 1322, in <module>
    main(TraefikIngressCharm, use_juju_for_storage=True)
  File "/var/lib/juju/agents/unit-traefik-0/charm/venv/ops/main.py", line 441, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-traefik-0/charm/venv/ops/main.py", line 149, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-traefik-0/charm/venv/ops/framework.py", line 344, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-traefik-0/charm/venv/ops/framework.py", line 841, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-traefik-0/charm/venv/ops/framework.py", line 930, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-traefik-0/charm/lib/charms/certificate_transfer_interface/v0/certificate_transfer.py", line 374, in _on_relation_changed
    self.on.certificate_available.emit(
  File "/var/lib/juju/agents/unit-traefik-0/charm/venv/ops/framework.py", line 344, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-traefik-0/charm/venv/ops/framework.py", line 841, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-traefik-0/charm/venv/ops/framework.py", line 930, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-traefik-0/charm/lib/charms/tempo_k8s/v0/charm_tracing.py", line 455, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "./src/charm.py", line 255, in _on_recv_ca_cert_available
    self._update_received_ca_certs(event)
  File "/var/lib/juju/agents/unit-traefik-0/charm/lib/charms/tempo_k8s/v0/charm_tracing.py", line 455, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "./src/charm.py", line 266, in _update_received_ca_certs
    self.container.push(
  File "/var/lib/juju/agents/unit-traefik-0/charm/venv/ops/model.py", line 2142, in push
    self._pebble.push(str(path), source, encoding=encoding,
  File "/var/lib/juju/agents/unit-traefik-0/charm/venv/ops/pebble.py", line 2014, in push
    response = self._request_raw('POST', '/v1/files', None, headers, data)
  File "/var/lib/juju/agents/unit-traefik-0/charm/venv/ops/pebble.py", line 1612, in _request_raw
    raise ConnectionError(e.reason)
ops.pebble.ConnectionError: [Errno 2] No such file or directory
unit-traefik-0: 17:11:13.145 ERROR juju.worker.uniter.operation hook "receive-ca-cert-relation-changed" (via hook dispatching script: dispatch) failed: exit status 1
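
For readers following the traceback: the three chained exceptions all stem from a Unix socket path that does not exist at the moment of the call. The sketch below reproduces that chain with the standard library only. `PebbleConnectionError` and `connect_unix` are hypothetical stand-ins (not the real `ops.pebble` code), and the socket path is a made-up temporary path that is never created.

```python
import errno
import os
import socket
import tempfile
import urllib.error


class PebbleConnectionError(Exception):
    """Hypothetical stand-in for ops.pebble.ConnectionError."""


def connect_unix(socket_path: str) -> None:
    """Connect to a Unix socket, wrapping failures in the order the traceback shows."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        try:
            sock.connect(socket_path)
        except OSError as err:
            # Second traceback: urllib wraps the low-level error in URLError.
            raise urllib.error.URLError(err)
    except urllib.error.URLError as e:
        # Third traceback: only e.reason is carried into the final exception.
        raise PebbleConnectionError(e.reason)
    finally:
        sock.close()


# A socket path inside a fresh temp directory; the socket file is never created.
missing = os.path.join(tempfile.mkdtemp(), "pebble.socket")

try:
    connect_unix(missing)
except PebbleConnectionError as exc:
    reason = exc.args[0]
    print(type(reason).__name__, reason.errno == errno.ENOENT)
```

If this reading is right, the final `[Errno 2] No such file or directory` refers to the Pebble socket file itself, not to the certificate path being pushed.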

Additional context

No response

jeffreychang911 commented 3 months ago

Hi, SolQA has noticed this issue several times with traefik charm rev 174 since last week. This might be a blocker for most SQA deployments. One of the runs is here: https://solutions.qa.canonical.com/testruns/ac962bb7-80e7-4891-bb70-8ff2a63bf7e6 We deploy COS on top of microk8s 1.28 with TLS, using juju 3.3.

sed-i commented 3 months ago

At first glance, this looks like a pebble error after a can_connect guard, which we decided should lead to error status, because juju would retry and resolve.

If that is indeed the case, then the right thing to do would probably be to let juju resolve. https://discourse.charmhub.io/t/its-probably-ok-for-a-unit-to-go-into-error-state/13022
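
To illustrate what "a pebble error after a can_connect guard" can look like, here is a minimal sketch using stand-in classes so that it runs without the `ops` library. `FlakyContainer`, `PebbleConnectionError`, `update_ca_cert`, and the file path are all hypothetical; in a real charm the container would be an `ops.Container` and the exception `ops.pebble.ConnectionError`. The sketch shows the check passing and the push still failing, plus one possible way to handle it. Whether to catch the error or let the hook fail so that Juju retries is exactly the policy question raised above.

```python
class PebbleConnectionError(Exception):
    """Hypothetical stand-in for ops.pebble.ConnectionError."""


class FlakyContainer:
    """Hypothetical stand-in for ops.Container.

    The simulated Pebble socket disappears right after the first check,
    as it might if the workload container restarts mid-hook.
    """

    def __init__(self) -> None:
        self.files = {}
        self._socket_present = True

    def can_connect(self) -> bool:
        ok = self._socket_present
        self._socket_present = False  # socket vanishes after the check
        return ok

    def push(self, path: str, source: str) -> None:
        if not self._socket_present:
            raise PebbleConnectionError("[Errno 2] No such file or directory")
        self.files[path] = source


def update_ca_cert(container, path: str, cert: str) -> str:
    """Push a cert, reporting what happened instead of raising."""
    if not container.can_connect():
        return "skipped"
    try:
        container.push(path, cert)
    except PebbleConnectionError:
        # The guard passed, yet the push failed: the race under discussion.
        return "retry-later"
    return "pushed"


result = update_ca_cert(FlakyContainer(), "/hypothetical/ca.crt", "PEM DATA")
print(result)
```

The point of the sketch is only that a passing guard does not guarantee the next Pebble call succeeds; it does not claim to mirror the charm's actual handler.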

asbalderson commented 3 months ago

As discussed briefly in our sync earlier this week, we have enabled retries in our tests in the hope that the issue would resolve itself.

In some cases it may be better, but the sample size is still small. I am still seeing issues where it stays blocked until we time out after 3 hours. Here is the status log from one run where it has been flapping back and forth for 30 minutes:

17 Apr 2024 17:24:11Z  workload   active       
17 Apr 2024 17:24:12Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:24:14Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:24:20Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:24:21Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:24:31Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:24:33Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:24:52Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:24:54Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:25:34Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:25:35Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:26:58Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:27:00Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:28:52Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:28:53Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:28:59Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:29:00Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:29:08Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:29:10Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:29:15Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:29:16Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:29:26Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:29:27Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:29:47Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:29:49Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:30:29Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:30:30Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:31:50Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:31:51Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:34:34Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:34:35Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:39:35Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:39:36Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:44:36Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:44:37Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:49:38Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:49:39Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:54:39Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:54:40Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:59:40Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:59:41Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 18:04:42Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 18:04:43Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 18:09:43Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 18:09:45Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"

I can follow up with logs from this test after it times out, or logs from another test where we've had similar looping as they become available.
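
The spacing between failures in the status log above can be measured with a short script. This is a rough sketch: `failure_gaps` is a hypothetical helper, the two samples are lines copied from the log, and the parsing assumes the column layout shown there.

```python
from datetime import datetime
from typing import List

# Lines copied from the start of the status log above.
EARLY = """\
17 Apr 2024 17:24:12Z  juju-unit  executing    running receive-ca-cert-relation-changed hook for ca/0
17 Apr 2024 17:24:14Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:24:21Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:24:33Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:24:54Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:25:35Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
"""

# Lines copied from later in the same log.
LATE = """\
17 Apr 2024 17:39:36Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:44:37Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:49:39Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
17 Apr 2024 17:54:40Z  juju-unit  error        hook failed: "receive-ca-cert-relation-changed"
"""


def failure_gaps(log: str) -> List[int]:
    """Return seconds between consecutive 'error' entries in a status log."""
    times = []
    for line in log.splitlines():
        parts = line.split()
        # Columns: day, month, year, time, entity, status, message...
        if len(parts) < 6 or parts[5] != "error":
            continue
        stamp = " ".join(parts[:4])
        times.append(datetime.strptime(stamp, "%d %b %Y %H:%M:%SZ"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


print(failure_gaps(EARLY))  # [7, 12, 21, 41]
print(failure_gaps(LATE))   # [301, 302, 301]
```

On these samples the gap between failures roughly doubles at first and then settles at about five minutes, which looks like a capped backoff. That is an observation from this one log, not a statement about Juju's documented retry policy.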

amc94 commented 3 months ago

Logs from traefik: traefik.log