canonical / tempo-k8s-operator

This charmed operator automates the operational procedures of running Grafana Tempo, an open-source tracing backend.
https://charmhub.io/tempo-k8s
Apache License 2.0

tracing v2 implementation #65

Closed · PietroPasotti closed this 5 months ago

PietroPasotti commented 5 months ago

Tandem PR: https://github.com/canonical/charm-relation-interfaces/pull/136

mmkay commented 5 months ago

First of all I wanted to check how easy it is to switch to v2. I took prometheus, one of the charms I recently updated tracing in. My branch is here: https://github.com/canonical/prometheus-k8s-operator/tree/tracing-v2 . Commands to deploy tempo and prometheus:

juju deploy --trust ./tempo-k8s_ubuntu-22.04-amd64.charm tempo --resource tempo-image=grafana/tempo:1.5.0
juju deploy --trust ./prometheus-k8s_ubuntu-20.04-amd64.charm --resource prometheus-image=ubuntu/prometheus:2-22.04 && juju integrate tempo prometheus-k8s:tracing

There might be a bit of a race condition: although I initialized tracing with

self.tracing = TracingEndpointRequirer(self, protocols=["otlp_http"])

and defined the endpoint property as:

@property  
def tempo(self) -> Optional[str]:  
    """Tempo endpoint for charm tracing."""  
    return self.tracing.get_endpoint("otlp_http")

I got this exception:

unit-prometheus-0: 17:06:42 WARNING unit.prometheus/0.juju-log <class '__main__.PrometheusCharm'>.<property object at 0x7f943cccd590> returned None; continuing with tracing DISABLED.
unit-prometheus-k8s-0: 17:07:09 INFO juju.worker.uniter awaiting error resolution for "install" hook
unit-prometheus-k8s-0: 17:07:10 INFO unit.prometheus-k8s/0.juju-log Running legacy hooks/install.
unit-prometheus-k8s-0: 17:07:10 ERROR unit.prometheus-k8s/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 1074, in <module>
    main(PrometheusCharm)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/ops/main.py", line 444, in main
    charm = charm_class(framework)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 319, in wrap_init
    tracing_endpoint = _get_tracing_endpoint(tracing_endpoint_getter, self, charm)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 241, in _get_tracing_endpoint
    tracing_endpoint = tracing_endpoint_getter.__get__(self)
  File "./src/charm.py", line 1065, in tempo
    return self.tracing.get_endpoint("otlp_http")
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/lib/charms/tempo_k8s/v2/tracing.py", line 744, in get_endpoint
    raise ProtocolNotRequestedError(protocol, relation)
charms.tempo_k8s.v2.tracing.ProtocolNotRequestedError: ('otlp_http', None)

After I added an if self.tracing.is_ready(): guard, I was still getting connection issues.
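
For reference, the guarded property on my branch looks roughly like this (a sketch; the only library calls assumed are TracingEndpointRequirer, is_ready and get_endpoint from the v2 tracing lib):

from typing import Optional

from ops.charm import CharmBase
from charms.tempo_k8s.v2.tracing import TracingEndpointRequirer


class PrometheusCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # request only the receiver protocol this charm actually uses
        self.tracing = TracingEndpointRequirer(self, protocols=["otlp_http"])

    @property
    def tempo(self) -> Optional[str]:
        """Tempo endpoint for charm tracing, or None while the relation is not ready."""
        # avoid ProtocolNotRequestedError while the relation data is still settling
        if self.tracing.is_ready():
            return self.tracing.get_endpoint("otlp_http")
        return None

With this in place the charm no longer crashes at init, but the exporter still fails to reach the advertised endpoint: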

unit-prometheus-k8s-0: 17:36:03 ERROR unit.prometheus-k8s/0.juju-log Exception while exporting Span batch.
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/urllib3/connectionpool.py", line 496, in _make_request
    conn.request(
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/urllib3/connection.py", line 400, in request
    self.endheaders()
  File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1011, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 951, in send
    self.connect()
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/urllib3/connection.py", line 238, in connect
    self.sock = self._new_conn()
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/urllib3/connection.py", line 213, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fd2342b4640>: Failed to establish a new connection: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='tempo-0.tempo-endpoints.cos.svc.cluster.local', port=4318): Max retries exceeded with url: /v1/traces (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd2342b4640>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/opentelemetry/sdk/trace/export/__init__.py", line 368, in _export_batch
    self.span_exporter.export(self.spans_list[:idx])  # type: ignore
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 145, in export
    resp = self._export(serialized_data)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 114, in _export
    return self._session.post(
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='tempo-0.tempo-endpoints.cos.svc.cluster.local', port=4318): Max retries exceeded with url: /v1/traces (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd2342b4640>: Failed to establish a new connection: [Errno 111] Connection refused'))

Relation data from jhack show-relation prometheus-k8s tempo:tracing:

│                                                                                                  │
│ ╭───────────────────────────────────────── locals ─────────────────────────────────────────╮     │
│ │  local_endpoint = 'tracing'                                                              │     │
│ │         matches = []                                                                     │     │
│ │             obj = 'tempo:tracing'                                                        │     │
│ │       other_obj = 'prometheus-k8s'                                                       │     │
│ │        relation = Relation(                                                              │     │
│ │                   │   provider='tempo',                                                  │     │
│ │                   │   provider_endpoint='tracing',                                       │     │
│ │                   │   requirer='prometheus-k8s',                                         │     │
│ │                   │   requirer_endpoint='tracing',                                       │     │
│ │                   │   interface='tracing',                                               │     │
│ │                   │   raw_type='regular'                                                 │     │
│ │                   )                                                                      │     │
│ │       relations = [                                                                      │     │
│ │                   │   {                                                                  │     │
│ │                   │   │   'relation-id': 1,                                              │     │
│ │                   │   │   'endpoint': 'tracing',                                         │     │
│ │                   │   │   'related-endpoint': 'tracing',                                 │     │
│ │                   │   │   'application-data': {                                          │     │
│ │                   │   │   │   'host': '"tempo-0.tempo-endpoints.cos.svc.cluster.local"', │     │
│ │                   │   │   │   'receivers': '[{"protocol": "otlp_http", "port": 4318}]'   │     │
│ │                   │   │   },                                                             │     │
│ │                   │   │   'local-unit': {'in-scope': False, 'data': None},               │     │
│ │                   │   │   'related-units': {                                             │     │
│ │                   │   │   │   'tempo/0': {                                               │     │
│ │                   │   │   │   │   'in-scope': True,                                      │     │
│ │                   │   │   │   │   'data': {                                              │     │
│ │                   │   │   │   │   │   'egress-subnets': '10.152.183.113/32',             │     │
│ │                   │   │   │   │   │   'ingress-address': '10.152.183.113',               │     │
│ │                   │   │   │   │   │   'private-address': '10.152.183.113'                │     │
│ │                   │   │   │   │   }                                                      │     │
│ │                   │   │   │   }                                                          │     │
│ │                   │   │   }                                                              │     │
│ │                   │   }                                                                  │     │
│ │                   ]                                                                      │     │
│ │ remote_endpoint = None                                                                   │     │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────╯     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

Maybe there's an issue with port opening? It doesn't look like 4318 is exposed by the service:

$ kubectl get services -A
NAMESPACE             NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
default               kubernetes                 ClusterIP   10.152.183.1     <none>        443/TCP                  31h
kube-system           kube-dns                   ClusterIP   10.152.183.10    <none>        53/UDP,53/TCP,9153/TCP   31h
metallb-system        webhook-service            ClusterIP   10.152.183.24    <none>        443/TCP                  31h
controller-microk8s   controller-service         ClusterIP   10.152.183.46    <none>        17070/TCP                30h
controller-microk8s   modeloperator              ClusterIP   10.152.183.209   <none>        17071/TCP                30h
pietro                modeloperator              ClusterIP   10.152.183.162   <none>        17071/TCP                6m34s
pietro                tempo                      ClusterIP   10.152.183.63    <none>        65535/TCP                6m12s
pietro                tempo-endpoints            ClusterIP   None             <none>        <none>                   6m11s
pietro                prometheus-k8s-endpoints   ClusterIP   None             <none>        <none>                   3m42s
pietro                prometheus-k8s             ClusterIP   10.152.183.115   <none>        9090/TCP                 3m43s

while the one from edge seems to expose a ton of ports:

pietro                tempo-k8s                  ClusterIP   10.152.183.99    <none>        3200/TCP,4317/TCP,4318/TCP,9411/TCP,14268/TCP,14250/TCP   40s
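If the v2 branch dropped the service patching that the edge revision apparently still has, something along these lines would put the receiver ports back on the Kubernetes service. This is only a sketch under the assumption that the charm uses observability-libs' KubernetesServicePatch with lightkube ServicePort objects; the class name and port names are illustrative, and the ports mirror the receivers shown in the relation data and the edge service above:

from lightkube.models.core_v1 import ServicePort
from ops.charm import CharmBase

from charms.observability_libs.v1.kubernetes_service_patch import KubernetesServicePatch


class TempoCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # patch the K8s service so the advertised receiver ports are actually reachable
        self.service_patch = KubernetesServicePatch(
            self,
            ports=[
                ServicePort(3200, name="tempo-http"),  # Tempo HTTP API
                ServicePort(4317, name="otlp-grpc"),   # OTLP gRPC receiver
                ServicePort(4318, name="otlp-http"),   # OTLP HTTP receiver (the one prometheus-k8s is trying to reach)
            ],
        )

Whatever mechanism ends up being used, the set of opened ports should probably be derived from the receivers the charm actually advertises over the tracing relation, so the service and the relation data cannot drift apart.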