canonical / grafana-agent-k8s-operator

This charmed operator automates the operational procedures of running Grafana Agent, an open-source telemetry collector.
https://charmhub.io/grafana-agent-k8s
Apache License 2.0

Airgap deployment not possible because of promtail #300

Closed: phvalguima closed this issue 3 months ago

phvalguima commented 3 months ago

Bug Description

Promtail download fails in airgapped setups with:

Traceback (most recent call last):
  File "/usr/lib/python3.10/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib/python3.10/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/usr/lib/python3.10/http/client.py", line 1448, in connect
    super().connect()
  File "/usr/lib/python3.10/http/client.py", line 942, in connect
    self.sock = self._create_connection(
  File "/usr/lib/python3.10/socket.py", line 845, in create_connection
    raise err
  File "/usr/lib/python3.10/socket.py", line 833, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/./src/charm.py", line 1690, in <module>
    main(PostgresqlOperatorCharm, use_juju_for_storage=True)
  File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/ops/main.py", line 544, in main
    manager.run()
  File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/ops/main.py", line 520, in run
    self._emit()
  File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/ops/main.py", line 509, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/ops/main.py", line 143, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/ops/framework.py", line 352, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/ops/framework.py", line 851, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/ops/framework.py", line 941, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/lib/charms/loki_k8s/v0/loki_push_api.py", line 1855, in _on_relation_changed
    self._setup_promtail()
  File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/lib/charms/loki_k8s/v0/loki_push_api.py", line 2323, in _setup_promtail
    self._obtain_promtail(promtail_binaries[self._arch])
  File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/lib/charms/loki_k8s/v0/loki_push_api.py", line 1993, in _obtain_promtail
    self._download_and_push_promtail_to_workload(promtail_info)
  File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/lib/charms/loki_k8s/v0/loki_push_api.py", line 2137, in _download_and_push_promtail_to_workload
    with opener.open(promtail_info["url"]) as r:
  File "/usr/lib/python3.10/urllib/request.py", line 519, in open
    response = self._open(req, data)
  File "/usr/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/usr/lib/python3.10/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>

To Reproduce

  1. Deploy cos-lite
  2. Deploy grafana agent in another model
  3. Disable network in your setup (in my case, remove the default route from AWS Route Table)
  4. Relate grafana-agent and cos-lite

Environment

AWS, juju 3.4

Relevant log output

Already added to the bug description

Additional context

No response

PietroPasotti commented 3 months ago

The only solution we see at the moment is to add to the charm a juju resource that follows the naming of promtail and deploy the charm with it. We have to think about it some more to see if there's a less work-intensive solution.
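A minimal sketch of the resource-first idea, not the charm library's actual code: prefer a binary that was attached locally and only fall back to a download when none is present. The function name `obtain_promtail` and the `download` callback are hypothetical; in an ops charm the path would presumably come from `self.model.resources.fetch(...)` with whatever resource name the charm declares.

```python
from pathlib import Path
from typing import Callable, Optional


def obtain_promtail(
    resource_path: Optional[Path],
    download: Callable[[], bytes],
) -> bytes:
    """Return the promtail binary, preferring a locally attached resource.

    `resource_path` is where a (hypothetical) charm resource was fetched to,
    or None if no resource was attached. `download` is only called as a
    fallback, so a deployment that ships the resource never needs network
    access for this step.
    """
    if resource_path is not None and resource_path.is_file():
        return resource_path.read_bytes()
    # No usable local copy: fall back to the network download.
    return download()
```

With this shape, an airgapped deployment that attaches the resource never reaches the download branch, while connected deployments keep today's behaviour.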

phvalguima commented 3 months ago

If for k8s we must run promtail alongside each workload, then I suggest we make the resource_name argument mandatory. That forces charm developers to look into the resource and specify it.

Sorry for "reusing" this bug, but a brief mention of VM charms: it seems we can get away with adding the resource directly to the grafana-agent charm itself, as it is a subordinate charm.

Another alternative is to ship promtail embedded in snaps/rocks by default. That can be tricky for snaps, depending on the level of access it needs to the system. For k8s it would be a good solution. In this case, the LogProxyConsumer class would look for the path passed as an argument and fail right away if it is not found. That would be a good warning to the charm developer that they have missed something.
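The fail-fast behaviour suggested here could look roughly like the sketch below. It checks the local filesystem purely for illustration; in a K8s charm the workload's filesystem is reached through Pebble, so the real check would differ. `PromtailNotFoundError` and `require_promtail` are hypothetical names, not part of LogProxyConsumer.

```python
import os
from pathlib import Path


class PromtailNotFoundError(RuntimeError):
    """Hypothetical error raised when the expected binary is unusable."""


def require_promtail(path: str) -> Path:
    """Return the binary's path, or fail immediately with a clear message."""
    binary = Path(path)
    if not binary.is_file():
        raise PromtailNotFoundError(
            f"promtail binary not found at {binary}; "
            "ship it in the image or attach it before relating"
        )
    if not os.access(binary, os.X_OK):
        raise PromtailNotFoundError(f"promtail binary at {binary} is not executable")
    return binary
```

Raising at setup time surfaces the missing binary to the charm developer directly, instead of as a connection timeout deep inside a relation-changed hook.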

sed-i commented 3 months ago

Another option is to add a dump section for promtail in charmcraft.yaml. This way:

However, this would require:

sed-i commented 3 months ago

Another option is to add a server to grafana-agent to serve promtail (example 1, 2).

@benhoyt, is it possible in charm code to launch a pebble service on the charm container? E.g. python3 -m http.server 8000, to serve a file from the charm container.

sed-i commented 3 months ago

@taurus-forever, what's your take on this? Would it make sense for you to add promtail as a resource or as a dump section in charmcraft.yaml?

benhoyt commented 3 months ago

> @benhoyt, is it possible in charm code to launch a pebble service on the charm container? E.g. python3 -m http.server 8000, to serve a file from the charm container.

@sed-i I believe the charm container now runs its own Pebble, so I suppose you could. I don't know too much about how it's configured, but you'd have to figure out how to talk to the charm container's Pebble to add a new layer with the service in it.

Why do you want to serve the promtail binary, though? Is it so _download_and_push_promtail_to_workload can access it via a URL? That seems kinda roundabout ... I think it'd be better to "just" include the binary/binaries in the charm, or as a charm resource.

sed-i commented 3 months ago

Thanks @benhoyt. The idea is to serve the promtail binary from the loki pod and the grafana agent pod, because it's a dependency that other charms need, not us. Currently, the charm lib on the other side of the relation downloads promtail automatically. In air-gapped environments this is tricky, and one workaround is serving it ourselves.

It seems too much to ask other charm authors to include "3rd party build artifacts" (promtail) in their flow.
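For reference, the "serve it ourselves" idea discussed in this thread can be sketched with the Python standard library alone. This only illustrates the mechanism (the programmatic equivalent of `python3 -m http.server`); it says nothing about how the service would be wired into Pebble. The helper names `serve_directory` and `fetch` are hypothetical, and the server binds to 127.0.0.1 only to keep the example local.

```python
import functools
import threading
import urllib.request
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory: str, port: int = 0) -> ThreadingHTTPServer:
    """Serve `directory` over HTTP in a background thread.

    Port 0 lets the OS pick a free port; read it back from
    `server.server_address`. A real pod would bind to an address
    reachable by the related charms rather than 127.0.0.1.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(url: str, timeout: float = 5.0) -> bytes:
    """Download `url`, ignoring any proxy settings from the environment."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

The consuming side would then point its download URL at the serving pod instead of an external release page, so the existing download-and-push flow keeps working without internet access.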