canonical / grafana-agent-k8s-operator

This charmed operator automates the operational procedures of running Grafana Agent, an open-source telemetry collector.
https://charmhub.io/grafana-agent-k8s
Apache License 2.0

Grafana-agent units go into error state when upgrading from r18 to r68 #284

Closed anna-savchenko closed 3 months ago

anna-savchenko commented 5 months ago

Bug Description

Grafana-agent units go into error state when upgrading from r18 to r68

To Reproduce

On the node with Internet access:
$ juju download grafana-agent --channel latest/stable --series jammy --filepath grafana-agent_r68.charm

On the air-gapped node:
$ juju refresh grafana-agent --path "grafana-agent_r68.charm"

Environment

grafana-agent r68
juju machine controller 3.1.6

Relevant log output

2024-04-03 10:23:26 INFO unit.grafana-agent-host/36.juju-log server.go:325 Running legacy hooks/upgrade-charm.
2024-04-03 10:23:27 ERROR unit.grafana-agent-host/36.juju-log server.go:325 Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-grafana-agent-host-36/charm/./src/charm.py", line 534, in <module>
    main(GrafanaAgentMachineCharm)
  File "/var/lib/juju/agents/unit-grafana-agent-host-36/charm/venv/ops/main.py", line 456, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-grafana-agent-host-36/charm/venv/ops/main.py", line 144, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-grafana-agent-host-36/charm/venv/ops/framework.py", line 352, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-grafana-agent-host-36/charm/venv/ops/framework.py", line 865, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-grafana-agent-host-36/charm/venv/ops/framework.py", line 955, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-grafana-agent-host-36/charm/src/grafana_agent.py", line 213, in _on_upgrade_charm
    self._update_metrics_alerts()
  File "/var/lib/juju/agents/unit-grafana-agent-host-36/charm/src/grafana_agent.py", line 345, in _update_metrics_alerts
    self.update_alerts_rules(
  File "/var/lib/juju/agents/unit-grafana-agent-host-36/charm/src/grafana_agent.py", line 384, in update_alerts_rules
    rules = self._recurse_call_chain(alerts_func)
  File "/var/lib/juju/agents/unit-grafana-agent-host-36/charm/src/grafana_agent.py", line 370, in _recurse_call_chain
    return self._recurse_call_chain(maybe_func())
  File "/var/lib/juju/agents/unit-grafana-agent-host-36/charm/./src/charm.py", line 258, in metrics_rules
    rules = self._cos.metrics_alerts
  File "/var/lib/juju/agents/unit-grafana-agent-host-36/charm/lib/charms/grafana_agent/v0/cos_agent.py", line 659, in metrics_alerts
    for data in self._gather_peer_data():
  File "/var/lib/juju/agents/unit-grafana-agent-host-36/charm/lib/charms/grafana_agent/v0/cos_agent.py", line 644, in _gather_peer_data
    data = CosAgentPeersUnitData(**json.loads(raw))
  File "/var/lib/juju/agents/unit-grafana-agent-host-36/charm/venv/pydantic/main.py", line 341, in __init__
    raise validation_error
pydantic.error_wrappers.ValidationError: 3 validation errors for CosAgentPeersUnitData
unit_name
  field required (type=value_error.missing)
relation_id
  field required (type=value_error.missing)
relation_name
  field required (type=value_error.missing)
2024-04-03 10:23:27 ERROR juju.worker.uniter.operation runhook.go:180 hook "upgrade-charm" (via hook dispatching script: dispatch) failed: exit status 1
2024-04-03 10:23:27 INFO juju.worker.uniter resolver.go:161 awaiting error resolution for "upgrade-charm" hook
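
The traceback ends in a validation error: the peer data being loaded lacks `unit_name`, `relation_id` and `relation_name`, which suggests it was written in an older layout than the one the upgraded charm expects. The sketch below reproduces that shape of failure and shows one tolerant way to handle it. It is an illustration only, not the charm's code: `PeerUnitData` is a hypothetical stand-in built on a plain dataclass instead of the pydantic model, it models only the three fields named in the error, and the field types and sample payloads are assumptions.

```python
import json
from dataclasses import dataclass, fields
from typing import Iterable, List


@dataclass
class PeerUnitData:
    # Hypothetical stand-in for CosAgentPeersUnitData; only the three
    # fields reported as missing in the traceback are modelled here,
    # and their types are assumed.
    unit_name: str
    relation_id: str
    relation_name: str


def missing_fields(raw: str) -> List[str]:
    """Return the required field names that are absent from a raw JSON entry."""
    payload = json.loads(raw)
    return [f.name for f in fields(PeerUnitData) if f.name not in payload]


def gather_peer_data(raw_entries: Iterable[str]) -> List[PeerUnitData]:
    """Parse peer entries, skipping any that lack a required field."""
    parsed = []
    for raw in raw_entries:
        if missing_fields(raw):
            # Stale entry in an older layout: skip it instead of raising.
            continue
        payload = json.loads(raw)
        kwargs = {f.name: payload[f.name] for f in fields(PeerUnitData)}
        parsed.append(PeerUnitData(**kwargs))
    return parsed


# Made-up payloads: one without the required fields, one with them.
old_entry = json.dumps({"metrics_alerts": {}})
new_entry = json.dumps(
    {
        "unit_name": "grafana-agent-host/36",
        "relation_id": "1",
        "relation_name": "cos-agent",
    }
)

print(missing_fields(old_entry))
print(len(gather_peer_data([old_entry, new_entry])))
```

Skipping stale entries keeps the hook from crashing, but it also silently drops whatever those entries carried, so whether this is preferable to failing loudly depends on the charm.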

Additional context

No response

PietroPasotti commented 3 months ago

Hi! Thanks for submitting the issue :) Unfortunately we don't have an upgrade path between revisions that are so far apart.

However, grafana-agent does not store data, so it's safe to redeploy without losing anything except what's in flight (if that's a concern, you could deploy a second one alongside it and only then remove the first one).

Abuelodelanada commented 3 months ago

Hi @anna-savchenko

This is the repo for grafana-agent for K8s, and the issue is for grafana-agent for machines.

Grafana Agent for machines has been under heavy development over the last 6 months, that's why our stable version is now revision 95 :-O


As @PietroPasotti mentioned, we don't have an upgrade path between revisions that are so far apart... grafana-agent does not store data, so it's safe to redeploy without losing anything except what's in flight.

I'm closing this issue, please feel free to re-open it if you need support!

anna-savchenko commented 3 months ago

Hi @PietroPasotti @Abuelodelanada, thanks for looking into this issue.

Ack regarding redeployment. Indeed, it worked :)