canonical / hardware-observer-operator

A charm to set up Prometheus exporters for IPMI, Redfish and RAID devices from different vendors.
Apache License 2.0

Error when testing with grafana-agent multi-subordinate functionality. #43

Closed dashmage closed 1 year ago

dashmage commented 1 year ago

Issue

The hardware-observer charm fails when testing the multi-subordinate functionality of the grafana-agent charm from this branch. The hardware-observer unit ends up in blocked status with the message "Exporter is unhealthy".

Setup

Relations are added for the grafana-agent charm with both zookeeper and hardware-observer over the cos-agent interface.

juju deploy zookeeper
juju deploy hardware-observer --channel edge
# grafana-agent charm built from the "multi-sub" branch
juju deploy ./grafana-agent.charm

juju relate grafana-agent zookeeper
juju relate hardware-observer zookeeper
juju relate hardware-observer grafana-agent

# COS is set up on another k8s cloud
juju relate grafana-agent ashley/cos.prometheus

COS Setup

The microk8s charm is deployed in the same model and configured for COS. The k8s cloud is then added to the same controller using the juju add-k8s command.

juju deploy microk8s
juju config microk8s addons="dns ingress hostpath-storage metallb:<public-ip-of-machine>-<public-ip-of-machine>"

cat containerd_env
# ---
ulimit -n 65536 || true
ulimit -l 16384 || true

HTTP_PROXY=http://squid.internal:3128
HTTPS_PROXY=http://squid.internal:3128
NO_PROXY=127.0.0.1,localhost,::1,10.130.11.0/24,10.130.12.0/24,10.130.13.0/24,10.152.183.0/24,api.jujucharms.com,api.charmhub.io
https_proxy=http://squid.internal:3128
http_proxy=http://squid.internal:3128
no_proxy=127.0.0.1,localhost,::1,10.130.11.0/24,10.130.12.0/24,10.130.13.0/24,10.152.183.0/24,api.jujucharms.com,api.charmhub.io
# ---

juju config microk8s containerd_env=@containerd_env

juju ssh microk8s/leader -- microk8s config > ~/.kube/config

# add microk8s cloud
juju add-k8s micro -c ct-maas-ctrl

# Add a new model for COS in the newly set up cloud
juju add-model cos micro

juju deploy cos-lite --channel edge --trust
juju offer prometheus:receive-remote-write
juju status --relations
(...)
zookeeper/3*             active    idle   4        10.1.11.46
  grafana-agent/14*      active    idle            10.1.11.46                                grafana-cloud-config: off, logging-consumer: off, grafana-dashboards-provider: off
  hardware-observer/24*  blocked   idle            10.1.11.46                                Exporter is unhealthy

Relation provider                Requirer                         Interface                Type         Message
grafana-agent:peers              grafana-agent:peers              grafana_agent_replica    peer         
hardware-observer:cos-agent      grafana-agent:cos-agent          cos_agent                subordinate  
microk8s:cluster                 microk8s:cluster                 microk8s-cluster         peer         
prometheus:receive-remote-write  grafana-agent:send-remote-write  prometheus_remote_write  regular      
zookeeper:cluster                zookeeper:cluster                cluster                  peer         
zookeeper:cos-agent              grafana-agent:cos-agent          cos_agent                subordinate  
zookeeper:juju-info              hardware-observer:general-info   juju-info                subordinate  
zookeeper:restart                zookeeper:restart                rolling_op               peer         

Error

unit-hardware-observer-24: 18:28:09 ERROR unit.hardware-observer/24.juju-log cos-agent:41: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/./src/charm.py", line 173, in <module>
    ops.main(HardwareObserverCharm)  # type: ignore
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/venv/ops/main.py", line 454, in __call__
    return main(charm_class, use_juju_for_storage=use_juju_for_storage)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/venv/ops/main.py", line 441, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/venv/ops/main.py", line 149, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/venv/ops/framework.py", line 344, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/venv/ops/framework.py", line 833, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/venv/ops/framework.py", line 922, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/./src/charm.py", line 148, in _on_cos_agent_relation_joined
    self.exporter.start()
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/src/service.py", line 36, in wrapper
    return_value = func(self, *args, **kwargs)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/src/service.py", line 154, in start
    systemd.service_start(EXPORTER_NAME)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/lib/charms/operator_libs_linux/v1/systemd.py", line 156, in service_start
    return _systemctl("start", service_name)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/lib/charms/operator_libs_linux/v1/systemd.py", line 125, in _systemctl
    raise SystemdError(
charms.operator_libs_linux.v1.systemd.SystemdError: Could not start hardware-exporter: systemd output: See "systemctl status hardware-exporter.service" and "journalctl -xeu hardware-exporter.service" for details.

unit-hardware-observer-24: 18:28:10 ERROR juju.worker.uniter.operation hook "cos-agent-relation-joined" (via hook dispatching script: dispatch) failed: exit status 1
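
The traceback shows the SystemdError propagating out of the relation-joined handler, which is why the hook exits with status 1. As a rough illustration only (not the charm's actual code), the sketch below shows how a start failure could be turned into a status message instead of an uncaught exception. The SystemdError class here is a local stand-in for the library exception, and start_exporter and failing_start are hypothetical names; no real systemctl call is made.

```python
class SystemdError(Exception):
    """Local stand-in for charms.operator_libs_linux.v1.systemd.SystemdError."""


def start_exporter(service_start, name="hardware-exporter"):
    """Try to start the service; return (ok, message) instead of raising.

    `service_start` is any callable taking a service name (hypothetical).
    """
    try:
        service_start(name)
    except SystemdError as err:
        # Report the failure to the caller rather than crashing the hook.
        return False, f"Exporter is unhealthy: {err}"
    return True, "Exporter started"


def failing_start(name):
    # Simulates the failure seen in the traceback.
    raise SystemdError(f"Could not start {name}")


ok, message = start_exporter(failing_start)
print(ok, message)
```

With a handler shaped like this, the caller could set a blocked status from the returned message; whether that is the right behaviour for the charm is a design decision for the maintainers.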

Other Notes

dashmage commented 1 year ago

Here is the output from systemctl status and journalctl for hardware-exporter.service running on the machine.

The journalctl output shows pydantic validation errors:

ubuntu@rozary:~$ journalctl -xeu hardware-exporter.service
Jul 27 10:50:13 rozary python3[2053576]: pydantic.error_wrappers.ValidationError: 2 validation errors for Config
Jul 27 10:50:13 rozary python3[2053576]: redfish_username
Jul 27 10:50:13 rozary python3[2053576]:   none is not an allowed value (type=type_error.none.not_allowed)
Jul 27 10:50:13 rozary python3[2053576]: redfish_password
Jul 27 10:50:13 rozary python3[2053576]:   none is not an allowed value (type=type_error.none.not_allowed)
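
To make the log easier to read: the exporter's Config model received None for both redfish_username and redfish_password, and the model rejects None for those fields. The sketch below mimics that check with a plain dataclass so it runs without pydantic. Only the two field names come from the log; the ConfigError class and the rest of the Config definition are assumptions for illustration and not the exporter's real model.

```python
from dataclasses import dataclass, fields
from typing import Optional


class ConfigError(ValueError):
    """Hypothetical stand-in for pydantic's ValidationError."""


@dataclass
class Config:
    # Field names taken from the journal output; everything else is assumed.
    redfish_username: Optional[str] = None
    redfish_password: Optional[str] = None

    def __post_init__(self):
        # Collect every field that is still None, as the log lists both.
        missing = [f.name for f in fields(self) if getattr(self, f.name) is None]
        if missing:
            raise ConfigError(
                f"{len(missing)} validation error(s) for Config: " + ", ".join(missing)
            )


try:
    Config()  # neither credential supplied, as in the failing unit
except ConfigError as err:
    print(err)
# prints: 2 validation error(s) for Config: redfish_username, redfish_password
```

Supplying both values, for example Config("user", "secret"), passes the check, which suggests the failing unit was started without Redfish credentials configured.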

(...)