canonical / hardware-observer-operator

A charm to set up a Prometheus exporter for IPMI, Redfish and RAID devices from different vendors.
Apache License 2.0

IPMI SDR Cache out of date disables IPMI collectors during update-status hook #202

Status: Closed (przemeklal closed 3 months ago)

przemeklal commented 4 months ago

The IPMI collectors used to work on this machine. At some point they silently disappeared from the metrics list without triggering any alerts (ipmi_sel_command_success):

2024-03-25 14:36:43 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
2024-03-25 14:42:20 WARNING unit.hardware-observer/7.update-status logger.go:60 SDR Cache '/root/.freeipmi/sdr-cache/sdr-cache-redacted.localhost' out of date: Please flush the cache and regenerate it
2024-03-25 14:42:20 INFO unit.hardware-observer/7.juju-log server.go:316 IPMI sensors monitoring is not available
2024-03-25 14:42:20 WARNING unit.hardware-observer/7.update-status logger.go:60 SDR Cache '/root/.freeipmi/sdr-cache/sdr-cache-redacted.localhost' out of date: Please flush the cache and regenerate it
2024-03-25 14:42:20 INFO unit.hardware-observer/7.juju-log server.go:316 IPMI SEL monitoring is not available
2024-03-25 14:42:20 INFO unit.hardware-observer/7.juju-log server.go:316 Attempt 1 of /redfish/v1/
2024-03-25 14:42:22 INFO unit.hardware-observer/7.juju-log server.go:316 Response Time for GET to /redfish/v1/: 1.5902138333767653 seconds.
2024-03-25 14:42:22 INFO unit.hardware-observer/7.juju-log server.go:316 Attempt 1 of /redfish/v1/SessionService/Sessions/
2024-03-25 14:42:22 INFO unit.hardware-observer/7.juju-log server.go:316 Response Time for POST to /redfish/v1/SessionService/Sessions/: 0.1784691703505814 seconds.
2024-03-25 14:42:22 INFO unit.hardware-observer/7.juju-log server.go:316 Login returned code 400: {"error":{"code":"iLO.0.10.ExtendedInfo","message":"See @Message.ExtendedInfo for more information.","@Message.ExtendedInfo":[{"MessageId":"iLO.2.14.UnauthorizedLoginAttempt"}]}}
2024-03-25 14:42:22 INFO juju.worker.uniter.operation runhook.go:159 ran "update-status" hook (via hook dispatching script: dispatch)
cat /etc/hardware-exporter-config.yaml
port: 10200
level: INFO

enable_collectors:
  - collector.ipmi_dcmi
  - collector.redfish

redfish_host: "https://redacted"
redfish_username: "redacted"
redfish_password: "redacted"

The issues here are:

hardware-observer revision latest/stable 25
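
The warning in the log asks for the SDR cache to be flushed and regenerated. As a minimal sketch of that idea (not the charm's actual code and not necessarily what any fix implements), the Python below detects the warning text in a tool's output, flushes the cache, and retries once. The helper names `is_sdr_cache_stale` and `run_with_cache_flush` are hypothetical. The sketch assumes the freeipmi tools `ipmi-sensors` and `ipmi-sel` are installed and relies on their `--flush-cache` option.

```python
import subprocess
from typing import Callable, Sequence


def is_sdr_cache_stale(output: str) -> bool:
    """Return True if the output contains the stale SDR cache warning.

    The matched substrings are taken from the warning shown in the log above.
    """
    return "SDR Cache" in output and "out of date" in output


def _run(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command and capture stdout/stderr as text without raising."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


def run_with_cache_flush(
    cmd: Sequence[str],
    runner: Callable[[Sequence[str]], subprocess.CompletedProcess] = _run,
) -> subprocess.CompletedProcess:
    """Run a freeipmi command; if the SDR cache is stale, flush and retry once.

    `runner` is injectable so the logic can be exercised without IPMI hardware.
    """
    result = runner(cmd)
    if is_sdr_cache_stale((result.stdout or "") + (result.stderr or "")):
        # Assumption: cmd[0] is a freeipmi tool that accepts --flush-cache.
        runner([cmd[0], "--flush-cache"])
        result = runner(cmd)
    return result
```

For example, `run_with_cache_flush(["ipmi-sensors"])` would run the tool, and only if the stale-cache warning appears would it call `ipmi-sensors --flush-cache` before a single retry. Retrying once rather than in a loop keeps a persistently failing BMC from stalling the hook.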

Pjack commented 3 months ago

This specific behavior is addressed by #213, so we can close this issue.

#214 and #96 will be addressed in the future.