canonical / hardware-observer-operator

A charm to setup prometheus exporter for IPMI, RedFish and RAID devices from different vendors.
Apache License 2.0
7 stars 14 forks source link

storcli not installed after `juju attach-resource` on 2 machines out of 15 #205

Closed przemeklal closed 2 months ago

przemeklal commented 4 months ago

After attaching the storcli-deb resource:

juju attach-resource hardware-observer storcli-deb=./Unified_storcli_all_os/Ubuntu/storcli_007.2612.0000.0000_all.deb

All hw-obs units are active/idle but on two of them the storcli collector reports failures with the following errors in the hardware-exporter logs:

Mar 26 13:16:16 redacted python3[687648]: 2024-03-26 13:16:16 ERROR Command 'which storcli' returned non-zero exit status 1.
Mar 26 13:16:16 redacted python3[687648]: 2024-03-26 13:16:16 ERROR storcli not installed.
Mar 26 13:16:16 redacted python3[687648]: 2024-03-26 13:16:16 ERROR storcli not installed.
Mar 26 13:16:16 redacted python3[687648]: 2024-03-26 13:16:16 ERROR Failed to get MegaRAID controller information using storcli

I tried to attach-resource again but it didn't fix anything.

dashmage commented 2 months ago

After checking with @przemeklal , we don't have access to this particular cloud as of now so it would be difficult to obtain more logs. This looks like some sort of race condition where the resource deb package isn't being installed correctly on occasion.

dashmage commented 2 months ago

Since we can't reproduce this bug now and there's no clear indicator to why this is happening, I will close this issue. But in case there's some more insight to this issue, please feel free to re-open it.