sudeephb closed this issue 3 months ago.
When hardware-observer is installed, the collector whitelist is generated once. After that, the list of collectors should not change dynamically.
I think the code in get_redfish_conn_params is incorrect: it should not call bmc_hw_verifier to build the list again, because the result may differ due to a temporary network connectivity issue.
If we only use the whitelist generated during installation, anything missed at install time (because of a temporary network issue, for example) will never be added, even though it exists.
That's true, and it is by design in hardware-observer. There is a separate feature to regenerate the whitelist as a Juju action (#96).
If the whitelist changed dynamically, the corresponding metrics and alert rules would be affected, potentially leaving broken hardware unmonitored and causing alerts not to fire.
Therefore, the single source of truth is the whitelist generated during installation.
It's a trade-off in charm design.
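We can choose either to keep the install-time whitelist and refresh it manually, or to recover automatically.
One question for #202 is how often this happens, and whether users want automatic recovery or a manual refresh.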
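To make the trade-off concrete, here is a minimal sketch of the "install-time whitelist as single source of truth" design, assuming the whitelist is persisted as a JSON file. The file location, the probe callable, and the helpers install, load_whitelist and redetect_action are all hypothetical and are not hardware-observer's actual code; redetect_action only stands in for the kind of manual refresh discussed in #96.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical storage location; the real charm may keep the whitelist elsewhere.
WHITELIST_PATH = Path(tempfile.gettempdir()) / "hw-observer-whitelist.json"


def detect_collectors(probe):
    """Run hardware detection once. `probe` is a hypothetical callable that
    returns the names of collectors whose hardware was found."""
    return sorted(set(probe()))


def install(probe, path=WHITELIST_PATH):
    """Install time: detect hardware once and persist the resulting whitelist."""
    whitelist = detect_collectors(probe)
    path.write_text(json.dumps(whitelist))
    return whitelist


def load_whitelist(path=WHITELIST_PATH):
    """Later hooks read the stored list instead of re-probing, so a transient
    network error cannot silently shrink the set of monitored collectors."""
    return json.loads(path.read_text())


def redetect_action(probe, path=WHITELIST_PATH):
    """Explicit, operator-triggered regeneration (hypothetical stand-in for a
    Juju action); the only way the stored whitelist changes."""
    return install(probe, path)
```

With this shape, a flaky probe during normal operation has no effect: the stored list changes only when an operator runs the refresh. The downside raised above is also visible: anything missed at install time stays missing until that manual refresh.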
Conclusion:
Recreate the SDR cache if it is out of date, so that the ipmi-sel and ipmi-sensor collectors are not disabled just because the SDR cache was stale.
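One way to apply this conclusion is sketched below. It assumes the collectors shell out to the FreeIPMI binaries ipmi-sensors and ipmi-sel, which accept the --sdr-cache-recreate option to rebuild a stale or invalid SDR cache automatically (see the FreeIPMI man pages). The thread does not show how hardware-observer invokes these tools, and build_ipmi_command and run_ipmi_tool are hypothetical helpers.

```python
import shutil
import subprocess

# FreeIPMI option: recreate the SDR cache automatically if it is out of date or invalid.
SDR_RECREATE_FLAG = "--sdr-cache-recreate"


def build_ipmi_command(tool, *extra_args):
    """Return the argv for a FreeIPMI tool (assumed: "ipmi-sensors" or "ipmi-sel")
    with automatic SDR cache recreation enabled."""
    return [tool, SDR_RECREATE_FLAG, *extra_args]


def run_ipmi_tool(tool, *extra_args, timeout=60):
    """Run the tool and return its stdout. Errors are raised instead of being
    treated as "hardware absent", so a stale cache does not disable a collector."""
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found on PATH")
    result = subprocess.run(
        build_ipmi_command(tool, *extra_args),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Letting the tool rebuild its own cache keeps the whitelist untouched. A stale cache then becomes a recoverable runtime condition rather than a reason to drop the collector.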