canonical / hardware-observer-operator

A charm to setup prometheus exporter for IPMI, RedFish and RAID devices from different vendors.
Apache License 2.0
7 stars 15 forks source link

feat: Manage redfish client timeout with collect-timeout charm config #220

Closed dashmage closed 5 months ago

dashmage commented 5 months ago

This change allows the user to manage the redfish client timeout using the existing collect-timeout charm config. When hardware exporter is installed, this configured value will be used for redfish (also refer hardware-exporter config.py for config definition and main.py where config is loaded from file).

Fixes #170

dashmage commented 5 months ago

Unit test changes are in progress. Unit test changes are complete. Also need to add new hw dependent func test for the redfish config. But I'm currently facing some trouble in procuring a machine that has redfish present on it.

dashmage commented 5 months ago

I prefer to provide less configuration to the user based on charm's design philosophy. Collect-timeout could impact all timeout value, including shell command , redfish, even the SMART metrics exporter in the future.

This is the reason I think #193 should solve #170

I see, so you suggest using the same config option (collect-timeout) itself for all metric collection operations instead of adding a separate one for redfish. The only con I see with this approach is that there might be a situation where we just want the timeout for redfish to be very high (due to very slow redfish response on some host) but we still want the timeout for the other collectors to be normal. Not sure if that's an issue we can ignore for a simpler setup with a single config option.

Pjack commented 5 months ago

I prefer to provide less configuration to the user based on charm's design philosophy. Collect-timeout could impact all timeout value, including shell command , redfish, even the SMART metrics exporter in the future. This is the reason I think #193 should solve #170

I see, so you suggest using the same config option (collect-timeout) itself for all metric collection operations instead of adding a separate one for redfish. The only con I see with this approach is that there might be a situation where we just want the timeout for redfish to be very high (due to very slow redfish response on some host) but we still want the timeout for the other collectors to be normal. Not sure if that's an issue we can ignore for a simpler setup with a single config option.

My two cents. We've implemented collect_timeout for all shell commands, including ipmi, megacli, and others. The issues you mentioned are indeed valid now. Therefore, I don't think it's a problem to include redfish.

dashmage commented 5 months ago

I've modified the PR title and description in order to reflect the new changes.