clustervision / trinityX

TrinityX is the new generation of ClusterVision's open-source HPC, A/I and cloudbursting platform. It is designed from the ground up to provide all services required in a modern HPC and A/I system, and to allow full customization of the installation.
GNU General Public License v3.0

Question regarding IPMI #400

Closed xdkreij closed 10 months ago

xdkreij commented 10 months ago

So I've managed to let controller.yml do its thing, only to find out that 3 services are in a failed state.

Following up on that, I also found something else in the logs:

telegraf[167654]: 2024-01-12T13:30:30Z E! [inputs.ipmi_sensor] Error in plugin: failed to run command "sudo -n /usr/bin/ipmitool sdr": exit status 1 - Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
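
The error above says that none of the three device nodes ipmitool tries could be opened. A minimal sketch, assuming a Linux host, of checking for those same paths before looking at the Telegraf side; the function name `find_ipmi_device` is hypothetical and only the paths come from the log line:

```python
import os

# Device nodes named in the Telegraf/ipmitool error message above.
IPMI_DEVICE_PATHS = ("/dev/ipmi0", "/dev/ipmi/0", "/dev/ipmidev/0")


def find_ipmi_device(paths=IPMI_DEVICE_PATHS):
    """Return the first path that exists, or None if none of them do."""
    for path in paths:
        if os.path.exists(path):
            return path
    return None


if __name__ == "__main__":
    device = find_ipmi_device()
    if device is None:
        # Matches the situation in the log: no local IPMI device node.
        print("No IPMI device node found; local ipmitool calls are expected to fail.")
    else:
        print(f"IPMI device node present: {device}")
```

If this reports no device node, as would be expected on a virtual machine without an emulated BMC, the `inputs.ipmi_sensor` error is a consequence of the missing hardware rather than of a misconfiguration.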

Searching for what IPMI exactly is led me to the following statement: https://www.intel.com/content/www/us/en/products/docs/servers/ipmi/ipmi-home.html

Is there still a significant number of systems that are IPMI-capable? Or should this implementation be replaced, as advised on Intel's site, e.g. by iLO / iDRAC?

What's the impact on the TrinityX setup of running without IPMI, as in the case of a virtual cluster?

Thanks in advance!

msteggink commented 10 months ago

The IPMI plugin in Telegraf is enabled by default. Most, if not nearly all, BMCs are IPMI-compatible (iDRAC/iLO/IPMI). Redfish is indeed the future, but IPMI should be sufficient to gather system metrics and the like.
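
To illustrate what "gathering system metrics" over IPMI looks like, here is a rough sketch that parses `ipmitool sdr`-style output. It assumes the usual three-column `name | reading | status` layout; the sample text and the helper names `parse_sdr_line` and `parse_sdr` are made up for illustration and are not taken from TrinityX or Telegraf:

```python
def parse_sdr_line(line):
    """Split one 'name | reading | status' line into a dict, or None if malformed."""
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != 3:
        return None
    name, reading, status = parts
    return {"name": name, "reading": reading, "status": status}


def parse_sdr(text):
    """Parse multi-line sensor output, skipping lines that do not fit the layout."""
    records = (parse_sdr_line(line) for line in text.splitlines())
    return [record for record in records if record is not None]


# Illustrative sample only; real sensor names and values differ per BMC.
SAMPLE = """\
CPU Temp         | 45 degrees C      | ok
FAN1             | 5400 RPM          | ok
"""

if __name__ == "__main__":
    for sensor in parse_sdr(SAMPLE):
        print(sensor["name"], "->", sensor["reading"], f"({sensor['status']})")
```

Telegraf's plugin does its own parsing; this only shows the kind of data the sensor readout carries.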

xdkreij commented 10 months ago

Much appreciated, thanks for the swift feedback; it's helpful! I'll close the question. Thanks.