gebn / bmc_exporter

Exposes Baseboard Management Controller data in Prometheus format.
GNU Lesser General Public License v3.0

bmc_up 0 even though IPMI reachable with ipmitool #60

Closed loelkes closed 3 years ago

loelkes commented 3 years ago

Hi,

I followed the Getting Started steps but still can't get it to work.

If I run the ipmitool command with the username and password, I get all the readings. The user bmc_exporter has USER privileges.

$ ipmitool -I lanplus -H ipmi.example.com -U bmc_exporter -P supersecret -L USER sdr
Watchdog         | 0x00              | ok
SEL              | 0x00              | ok
...

When I run the exporter, a scrape takes about 8s (it does something in the background) and then reports bmc_up 0. Replacing the DNS name with the IP address does not change anything.

# HELP bmc_scrape_duration_seconds The time taken to collect all metrics, measured by the exporter.
# TYPE bmc_scrape_duration_seconds gauge
bmc_scrape_duration_seconds 8.57588601
# HELP bmc_up 1 if the exporter was able to establish a session, 0 otherwise.
# TYPE bmc_up gauge
bmc_up 0

There are no error messages. Any advice on what to look for?
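
For anyone who wants to check the two samples above from a script rather than by eye, here is a minimal sketch in Go. It assumes only the unlabelled "name value" lines shown in the output above; parseGauges is a hypothetical helper written for this illustration and is not part of bmc_exporter or any Prometheus library.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseGauges extracts unlabelled samples ("name value") from Prometheus
// text exposition output, skipping comments and blank lines.
// Hypothetical helper for illustration only.
func parseGauges(body string) (map[string]float64, error) {
	samples := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || strings.Contains(fields[0], "{") {
			continue // labelled or malformed sample: out of scope for this sketch
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", line, err)
		}
		samples[fields[0]] = v
	}
	return samples, sc.Err()
}

func main() {
	// Sample lines copied from the scrape output in this report.
	body := "# TYPE bmc_scrape_duration_seconds gauge\n" +
		"bmc_scrape_duration_seconds 8.57588601\n" +
		"# TYPE bmc_up gauge\n" +
		"bmc_up 0\n"
	samples, err := parseGauges(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if samples["bmc_up"] == 0 {
		fmt.Printf("no session established after %.1fs\n", samples["bmc_scrape_duration_seconds"])
	}
}
```

A long scrape duration combined with bmc_up 0 is the pattern to watch for here, since it points at requests timing out rather than being rejected.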

gebn commented 3 years ago

Thanks for the report. The 8s duration is suspicious; it suggests the BMC is ignoring the library rather than returning an error. What brand of BMC is this? As an initial step, try running the describe command here. We're expecting it to fail, however the error output is more verbose. The exporter is silent to avoid log spam when scraping tens of thousands of BMCs repeatedly; a -v option has been on the backlog for some time.
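
One quick way to tell "nothing is answering on this port" apart from "the BMC rejects the session" is to send a single RMCP/ASF Presence Ping and wait for the Pong. The following is a rough sketch using only the Go standard library; it is not code from bmc_exporter or its IPMI library, the function names are made up for this example, and the byte layout follows my reading of the ASF specification.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

// presencePing builds a 12-byte RMCP/ASF Presence Ping datagram.
func presencePing(tag byte) []byte {
	return []byte{
		0x06,                   // RMCP version 1.0
		0x00,                   // reserved
		0xff,                   // sequence number 0xff: no RMCP ACK requested
		0x06,                   // message class: ASF
		0x00, 0x00, 0x11, 0xbe, // ASF IANA enterprise number (4542)
		0x80,                   // message type: Presence Ping
		tag,                    // message tag, echoed back in the Pong
		0x00,                   // reserved
		0x00,                   // data length: no payload
	}
}

// isPresencePong reports whether resp looks like a Presence Pong for tag.
func isPresencePong(resp []byte, tag byte) bool {
	return len(resp) >= 12 && resp[0] == 0x06 && resp[3] == 0x06 &&
		resp[8] == 0x40 && resp[9] == tag
}

// probe sends one ping to addr and waits up to timeout for a matching pong.
func probe(addr string, timeout time.Duration) error {
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	const tag = 0x01
	if _, err := conn.Write(presencePing(tag)); err != nil {
		return err
	}
	buf := make([]byte, 512)
	n, err := conn.Read(buf)
	if err != nil {
		return err // typically an i/o timeout when nothing answers
	}
	if !isPresencePong(buf[:n], tag) {
		return fmt.Errorf("unexpected %d-byte reply", n)
	}
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: pingbmc host:port")
		return
	}
	if err := probe(os.Args[1], 2*time.Second); err != nil {
		fmt.Println("no presence pong:", err)
		return
	}
	fmt.Println("presence pong received")
}
```

If this times out against the address the exporter is configured with, the problem is most likely the address, port, or network path rather than credentials or privileges.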

loelkes commented 3 years ago

> As an initial step, try running the describe command here.

I don't have any experience with Go (yet), so I have no idea how to proceed. Please advise.

BMC Brand (?)

MegaRAC SP-X
BMC Firmware Information | 12.53.07 | May 27 2021 17:24:31 CST

gebn commented 3 years ago

Here's an amd64 binary. To run it:

$ gunzip describe.gz
$ chmod +x describe
$ ./describe --username bmc_exporter --password supersecret ipmi.example.com

We're expecting errors, however the output should start with something like this:

2021/07/14 00:12:19 connected to 10.22.4.227:623 over IPMI v2.0
ASF Presence Pong capabilities:
        IPMI:               true
        ASF v1.0:           true
        ASF security exts:  false
        DASH:               false
        DCMI:               false
Channel Authentication Capabilities:
        Channel:            0x1(Implementation-specific)
        Extended:           true
        SupportsV2:         true
        K_G configured:     false
        Per-message auth:   false
        User-level auth:    false
        Non-null usernames: true
        Null usernames:     false
        Anon login:         false
        OEM:                21317(ATEN INTERNATIONAL CO., LTD.)
System:
        GUID:               43303031-4d53-0cc4-7a37-865f00000000
Device:
        ID:                 32
        Revision:           1
        Manufacturer:       10876(Super Micro Computer Inc.)
        Product:            2052
        Firmware (major):   3
        Firmware (minor):   86
        Firmware (aux):     00000000
        Firmware:           03.86
Chassis:
        Powered on:         false
        On power restore:   0(Remain off)
        Identification:     0(Off)
        Intrusion:          false
        Power fault:        false
        Cooling fault:      false
        Drive fault:        false
...

loelkes commented 3 years ago

Thanks. While double-checking, I realised I had set port 80 instead of 623. m( Setting it to 623 does not solve the problem; instead, bmc_exporter now panics with the following error:

$ ./bmc_exporter --secrets.static secrets.yml
panic: inconsistent label cardinality: expected 1 label values but got 0 in []string(nil)

goroutine 70 [running]:
github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
        /home/loelkes/go/pkg/mod/github.com/prometheus/client_golang@v1.9.0/prometheus/value.go:107
github.com/gebn/bmc_exporter/bmc/subcollector.(*PowerDraw).Collect(0xc000528390, 0xba8650, 0xc0005101c0, 0xc00051a1e0, 0x0, 0x0)
        /srv/bmc_exporter/bmc/subcollector/power_draw.go:130 +0x4a9
github.com/gebn/bmc_exporter/bmc/collector.(*Collector).collect(0xc000528240, 0xba8650, 0xc0005101c0, 0xc00051a1e0, 0x0, 0x0)
        /srv/bmc_exporter/bmc/collector/collector.go:253 +0x2a5
github.com/gebn/bmc_exporter/bmc/collector.(*Collector).Collect(0xc000528240, 0xc00051a1e0)
        /srv/bmc_exporter/bmc/collector/collector.go:178 +0x11c
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
        /home/loelkes/go/pkg/mod/github.com/prometheus/client_golang@v1.9.0/prometheus/registry.go:446 +0x12b
created by github.com/prometheus/client_golang/prometheus.(*Registry).Gather
        /home/loelkes/go/pkg/mod/github.com/prometheus/client_golang@v1.9.0/prometheus/registry.go:457 +0x5ce
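
Reading the panic message, the metric being built appears to declare one variable label while the call supplied no label values. The snippet below is a standard-library-only model of that kind of check, written to show why a nil slice produces exactly this message. It is not client_golang's implementation, and the label name "psu" is a placeholder rather than the label PowerDraw actually uses.

```go
package main

import "fmt"

// checkLabelValues models a label cardinality check: the number of values
// supplied must equal the number of label names declared for the metric.
// Illustrative only; not copied from client_golang.
func checkLabelValues(labelNames, labelValues []string) error {
	if len(labelNames) != len(labelValues) {
		return fmt.Errorf(
			"inconsistent label cardinality: expected %d label values but got %d in %#v",
			len(labelNames), len(labelValues), labelValues,
		)
	}
	return nil
}

func main() {
	declared := []string{"psu"} // hypothetical label name

	// No values supplied: the mismatch is reported as an error here,
	// whereas a Must-style constructor would panic instead.
	if err := checkLabelValues(declared, nil); err != nil {
		fmt.Println("error:", err)
	}

	// One value for one declared label passes the check.
	if err := checkLabelValues(declared, []string{"1"}); err == nil {
		fmt.Println("ok")
	}
}
```

Returning an error from a check like this lets a collector skip one metric and carry on, whereas a panic inside Collect takes down the whole process, as seen in the trace above.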

Output from the describe command:

$ ./describe --username bmc_exporter --password supersecret ipmi.example.com:623
2021/07/14 05:01:15 connected to x.x.x.x:623 over IPMI v2.0
ASF Presence Pong capabilities:
    IPMI:               true
    ASF v1.0:           true
    ASF security exts:  false
    DASH:               false
    DCMI:               true
Channel Authentication Capabilities:
    Channel:            0x1(Implementation-specific)
    Extended:           true
    SupportsV2:         true
    K_G configured:     false
    Per-message auth:   true
    User-level auth:    false
    Non-null usernames: true
    Null usernames:     false
    Anon login:         false
    OEM:                0(Unknown)
System:
    GUID:               ...
Device:
    ID:                 32
    Revision:           1
    Manufacturer:       15370(Unknown)
    Product:            4149
    Firmware (major):   12
    Firmware (minor):   53
    Firmware (aux):     07000000
    Firmware:           12.53
Chassis:
    Powered on:         true
    On power restore:   0(Remain off)
    Identification:     0(Off)
    Intrusion:          false
    Power fault:        false
    Cooling fault:      false
    Drive fault:        false
Sensors:
    12V_GPU3            0A
    12V_GPU2            0A
    12V_GPU0            2.56A
    12V_GPU1            2.56A
    12V_HDD             1.92A
    12V_FAN             1.6A
    12V_MB              8.96A
    P_12V_MB            11.904V
    P_12V_HDD           11.904V
    P_12V_FAN           11.904V
    P_12V_GPU3          11.904V
    P_12V_GPU2          11.904V
    P_12V_GPU0          11.904V
    P_12V_GPU1          11.904V
    CPU0_TEMP           36C
    DIMMG0_TEMP         28C
    DIMMG1_TEMP         31C
    CPU0_DTS            64C
    GPU3_PROC           no reading/missing (sensor reading is not available)
    GPU2_PROC           no reading/missing (sensor reading is not available)
    GPU0_PROC           38C
    GPU1_PROC           31C
    MB_TEMP1            34C
    MB_TEMP2            24C
    Inlet_Temp          24C
    NVMe0_TEMP          36C
    NVMe1_TEMP          no reading/missing (sensor reading is not available)
    BPB_FAN1            6150RPM
    BPB_FAN2            6150RPM
    BPB_FAN3            6300RPM
    BPB_FAN4            6150RPM
    BPB_FAN5            6150RPM
    PSU1_HOTSPOT        23C
    PSU2_HOTSPOT        32C
    P_12V               11.635V
    P_1V2               1.19V
    P0_VDD_18_DUAL      1.7738V
    P0_VDD_18           1.7836V
    P0_VPP_ABCD_SUS     2.3842000000000003V
    P0_VPP_EFGH_SUS     2.3449V
    P_3V3               3.2524V
    P_5V                4.9344V
    P_5V_STBY           4.9087000000000005V
    P_VBAT              3.0316V
    P0_VDDCR_SOC        0.595V
    P0_VDDCR_CPU        0.525V
    P1_VDDCR_CPU        no reading/missing (sensor reading is not available)
    P0_VDDIO_ABCD       1.197V
    P0_VDDIO_EFGH       1.197V
    SYS_POWER           250W
2021/07/14 05:01:25 failed to fetch DCMI supported capabilities: read udp x.x.x.x:47391->x.x.x.x:623: i/o timeout
2021/07/14 05:01:25 failed to fetch DCMI mandatory platform attrs: write udp x.x.x.x:47391->x.x.x.x:623: i/o timeout
2021/07/14 05:01:25 failed to fetch DCMI enhanced power stats attrs: write udp x.x.x.x:47391->x.x.x.x:623: i/o timeout
DCMI Capabilities:
    Major version:      0
    Minor version:      0
    Supports pwr mgmt:  false
DCMI Mandatory Platform Attributes:
    Max SEL entries:    0
    Temp sampling freq: 0s
DCMI Power Average Time Periods:
2021/07/14 05:01:25 failed to get DCMI sensor info: write udp x.x.x.x:47391->x.x.x.x:623: i/o timeout

The exporter returns:

$ curl localhost:9622/bmc?target=ipmi.example.com:623
curl: (52) Empty reply from server

gebn commented 3 years ago

Does this version fix it? bmc_exporter.gz

loelkes commented 3 years ago

Yes, it does. Thanks!

gebn commented 3 years ago

Pleased to hear it. Thanks for reporting this; it would have affected anyone using DCMI temperatures. Future releases will include the fix.