cloudandheat / prometheus_smart_exporter

Configurable S.M.A.R.T. metric exporter for Prometheus
GNU General Public License v3.0

struct.error: unpack requires a buffer of 9 bytes #11

Open nableru opened 4 years ago

nableru commented 4 years ago

I tried to run:

# /usr/local/bin/prometheus_smart_exporter --device-db /etc/prometheus/exporters/smart/devices.json -a 0.0.0.0 -p 9257 -vv /var/run/prometheus_smart_helper/ipc

INFO:prometheus_smart_exporter:device db loaded with 4 devices
INFO:prometheus_smart_exporter:attribute_mapping loaded with 16 generic rules, 0 per device rules for 0 devices
Traceback (most recent call last):
  File "/usr/local/bin/prometheus_smart_exporter", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/prometheus_smart_exporter/__init__.py", line 402, in main
    logger.getChild("collector")
  File "/usr/local/lib/python3.6/dist-packages/prometheus_client/registry.py", line 24, in register
    names = self._get_names(collector)
  File "/usr/local/lib/python3.6/dist-packages/prometheus_client/registry.py", line 64, in _get_names
    for metric in desc_func():
  File "/usr/local/lib/python3.6/dist-packages/prometheus_smart_exporter/__init__.py", line 105, in collect
    data = self._recv_smart_info(sock)
  File "/usr/local/lib/python3.6/dist-packages/prometheus_smart_exporter/__init__.py", line 82, in _recv_smart_info
    ver, length = Header.unpack(hdr)
struct.error: unpack requires a buffer of 9 bytes

/usr/bin/python3 -V
Python 3.6.8

uname -a
Linux backup 4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

ls -la /var/run/prometheus_smart_helper/ipc
srw------- 1 prometheus prometheus 0 Sep 24 11:01 /var/run/prometheus_smart_helper/ipc

RaptahJezus commented 4 years ago

I can confirm identical behavior on Python 3.6.9 with smartmontools v7.0; prometheus_node_exporter was installed via pip3. I tried trimming the device.json down in case that was causing it.

# /usr/local/bin/prometheus_smart_exporter -vvv --device-db /etc/prometheus_smart_exporter/devices2.js
DEBUG:devdb:interpreting {'Device': ['Intel 320 Series SSDs', 'INTEL SSDSA2CW160G3', 'INTEL SSDSA2CT040G3'], 'ID#': {'5': 'RAW_VALUE', '9': 'RAW_VALUE', '12': 'RAW_VALUE', '171': 'VALUE', '172': 'VALUE', '183': 'RAW_VALUE', '184': 'VALUE', '187': 'RAW_VALUE', '192': 'RAW_VALUE', '199': 'RAW_VALUE', '226': 'RAW_VALUE', '227': 'RAW_VALUE', '228': 'RAW_VALUE', '232': 'VALUE', '233': 'VALUE', '241': 'RAW_VALUE', '242': 'RAW_VALUE', '1024': 'VALUE'}, 'Threshs': {'5': ['20', '40'], '171': ['16:', '11:'], '172': ['16:', '11:'], '184': ['96:', '91:'], '187': ['0', '10'], '199': ['0', '10'], '232': ['16:', '11:'], '233': ['16:', '6:'], '1024': ['0', '10']}, 'Perfs': ['233', '241', '242']}
DEBUG:devdb:found ID#={5: 'Raw', 9: 'Raw', 12: 'Raw', 171: 'Value', 172: 'Value', 183: 'Raw', 184: 'Value', 187: 'Raw', 192: 'Raw', 199: 'Raw', 226: 'Raw', 227: 'Raw', 228: 'Raw', 232: 'Value', 233: 'Value', 241: 'Raw', 242: 'Raw', 1024: 'Value'}
DEBUG:devdb:found Threshs={5: ('20', '40'), 171: ('16:', '11:'), 172: ('16:', '11:'), 184: ('96:', '91:'), 187: ('0', '10'), 199: ('0', '10'), 232: ('16:', '11:'), 233: ('16:', '6:'), 1024: ('0', '10')}
DEBUG:devdb:found Perfs={233, 242, 241}
DEBUG:devdb:updating 'Intel 320 Series SSDs' with said info
DEBUG:devdb:updating 'INTEL SSDSA2CW160G3' with said info
DEBUG:devdb:updating 'INTEL SSDSA2CT040G3' with said info
INFO:prometheus_smart_exporter:device db loaded with 3 devices
DEBUG:prometheus_smart_exporter:no --attr-mapping specified, using default PosixPath('/usr/local/lib/python3.6/dist-packages/prometheus_smart_exporter/data/attrmap.json')
DEBUG:attrmap:loading generic rules
DEBUG:attrmap:interpreting rule {'id': 5, 'name': 'reallocated_sectors', 'type': 'counter'}
DEBUG:attrmap:interpreting rule {'id': 10, 'name': 'spin_retries_total', 'type': 'counter'}
DEBUG:attrmap:interpreting rule {'id': 183, 'match': '^.+Bad.+$', 'name': 'runtime_bad_blocks_total', 'type': 'counter'}
DEBUG:attrmap:interpreting rule {'id': 184, 'name': 'end_to_end_errors_total', 'type': 'counter'}
DEBUG:attrmap:interpreting rule {'id': 187, 'name': 'uncorrectable_errors_total', 'type': 'counter'}
DEBUG:attrmap:interpreting rule {'id': 188, 'name': 'command_timeouts_total', 'type': 'counter'}
DEBUG:attrmap:interpreting rule {'id': 190, 'name': 'case_temperature_celsius', 'type': 'gauge'}
DEBUG:attrmap:interpreting rule {'id': 194, 'name': 'temperature_celsius', 'type': 'gauge'}
DEBUG:attrmap:interpreting rule {'id': 196, 'name': 'reallocation_events_total', 'type': 'counter'}
DEBUG:attrmap:interpreting rule {'id': 197, 'name': 'pending_sectors', 'type': 'gauge'}
DEBUG:attrmap:interpreting rule {'id': 198, 'name': 'uncorrectable_sectors', 'type': 'gauge'}
DEBUG:attrmap:interpreting rule {'id': 201, 'name': 'soft_read_error_rate', 'type': 'gauge'}
DEBUG:attrmap:interpreting rule {'id': 230, 'name': 'drive_life_protection_status', 'type': 'gauge'}
DEBUG:attrmap:interpreting rule {'id': 233, 'name': 'wearout_percent', 'type': 'gauge'}
DEBUG:attrmap:interpreting rule {'id': 241, 'match': 'Total_LBAs_Written', 'name': 'written_lbas_total', 'type': 'counter'}
DEBUG:attrmap:interpreting rule {'id': 242, 'match': 'Total_LBAs_Read', 'name': 'read_lbas_total', 'type': 'counter'}
DEBUG:attrmap:finished
INFO:prometheus_smart_exporter:attribute_mapping loaded with 16 generic rules, 0 per device rules for 0 devices
DEBUG:prometheus_smart_exporter.collector:starting collection ...
DEBUG:prometheus_smart_exporter.collector:attempting UNIX connection to /var/run/prometheus_smart_helper/ipc
Traceback (most recent call last):
  File "/usr/local/bin/prometheus_smart_exporter", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/prometheus_smart_exporter/__init__.py", line 402, in main
    logger.getChild("collector")
  File "/usr/local/lib/python3.6/dist-packages/prometheus_client/registry.py", line 24, in register
    names = self._get_names(collector)
  File "/usr/local/lib/python3.6/dist-packages/prometheus_client/registry.py", line 64, in _get_names
    for metric in desc_func():
  File "/usr/local/lib/python3.6/dist-packages/prometheus_smart_exporter/__init__.py", line 105, in collect
    data = self._recv_smart_info(sock)
  File "/usr/local/lib/python3.6/dist-packages/prometheus_smart_exporter/__init__.py", line 82, in _recv_smart_info
    ver, length = Header.unpack(hdr)
struct.error: unpack requires a buffer of 9 bytes

I ran the helper in another console with -vvv, and this is what I got:

# /usr/local/bin/smart_exporter_helper -vvv --socket-path /var/run/prometheus_smart_helper/ipc
ERROR:smart_exporter_helper:while handling client
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/smart_exporter_helper/__init__.py", line 249, in main
    handle_client(client_sock)
  File "/usr/local/lib/python3.6/dist-packages/smart_exporter_helper/__init__.py", line 142, in handle_client
    info = read_drive_info("/dev/"+device)
  File "/usr/local/lib/python3.6/dist-packages/smart_exporter_helper/__init__.py", line 112, in read_drive_info
    "Raw": int(fields[9]),
ValueError: invalid literal for int() with base 10: '48604h+24m+02.151s'

That error message pops up right as I attempt to launch the exporter.
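The two tracebacks look like two ends of one failure: the helper raises while handling the client, so presumably the socket is closed before any header is written, and the exporter then hands a short (likely empty) buffer to Header.unpack. The following is only a sketch of how the receiving side could report that more clearly. The "!BQ" layout (1-byte version plus 8-byte length, 9 bytes in total) and the names recv_exact and read_header are assumptions for illustration; the exporter's real header format is not shown in this thread.

```python
import socket
import struct

# Hypothetical 9-byte header: 1-byte version + 8-byte payload length.
# The actual format used by prometheus_smart_exporter may differ.
HEADER = struct.Struct("!BQ")


def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # recv() returning b"" means the other side closed the connection.
            raise ConnectionError(
                "peer closed after {} of {} bytes".format(n - remaining, n)
            )
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_header(sock):
    """Return (version, length), failing loudly on a short read."""
    return HEADER.unpack(recv_exact(sock, HEADER.size))
```

With a check like this, a helper crash would surface in the exporter as a connection error naming the short read, rather than as a struct.error that points away from the real cause.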

I noticed that the drive crashing this is a Seagate ST3000DM001. For some unknown reason it returns the following for attribute 240 when running smartctl -iA:

240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 48604h+28m+05.655s

None of my other drives return a value for attribute 240.
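To make the failure concrete, here is a small sketch that splits the attribute line above on whitespace and picks out the same field indexes that appear in the helper's traceback (fields[5] for the threshold, fields[9] for the raw value). That the helper splits on whitespace is an assumption, and split_attribute_line is a hypothetical name, not a function from the package.

```python
LINE = (
    "240 Head_Flying_Hours 0x0000 100 253 000 "
    "Old_age Offline - 48604h+28m+05.655s"
)


def split_attribute_line(line):
    """Split one smartctl -A attribute row into the fields of interest."""
    fields = line.split()
    return {
        "id": fields[0],
        "name": fields[1],
        "thresh": fields[5],  # THRESH column
        "raw": fields[9],     # RAW_VALUE column
    }


attr = split_attribute_line(LINE)

try:
    int(attr["raw"])
except ValueError as exc:
    # The raw value is a duration string, not a plain integer.
    print("cannot convert raw value:", exc)
```

The threshold column converts without trouble; only the raw column carries the hours/minutes/seconds string that int() rejects.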

Looking through the git repo, I see a sanity check was added in fix #3 that verifies whether the raw values are actual integers. I updated /usr/local/lib/python3.6/dist-packages/smart_exporter_helper/__init__.py lines 113 and 114 to the following, which solved the problem:

"Thresh": int(fields[5]) if fields[5].isdigit() else None,
"Raw": int(fields[9]) if fields[9].isdigit() else None,

These checks were not present in the file I installed via pip3.
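The isdigit() guard keeps the helper from crashing, at the cost of reporting None for attribute 240 on this drive. If the value itself were wanted, one possible variation (not what the repository does, as far as this thread shows) would be to convert the duration string to seconds. The function name parse_raw and the regular expression below are assumptions based only on the two sample values quoted above.

```python
import re

# Matches strings shaped like "48604h+24m+02.151s" (hours, minutes, seconds).
_DURATION = re.compile(r"^(\d+)h\+(\d+)m\+(\d+(?:\.\d+)?)s$")


def parse_raw(value):
    """Return an int for plain integers, seconds for durations, else None."""
    try:
        return int(value)
    except ValueError:
        pass
    match = _DURATION.match(value)
    if match:
        hours, minutes, seconds = match.groups()
        return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    # Unknown format: mirror the guard above and report nothing.
    return None
```

Calling parse_raw("12") yields the integer 12, a duration string yields a float number of seconds, and anything unrecognized yields None, so callers still need to handle a missing value.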