AnalogJ / scrutiny

Hard Drive S.M.A.R.T Monitoring, Historical Trends & Real World Failure Thresholds
MIT License

Support MegaRAID controller #30

Closed logaritmisk closed 4 years ago

logaritmisk commented 4 years ago

I have a PowerEdge T320 with a PERC H710 controller. When I try to collect SMART data, it fails with this message: "smartctl returned an error code (4) while processing sda"

smartctl has support for MegaRAID with the command: "smartctl -d megaraid,0 -i /dev/sda" for disk 0, "smartctl -d megaraid,1 -i /dev/sda" for disk 1, and so on.

Would be awesome if support for this could be added to Scrutiny :)
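
As a rough illustration of the command pattern described above, here is a minimal Python sketch that builds the smartctl argument list for each physical disk behind the controller. The helper name `megaraid_smartctl_args` is hypothetical and is not part of Scrutiny or smartctl; only the `-d megaraid,N` syntax comes from the commands quoted in this comment.

```python
def megaraid_smartctl_args(device, slot, extra=("-i",)):
    """Build a smartctl argument list for one disk behind a MegaRAID controller.

    `device` is the block device exposed by the controller (e.g. /dev/sda),
    `slot` is the disk number passed to `-d megaraid,N`.
    """
    return ["smartctl", "-d", f"megaraid,{slot}", *extra, device]


# Print the commands for the first two slots.
for slot in range(2):
    print(" ".join(megaraid_smartctl_args("/dev/sda", slot)))
```

The argument list form is convenient if the command is later passed to something like `subprocess.run`, since no shell quoting is needed.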

AnalogJ commented 4 years ago

Hey @logaritmisk, can you give me the JSON output from those runs so I can verify that Scrutiny can handle them, and add them to the test suite?

smartctl -j -d megaraid,0 -i /dev/sda
smartctl -j -d megaraid,1 -i /dev/sda

Thanks!

logaritmisk commented 4 years ago
Slot: **0**

```json
{ "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 1 ], "svn_revision": "5022", "platform_info": "x86_64-linux-5.4.0-42-generic", "build_info": "(local build)", "argv": [ "smartctl", "-j", "-d", "megaraid,0", "-i", "/dev/sda" ], "exit_status": 0 }, "device": { "name": "/dev/sda", "info_name": "/dev/sda [megaraid_disk_00] [SAT]", "type": "sat+megaraid,0", "protocol": "ATA" }, "model_name": "WD4000FYYX", "serial_number": "XXXXXXXXXXXX", "wwn": { "naa": 5, "oui": 5358, "id": 10217451239 }, "ata_additional_product_id": "DELL(tm)", "firmware_version": "00.0D1K4", "user_capacity": { "blocks": 7814037168, "bytes": 4000787030016 }, "logical_block_size": 512, "physical_block_size": 512, "rotation_rate": 7200, "form_factor": { "ata_value": 2, "name": "3.5 inches" }, "in_smartctl_database": false, "ata_version": { "string": "ATA8-ACS T13/1699-D revision 6", "major_value": 510, "minor_value": 40 }, "sata_version": { "string": "SATA 3.0", "value": 62 }, "interface_speed": { "max": { "sata_value": 6, "string": "3.0 Gb/s", "units_per_second": 30, "bits_per_unit": 100000000 }, "current": { "sata_value": 2, "string": "3.0 Gb/s", "units_per_second": 30, "bits_per_unit": 100000000 } }, "local_time": { "time_t": 1598293227, "asctime": "Mon Aug 24 20:20:27 2020 CEST" } }
```

Slot: **1**

```json
{ "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 1 ], "svn_revision": "5022", "platform_info": "x86_64-linux-5.4.0-42-generic", "build_info": "(local build)", "argv": [ "smartctl", "-j", "-d", "megaraid,1", "-i", "/dev/sda" ], "exit_status": 0 }, "device": { "name": "/dev/sda", "info_name": "/dev/sda [megaraid_disk_01] [SAT]", "type": "sat+megaraid,1", "protocol": "ATA" }, "model_name": "WD4000FYYX", "serial_number": "XXXXXXXXXXXX", "wwn": { "naa": 5, "oui": 5358, "id": 11649125727 }, "ata_additional_product_id": "DELL(tm)", "firmware_version": "00.0D1K4", "user_capacity": { "blocks": 7814037168, "bytes": 4000787030016 }, "logical_block_size": 512, "physical_block_size": 512, "rotation_rate": 7200, "form_factor": { "ata_value": 2, "name": "3.5 inches" }, "in_smartctl_database": false, "ata_version": { "string": "ATA8-ACS T13/1699-D revision 6", "major_value": 510, "minor_value": 40 }, "sata_version": { "string": "SATA 3.0", "value": 62 }, "interface_speed": { "max": { "sata_value": 6, "string": "3.0 Gb/s", "units_per_second": 30, "bits_per_unit": 100000000 }, "current": { "sata_value": 2, "string": "3.0 Gb/s", "units_per_second": 30, "bits_per_unit": 100000000 } }, "local_time": { "time_t": 1598293334, "asctime": "Mon Aug 24 20:22:14 2020 CEST" } }
```

It seems I can get all my drives in the raid by running smartctl --scan. This is the output I get.

/dev/sda -d scsi # /dev/sda, SCSI device
/dev/bus/0 -d megaraid,0 # /dev/bus/0 [megaraid_disk_00], SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
/dev/bus/0 -d megaraid,2 # /dev/bus/0 [megaraid_disk_02], SCSI device
/dev/bus/0 -d megaraid,3 # /dev/bus/0 [megaraid_disk_03], SCSI device

Do you want info on the last two slots as well @AnalogJ?
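
For anyone scripting against the plain-text scan output above, a minimal sketch of parsing it could look like the following. The function name `parse_scan_line` and the returned dict layout are illustrative assumptions; the sample lines are copied from the output in this comment.

```python
def parse_scan_line(line):
    """Parse one line of plain-text `smartctl --scan` output.

    Expected shape: PATH -d TYPE # COMMENT
    Returns a dict, or None if the line does not match that shape.
    """
    body, _, comment = line.partition("#")
    fields = body.split()
    if len(fields) != 3 or fields[1] != "-d":
        return None
    return {"name": fields[0], "type": fields[2], "comment": comment.strip()}


SCAN_OUTPUT = """\
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/bus/0 -d megaraid,0 # /dev/bus/0 [megaraid_disk_00], SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
"""

# Keep only the lines that parsed successfully.
devices = [d for d in map(parse_scan_line, SCAN_OUTPUT.splitlines()) if d]
for dev in devices:
    print(dev["name"], dev["type"])
```

Note that every physical disk shares the same path (/dev/bus/0), so the `-d` type is the only field that distinguishes them.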

AnalogJ commented 4 years ago

Hey @logaritmisk

Apologies, I forgot to include the --all/-a flag in the commands I gave you above.

Can you give me the output of:

smartctl -a -j -d megaraid,0 -i /dev/sda
smartctl -a -j -d megaraid,1 -i /dev/sda

I think the output from 2 different devices is enough for me to get a feel for how megaraid device types differ from ATA/SCSI/NVMe

Thanks again!

logaritmisk commented 4 years ago

No problem :)

Slot: **0**

```json
{ "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 1 ], "svn_revision": "5022", "platform_info": "x86_64-linux-5.4.0-42-generic", "build_info": "(local build)", "argv": [ "smartctl", "-a", "-j", "-d", "megaraid,0", "-i", "/dev/sda" ], "messages": [ { "string": "Warning: This result is based on an Attribute check.", "severity": "warning" } ], "exit_status": 4 }, "device": { "name": "/dev/sda", "info_name": "/dev/sda [megaraid_disk_00] [SAT]", "type": "sat+megaraid,0", "protocol": "ATA" }, "model_name": "WD4000FYYX", "serial_number": "XXXXXXXXXXXX", "wwn": { "naa": 5, "oui": 5358, "id": 10217451239 }, "ata_additional_product_id": "DELL(tm)", "firmware_version": "00.0D1K4", "user_capacity": { "blocks": 7814037168, "bytes": 4000787030016 }, "logical_block_size": 512, "physical_block_size": 512, "rotation_rate": 7200, "form_factor": { "ata_value": 2, "name": "3.5 inches" }, "in_smartctl_database": false, "ata_version": { "string": "ATA8-ACS T13/1699-D revision 6", "major_value": 510, "minor_value": 40 }, "sata_version": { "string": "SATA 3.0", "value": 62 }, "interface_speed": { "max": { "sata_value": 6, "string": "3.0 Gb/s", "units_per_second": 30, "bits_per_unit": 100000000 }, "current": { "sata_value": 2, "string": "3.0 Gb/s", "units_per_second": 30, "bits_per_unit": 100000000 } }, "local_time": { "time_t": 1598297918, "asctime": "Mon Aug 24 21:38:38 2020 CEST" }, "smart_status": { "passed": true }, "ata_smart_data": { "offline_data_collection": { "status": { "value": 130, "string": "was completed without error", "passed": true }, "completion_seconds": 90 }, "self_test": { "status": { "value": 0, "string": "completed without error", "passed": true }, "polling_minutes": { "short": 2, "extended": 523, "conveyance": 5 } }, "capabilities": { "values": [ 123, 3 ], "exec_offline_immediate_supported": true, "offline_is_aborted_upon_new_cmd": false, "offline_surface_scan_supported": true, "self_tests_supported": true, "conveyance_self_test_supported": 
true, "selective_self_test_supported": true, "attribute_autosave_enabled": true, "error_logging_supported": true, "gp_logging_supported": true } }, "ata_sct_capabilities": { "value": 28861, "error_recovery_control_supported": true, "feature_control_supported": true, "data_table_supported": true }, "ata_smart_attributes": { "revision": 16, "table": [ { "id": 1, "name": "Raw_Read_Error_Rate", "value": 200, "worst": 197, "thresh": 51, "when_failed": "", "flags": { "value": 47, "string": "POSR-K ", "prefailure": true, "updated_online": true, "performance": true, "error_rate": true, "event_count": false, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 3, "name": "Spin_Up_Time", "value": 228, "worst": 227, "thresh": 21, "when_failed": "", "flags": { "value": 39, "string": "POS--K ", "prefailure": true, "updated_online": true, "performance": true, "error_rate": false, "event_count": false, "auto_keep": true }, "raw": { "value": 7558, "string": "7558" } }, { "id": 4, "name": "Start_Stop_Count", "value": 100, "worst": 100, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 70, "string": "70" } }, { "id": 5, "name": "Reallocated_Sector_Ct", "value": 200, "worst": 200, "thresh": 140, "when_failed": "", "flags": { "value": 51, "string": "PO--CK ", "prefailure": true, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 7, "name": "Seek_Error_Rate", "value": 200, "worst": 200, "thresh": 0, "when_failed": "", "flags": { "value": 46, "string": "-OSR-K ", "prefailure": false, "updated_online": true, "performance": true, "error_rate": true, "event_count": false, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 9, "name": "Power_On_Hours", "value": 49, "worst": 49, 
"thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 37787, "string": "37787" } }, { "id": 10, "name": "Spin_Retry_Count", "value": 100, "worst": 253, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 11, "name": "Calibration_Retry_Count", "value": 100, "worst": 253, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 12, "name": "Power_Cycle_Count", "value": 100, "worst": 100, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 70, "string": "70" } }, { "id": 183, "name": "Runtime_Bad_Block", "value": 100, "worst": 100, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 192, "name": "Power-Off_Retract_Count", "value": 200, "worst": 200, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 55, "string": "55" } }, { "id": 193, "name": "Load_Cycle_Count", "value": 197, "worst": 197, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", 
"prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 9267, "string": "9267" } }, { "id": 194, "name": "Temperature_Celsius", "value": 116, "worst": 104, "thresh": 0, "when_failed": "", "flags": { "value": 34, "string": "-O---K ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": false, "auto_keep": true }, "raw": { "value": 3145764, "string": "36 (Min/Max 0/48)" } }, { "id": 196, "name": "Reallocated_Event_Count", "value": 200, "worst": 200, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 197, "name": "Current_Pending_Sector", "value": 200, "worst": 200, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 198, "name": "Offline_Uncorrectable", "value": 200, "worst": 200, "thresh": 0, "when_failed": "", "flags": { "value": 48, "string": "----CK ", "prefailure": false, "updated_online": false, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 199, "name": "UDMA_CRC_Error_Count", "value": 200, "worst": 200, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 200, "name": "Multi_Zone_Error_Rate", "value": 200, "worst": 200, "thresh": 0, "when_failed": "", "flags": { "value": 8, "string": "---R-- ", "prefailure": false, "updated_online": 
false, "performance": false, "error_rate": true, "event_count": false, "auto_keep": false }, "raw": { "value": 0, "string": "0" } }, { "id": 241, "name": "Total_LBAs_Written", "value": 198, "worst": 198, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 2754608750246, "string": "2754608750246" } }, { "id": 242, "name": "Total_LBAs_Read", "value": 200, "worst": 200, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 70057180117, "string": "70057180117" } } ] }, "power_on_time": { "hours": 37787 }, "power_cycle_count": 70, "temperature": { "current": 36 }, "ata_smart_error_log": { "summary": { "revision": 1, "count": 0 } }, "ata_smart_self_test_log": { "standard": { "revision": 1, "table": [ { "type": { "value": 1, "string": "Short offline" }, "status": { "value": 0, "string": "Completed without error", "passed": true }, "lifetime_hours": 35990 }, { "type": { "value": 1, "string": "Short offline" }, "status": { "value": 0, "string": "Completed without error", "passed": true }, "lifetime_hours": 3 }, { "type": { "value": 223, "string": "Vendor (0xdf)" }, "status": { "value": 0, "string": "Completed without error", "passed": true }, "lifetime_hours": 3 }, { "type": { "value": 1, "string": "Short offline" }, "status": { "value": 0, "string": "Completed without error", "passed": true }, "lifetime_hours": 1 } ], "count": 4, "error_count_total": 0, "error_count_outdated": 0 } }, "ata_smart_selective_self_test_log": { "revision": 1, "table": [ { "lba_min": 0, "lba_max": 0, "status": { "value": 0, "string": "Not_testing" } }, { "lba_min": 0, "lba_max": 0, "status": { "value": 0, "string": "Not_testing" } }, { "lba_min": 0, 
"lba_max": 0, "status": { "value": 0, "string": "Not_testing" } }, { "lba_min": 0, "lba_max": 0, "status": { "value": 0, "string": "Not_testing" } }, { "lba_min": 0, "lba_max": 0, "status": { "value": 0, "string": "Not_testing" } } ], "flags": { "value": 0, "remainder_scan_enabled": false }, "power_up_scan_resume_minutes": 0 } }
```

Slot: **1**

```json
{ "json_format_version": [ 1, 0 ], "smartctl": { "version": [ 7, 1 ], "svn_revision": "5022", "platform_info": "x86_64-linux-5.4.0-42-generic", "build_info": "(local build)", "argv": [ "smartctl", "-a", "-j", "-d", "megaraid,1", "-i", "/dev/sda" ], "messages": [ { "string": "Warning: This result is based on an Attribute check.", "severity": "warning" } ], "exit_status": 4 }, "device": { "name": "/dev/sda", "info_name": "/dev/sda [megaraid_disk_01] [SAT]", "type": "sat+megaraid,1", "protocol": "ATA" }, "model_name": "WD4000FYYX", "serial_number": "XXXXXXXXXXXX", "wwn": { "naa": 5, "oui": 5358, "id": 11649125727 }, "ata_additional_product_id": "DELL(tm)", "firmware_version": "00.0D1K4", "user_capacity": { "blocks": 7814037168, "bytes": 4000787030016 }, "logical_block_size": 512, "physical_block_size": 512, "rotation_rate": 7200, "form_factor": { "ata_value": 2, "name": "3.5 inches" }, "in_smartctl_database": false, "ata_version": { "string": "ATA8-ACS T13/1699-D revision 6", "major_value": 510, "minor_value": 40 }, "sata_version": { "string": "SATA 3.0", "value": 62 }, "interface_speed": { "max": { "sata_value": 6, "string": "3.0 Gb/s", "units_per_second": 30, "bits_per_unit": 100000000 }, "current": { "sata_value": 2, "string": "3.0 Gb/s", "units_per_second": 30, "bits_per_unit": 100000000 } }, "local_time": { "time_t": 1598297922, "asctime": "Mon Aug 24 21:38:42 2020 CEST" }, "smart_status": { "passed": true }, "ata_smart_data": { "offline_data_collection": { "status": { "value": 130, "string": "was completed without error", "passed": true }, "completion_seconds": 90 }, "self_test": { "status": { "value": 0, "string": "completed without error", "passed": true }, "polling_minutes": { "short": 2, "extended": 503, "conveyance": 5 } }, "capabilities": { "values": [ 123, 3 ], "exec_offline_immediate_supported": true, "offline_is_aborted_upon_new_cmd": false, "offline_surface_scan_supported": true, "self_tests_supported": true, "conveyance_self_test_supported": 
true, "selective_self_test_supported": true, "attribute_autosave_enabled": true, "error_logging_supported": true, "gp_logging_supported": true } }, "ata_sct_capabilities": { "value": 28861, "error_recovery_control_supported": true, "feature_control_supported": true, "data_table_supported": true }, "ata_smart_attributes": { "revision": 16, "table": [ { "id": 1, "name": "Raw_Read_Error_Rate", "value": 200, "worst": 111, "thresh": 51, "when_failed": "", "flags": { "value": 47, "string": "POSR-K ", "prefailure": true, "updated_online": true, "performance": true, "error_rate": true, "event_count": false, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 3, "name": "Spin_Up_Time", "value": 230, "worst": 227, "thresh": 21, "when_failed": "", "flags": { "value": 39, "string": "POS--K ", "prefailure": true, "updated_online": true, "performance": true, "error_rate": false, "event_count": false, "auto_keep": true }, "raw": { "value": 7458, "string": "7458" } }, { "id": 4, "name": "Start_Stop_Count", "value": 100, "worst": 100, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 68, "string": "68" } }, { "id": 5, "name": "Reallocated_Sector_Ct", "value": 188, "worst": 188, "thresh": 140, "when_failed": "", "flags": { "value": 51, "string": "PO--CK ", "prefailure": true, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 387, "string": "387" } }, { "id": 7, "name": "Seek_Error_Rate", "value": 200, "worst": 200, "thresh": 0, "when_failed": "", "flags": { "value": 46, "string": "-OSR-K ", "prefailure": false, "updated_online": true, "performance": true, "error_rate": true, "event_count": false, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 9, "name": "Power_On_Hours", "value": 49, "worst": 49, 
"thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 37788, "string": "37788" } }, { "id": 10, "name": "Spin_Retry_Count", "value": 100, "worst": 253, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 11, "name": "Calibration_Retry_Count", "value": 100, "worst": 253, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 12, "name": "Power_Cycle_Count", "value": 100, "worst": 100, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 68, "string": "68" } }, { "id": 183, "name": "Runtime_Bad_Block", "value": 100, "worst": 100, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 192, "name": "Power-Off_Retract_Count", "value": 200, "worst": 200, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 56, "string": "56" } }, { "id": 193, "name": "Load_Cycle_Count", "value": 197, "worst": 197, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", 
"prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 9462, "string": "9462" } }, { "id": 194, "name": "Temperature_Celsius", "value": 116, "worst": 101, "thresh": 0, "when_failed": "", "flags": { "value": 34, "string": "-O---K ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": false, "auto_keep": true }, "raw": { "value": 3342372, "string": "36 (Min/Max 0/51)" } }, { "id": 196, "name": "Reallocated_Event_Count", "value": 191, "worst": 191, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 9, "string": "9" } }, { "id": 197, "name": "Current_Pending_Sector", "value": 200, "worst": 200, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 198, "name": "Offline_Uncorrectable", "value": 200, "worst": 200, "thresh": 0, "when_failed": "", "flags": { "value": 48, "string": "----CK ", "prefailure": false, "updated_online": false, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 199, "name": "UDMA_CRC_Error_Count", "value": 200, "worst": 200, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 0, "string": "0" } }, { "id": 200, "name": "Multi_Zone_Error_Rate", "value": 200, "worst": 199, "thresh": 0, "when_failed": "", "flags": { "value": 8, "string": "---R-- ", "prefailure": false, "updated_online": 
false, "performance": false, "error_rate": true, "event_count": false, "auto_keep": false }, "raw": { "value": 0, "string": "0" } }, { "id": 241, "name": "Total_LBAs_Written", "value": 197, "worst": 197, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 3920560799278, "string": "3920560799278" } }, { "id": 242, "name": "Total_LBAs_Read", "value": 200, "worst": 200, "thresh": 0, "when_failed": "", "flags": { "value": 50, "string": "-O--CK ", "prefailure": false, "updated_online": true, "performance": false, "error_rate": false, "event_count": true, "auto_keep": true }, "raw": { "value": 72684827907, "string": "72684827907" } } ] }, "power_on_time": { "hours": 37788 }, "power_cycle_count": 68, "temperature": { "current": 36 }, "ata_smart_error_log": { "summary": { "revision": 1, "count": 0 } }, "ata_smart_self_test_log": { "standard": { "revision": 1, "table": [ { "type": { "value": 1, "string": "Short offline" }, "status": { "value": 25, "string": "Aborted by host", "remaining_percent": 90 }, "lifetime_hours": 35990 }, { "type": { "value": 1, "string": "Short offline" }, "status": { "value": 0, "string": "Completed without error", "passed": true }, "lifetime_hours": 35990 }, { "type": { "value": 1, "string": "Short offline" }, "status": { "value": 0, "string": "Completed without error", "passed": true }, "lifetime_hours": 3 }, { "type": { "value": 223, "string": "Vendor (0xdf)" }, "status": { "value": 0, "string": "Completed without error", "passed": true }, "lifetime_hours": 3 }, { "type": { "value": 1, "string": "Short offline" }, "status": { "value": 0, "string": "Completed without error", "passed": true }, "lifetime_hours": 1 } ], "count": 5, "error_count_total": 0, "error_count_outdated": 0 } }, "ata_smart_selective_self_test_log": { "revision": 1, "table": [ { "lba_min": 0, "lba_max": 0, 
"status": { "value": 0, "string": "Not_testing" } }, { "lba_min": 0, "lba_max": 0, "status": { "value": 0, "string": "Not_testing" } }, { "lba_min": 0, "lba_max": 0, "status": { "value": 0, "string": "Not_testing" } }, { "lba_min": 0, "lba_max": 0, "status": { "value": 0, "string": "Not_testing" } }, { "lba_min": 0, "lba_max": 0, "status": { "value": 0, "string": "Not_testing" } } ], "flags": { "value": 0, "remainder_scan_enabled": false }, "power_up_scan_resume_minutes": 0 } }
```
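
Both `-a` runs above report `"exit_status": 4`, which matches the error code 4 from the original report. smartctl documents its exit status as a bitmask, so a collector may want to decode it instead of treating any non-zero value as fatal. The sketch below is illustrative only: the table paraphrases the smartctl man page, and the function name `decode_exit_status` is hypothetical, not Scrutiny's actual handling.

```python
# Bit values of the smartctl exit status, paraphrased from the man page.
SMARTCTL_EXIT_BITS = {
    1: "command line did not parse",
    2: "device open failed or device did not identify itself",
    4: "a SMART or other ATA command failed, or a checksum error occurred",
    8: "SMART status check returned DISK FAILING",
    16: "prefail attributes found at or below threshold",
    32: "attributes were at or below threshold at some time in the past",
    64: "device error log contains records of errors",
    128: "device self-test log contains records of errors",
}


def decode_exit_status(status):
    """Return the list of conditions encoded in a smartctl exit status."""
    return [text for mask, text in SMARTCTL_EXIT_BITS.items() if status & mask]


for reason in decode_exit_status(4):
    print(reason)
```

In the outputs above the JSON body is complete even though bit 2 (value 4) is set, which suggests that this bit alone need not stop the data from being processed.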

AnalogJ commented 4 years ago

Hey @logaritmisk

I'm starting work on this issue. Can you run the following command and paste the result here?

smartctl -j --scan

Thanks!

logaritmisk commented 4 years ago

Sure @AnalogJ

Without sudo:

{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      1
    ],
    "svn_revision": "5022",
    "platform_info": "x86_64-linux-5.4.0-45-generic",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "-j",
      "--scan"
    ],
    "exit_status": 0
  },
  "devices": [
    {
      "name": "/dev/sda",
      "info_name": "/dev/sda",
      "type": "scsi",
      "protocol": "SCSI"
    }
  ]
}

With sudo:

{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      1
    ],
    "svn_revision": "5022",
    "platform_info": "x86_64-linux-5.4.0-45-generic",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "-j",
      "--scan"
    ],
    "exit_status": 0
  },
  "devices": [
    {
      "name": "/dev/sda",
      "info_name": "/dev/sda",
      "type": "scsi",
      "protocol": "SCSI"
    },
    {
      "name": "/dev/bus/0",
      "info_name": "/dev/bus/0 [megaraid_disk_00]",
      "type": "megaraid,0",
      "protocol": "SCSI"
    },
    {
      "name": "/dev/bus/0",
      "info_name": "/dev/bus/0 [megaraid_disk_01]",
      "type": "megaraid,1",
      "protocol": "SCSI"
    },
    {
      "name": "/dev/bus/0",
      "info_name": "/dev/bus/0 [megaraid_disk_02]",
      "type": "megaraid,2",
      "protocol": "SCSI"
    },
    {
      "name": "/dev/bus/0",
      "info_name": "/dev/bus/0 [megaraid_disk_03]",
      "type": "megaraid,3",
      "protocol": "SCSI"
    },
    {
      "name": "/dev/bus/0",
      "info_name": "/dev/bus/0 [megaraid_disk_04]",
      "type": "megaraid,4",
      "protocol": "SCSI"
    },
    {
      "name": "/dev/bus/0",
      "info_name": "/dev/bus/0 [megaraid_disk_05]",
      "type": "megaraid,5",
      "protocol": "SCSI"
    },
    {
      "name": "/dev/bus/0",
      "info_name": "/dev/bus/0 [megaraid_disk_06]",
      "type": "megaraid,6",
      "protocol": "SCSI"
    },
    {
      "name": "/dev/bus/0",
      "info_name": "/dev/bus/0 [megaraid_disk_07]",
      "type": "megaraid,7",
      "protocol": "SCSI"
    }
  ]
}
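
To show how the JSON scan result could drive collection, here is a minimal sketch that turns each detected device into a `smartctl -a -j` invocation and extracts the slot number from types such as `megaraid,3` or `sat+megaraid,0`. The function names `megaraid_slot` and `collection_plan` are hypothetical, and the embedded JSON is a trimmed copy of the output above; this is not Scrutiny's actual detection code.

```python
import json

# Trimmed copy of the `sudo smartctl -j --scan` output above.
SCAN_JSON = """
{"devices": [
  {"name": "/dev/sda", "info_name": "/dev/sda", "type": "scsi", "protocol": "SCSI"},
  {"name": "/dev/bus/0", "info_name": "/dev/bus/0 [megaraid_disk_00]", "type": "megaraid,0", "protocol": "SCSI"},
  {"name": "/dev/bus/0", "info_name": "/dev/bus/0 [megaraid_disk_01]", "type": "megaraid,1", "protocol": "SCSI"}
]}
"""


def megaraid_slot(dev_type):
    """Return the slot number from 'megaraid,N' or 'sat+megaraid,N', else None."""
    last = dev_type.split("+")[-1]
    prefix, _, slot = last.partition(",")
    if prefix == "megaraid" and slot.isdigit():
        return int(slot)
    return None


def collection_plan(scan):
    """Map every scanned device to its slot (if any) and a smartctl command."""
    plan = []
    for dev in scan.get("devices", []):
        plan.append({
            "slot": megaraid_slot(dev["type"]),
            "argv": ["smartctl", "-a", "-j", "-d", dev["type"], dev["name"]],
        })
    return plan


plan = collection_plan(json.loads(SCAN_JSON))
for entry in plan:
    print(entry["slot"], " ".join(entry["argv"]))
```

Passing the scanned `type` straight back to `-d` keeps the device name and type paired, which matters here because all physical disks share the name /dev/bus/0.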

AnalogJ commented 4 years ago

Hey @logaritmisk

I think I finally got this working. It took a while because I had to change the way that device detection worked, and smartctl --scan returns junk data in some cases.

My work is still in a branch but I have a docker image available for it.

Can you try running the following command, and tell me if your raid disks are correctly detected?

docker run --rm -p 8080:8080 \
-v /dev/disk:/dev/disk \
-v /run/udev:/run/udev:ro \
--name scrutiny \
--privileged analogj/scrutiny:smartctl_scan

You'll need to exec into the container and start the collector manually too:

docker exec scrutiny scrutiny-collector-metrics run

logaritmisk commented 4 years ago

Hi @AnalogJ,

I have tested it on one of my servers, and it works! But it lists the virtual disk that the raid creates as well. In my case it's /dev/sda.

[Image: Screen Shot 2020-09-19 at 10 40 02]

Another thing that would be nice is to see in what slot each disk is. Right now all I can see is that they belong to /dev/bus/0.

AnalogJ commented 4 years ago

That's great! Regarding the virtual drive, I wasn't sure how to detect it using smartctl. Can you run the following command and paste the output here?

docker exec scrutiny smartctl -x -j /dev/sda

The slot detection is also a bit messy. Would something like this be ok?

[Image: Screen Shot 2020-09-19 at 10 10 16 AM]

logaritmisk commented 4 years ago

{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      0
    ],
    "svn_revision": "4883",
    "platform_info": "x86_64-linux-5.4.0-47-generic",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "-x",
      "-j",
      "/dev/sda"
    ],
    "exit_status": 4
  },
  "device": {
    "name": "/dev/sda",
    "info_name": "/dev/sda",
    "type": "scsi",
    "protocol": "SCSI"
  },
  "vendor": "DELL",
  "product": "PERC H710",
  "model_name": "DELL PERC H710",
  "revision": "3.13",
  "scsi_version": "SPC-3",
  "user_capacity": {
    "blocks": 54690578432,
    "bytes": 28001576157184
  },
  "logical_block_size": 512,
  "serial_number": "00deb6bb1XXXdXXXXXXX",
  "device_type": {
    "scsi_value": 0,
    "name": "disk"
  },
  "local_time": {
    "time_t": 1600535908,
    "asctime": "Sat Sep 19 17:18:28 2020 UTC"
  },
  "temperature": {
    "current": 0,
    "drive_trip": 0
  }
}

I think you can see that the drives in my raid are "mapped" to /dev/sda. When I check the output I sent you from one of the drives in the raid, there is this section:

"device": {
    "name": "/dev/sda",
    "info_name": "/dev/sda [megaraid_disk_00] [SAT]",
    "type": "sat+megaraid,0",
    "protocol": "ATA"
  }

It would be interesting to see how it would look if there were multiple virtual drives. Unfortunately I can't change my setup :/

Regarding the slot info, the example you sent would work for me. As long as I can see which disk is about to fail I'm happy :)

AnalogJ commented 4 years ago

Yeah, the only issue is that Scrutiny processes each detected device in parallel, so comparing the "device" section between drives becomes a bit more difficult. I can try using a library called jaypipes/ghw to filter out devices connected to a virtual storage controller, but that code will be Linux-specific. Probably not an issue since Windows isn't officially supported yet.
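
One alternative to comparing devices during parallel collection is a filter that runs once all results are in. The sketch below is only a heuristic under stated assumptions: it drops plain `scsi` devices whose `product` field names a RAID controller, and only when MegaRAID disks were also detected. The marker list and the function names are hypothetical, and this is not the ghw-based approach mentioned above.

```python
# Hypothetical marker list; other controllers may report different strings.
CONTROLLER_MARKERS = ("PERC", "MegaRAID")


def looks_like_virtual_disk(info):
    """Guess whether a smartctl info dict describes a controller's virtual disk."""
    if info.get("device", {}).get("type") != "scsi":
        return False
    product = info.get("product", "").lower()
    return any(marker.lower() in product for marker in CONTROLLER_MARKERS)


def filter_virtual_disks(infos):
    """Drop suspected virtual disks, but only if MegaRAID disks are present."""
    has_megaraid = any(
        "megaraid" in info.get("device", {}).get("type", "") for info in infos
    )
    if not has_megaraid:
        return list(infos)
    return [info for info in infos if not looks_like_virtual_disk(info)]


# Reduced versions of the outputs posted earlier in this thread.
virtual = {"device": {"name": "/dev/sda", "type": "scsi"}, "product": "PERC H710"}
physical = {"device": {"name": "/dev/sda", "type": "sat+megaraid,0"},
            "model_name": "WD4000FYYX"}

kept = filter_virtual_disks([virtual, physical])
print([info["device"]["type"] for info in kept])
```

Because the filter only looks at the finished results, it does not care in which order the parallel collectors complete.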

Let me play around and get back to you. Thanks again for being so patient. I don't have a Broadcom raid controller, so your help is invaluable 😄

The updated device titles should be available already in the analogj/scrutiny:smartctl_scan docker image.