baruch / diskscan

Scan disk for bad or near failure sectors, performs disk diagnostics
GNU General Public License v3.0

Failed to read SMART attributes from device #55

Closed jsveiga closed 7 years ago

jsveiga commented 8 years ago

Hello,

I'm using diskscan 0.19 on Kubuntu 16.04 (I tried both the Kubuntu 16.04 .deb and the latest GitHub release).

It seems to work OK, but it can't read the SMART info from the disk (Failed to read SMART attributes from device).

smartctl can read the SMART data correctly (see below tests, two different disks).

The computer is an old Dell Precision 390, and the onboard SATA controller shows as: Intel Corporation NM10/ICH7 Family SATA Controller [IDE mode] (rev 01)

If any testing is needed, please let me know. BR,

Joao

root@spyder:/home/jsveiga# smartctl -i /dev/sda
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-22-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE3 Serial ATA
Device Model:     WDC WD2502ABYS-18B7A0
Serial Number:    WD-WMAT16608175
LU WWN Device Id: 5 0014ee 1aeb0b518
Add. Product Id:  DELL(tm)
Firmware Version: 02.03B05
User Capacity:    250.000.000.000 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.5, 3.0 Gb/s
Local Time is:    Wed Jul 6 16:03:19 2016 BRT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@spyder:/home/jsveiga# diskscan /dev/sda
diskscan version 0.19

I: Validating path /dev/sda
E: Failed to read SMART attributes from device
I: Opened disk /dev/sda sector size 512 num bytes 249999999488
I: Scanning disk /dev/sda in 65536 byte steps
I: Scan started at: Wed Jul 6 16:03:35 2016

(I interrupted it here, but if left running, it will show the SMART error periodically).

root@spyder:/home/jsveiga# smartctl -i /dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-22-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba 2.5" HDD MK..61GSY[N]
Device Model:     TOSHIBA MK5061GSYN
Serial Number:    X3RPTOBTT
LU WWN Device Id: 5 000039 511c0a1a2
Firmware Version: MH000D
User Capacity:    500.107.862.016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Jul 6 16:10:42 2016 BRT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@spyder:/home/jsveiga# diskscan /dev/sdb
diskscan version 0.19

I: Validating path /dev/sdb
E: Failed to read SMART attributes from device
I: Opened disk /dev/sdb sector size 512 num bytes 500107861504
I: Scanning disk /dev/sdb in 65536 byte steps
I: Scan started at: Wed Jul 6 16:11:03 2016

baruch commented 8 years ago

A few things that I need to get on track:

jsveiga commented 8 years ago

Sure, here they are:

smartctl.zip

strace.zip

BR,

Joao

deamen commented 7 years ago

Hi baruch, have you made any progress on this one yet? I am having the same issue. Thanks

deamen commented 7 years ago

Hi baruch, it looks like the following lines are causing this issue:

if (ata_get_ata_smart_read_data_version(buf) != 0x0010)
        return -1;

If I comment them out in libscsicmd/src/ata.c, diskscan works fine.

baruch commented 7 years ago

Can you please compile the project in the libscsicmd directory and run the test/ata_smart_read_data utility? It will tell me what version it is seeing in there.

deamen commented 7 years ago

Hi baruch, The output says: Page version: 0001h

Can you explain a little about what this "Page version" is for, and why you need to check it against 0x0010?

I guess it is meant to check against the "Data structure revision" in "Device Configuration Identify data structure" according to the ATA spec.

deamen commented 7 years ago

The full output:

sbp: 0x6100a0
status: 0
masked status: 0
driver status: 0
msg status: 0
host status: 0
sense len: 0
Response Dump:

00 01 00 00 00 00 00 00 00
Page checksum read: 00
Page checksum calc: 8F
Page checksum matches: false
Page version: 0001h
Unknown page version, only known version is 10h
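For reference, the "Page checksum" lines can be reproduced with a short sketch. It assumes the usual ATA SMART convention that byte 511 of the 512-byte page holds the two's complement of the sum of bytes 0..510, so that all 512 bytes sum to zero modulo 256. The function names are hypothetical and are not taken from libscsicmd; how the test utility actually computes its value is not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

#define SMART_PAGE_LEN 512

/* Hypothetical helper: two's-complement checksum over bytes 0..510.
 * Under the assumed convention, a valid page stores this value in byte 511. */
static uint8_t smart_page_checksum(const uint8_t *page)
{
        uint8_t sum = 0;
        size_t i;

        for (i = 0; i < SMART_PAGE_LEN - 1; i++)
                sum = (uint8_t)(sum + page[i]);

        return (uint8_t)(0u - sum);
}

/* Returns 1 when the stored checksum byte matches the calculated one. */
static int smart_page_checksum_ok(const uint8_t *page)
{
        return smart_page_checksum(page) == page[SMART_PAGE_LEN - 1];
}
```

A mismatch like the one in the output above (read 00, calculated 8F) would mean the stored byte and the page contents disagree, which could also happen if the buffer was only partially filled.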

baruch commented 7 years ago

Other programs do not seem to validate this value, so I'll remove this validation and let it slide.
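A minimal sketch of what a more lenient check could look like, assuming the page version is the first 16-bit little-endian word of the SMART READ DATA page. Both function names are hypothetical; this is not the actual change made in libscsicmd, only an illustration of warning instead of failing.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not the libscsicmd API): read the revision number,
 * assumed to be the first 16-bit little-endian word of the page. */
static uint16_t smart_page_version(const uint8_t *page)
{
        return (uint16_t)(page[0] | ((uint16_t)page[1] << 8));
}

/* Lenient check: report an unexpected revision but still accept the page.
 * Always returns 0 so the caller goes on to parse the attributes. */
static int smart_page_check_version(const uint8_t *page)
{
        uint16_t version = smart_page_version(page);

        if (version != 0x0010)
                fprintf(stderr, "W: unexpected SMART page version %04Xh, continuing\n",
                        version);

        return 0;
}
```

With the page reported above (version 0001h), this sketch would print a warning and still let the attributes be read.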

dilworks commented 6 years ago

Same problem is happening to me on a fresh build on a 32-bit Debian Jessie (oldstable) setup.

The drive does support SMART and it's working OK:

$ sudo smartctl -a /dev/sdc
smartctl 6.4 2014-10-07 r4002 [i686-linux-4.9.0-0.bpo.5-686-pae] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.9
Device Model:     ST3160812AS
Serial Number:    5LSESH7P
Firmware Version: 3.ADJ
User Capacity:    160.041.885.696 bytes [160 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Sat Mar 24 16:17:25 2018 -04
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  54) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   093   006    Pre-fail  Always       -       153271031
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       13
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   070   060   030    Pre-fail  Always       -       11970563
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       41
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       25
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   063   057   045    Old_age   Always       -       37 (Min/Max 26/37)
194 Temperature_Celsius     0x0022   037   043   000    Old_age   Always       -       37 (0 26 0 0 0)
195 Hardware_ECC_Recovered  0x001a   061   048   000    Old_age   Always       -       200020825
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        38         -
# 2  Extended offline    Completed without error       00%        24         -
# 3  Extended offline    Completed without error       00%        17         -
# 4  Short offline       Completed without error       00%        14         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

...yet diskscan always complains:

$ sudo ./diskscan /dev/sdc
diskscan version 0.19

I: Validating path /dev/sdc
E: Failed to read SMART attributes from device
I: Opened disk /dev/sdc sector size 512 num bytes 160041885184
I: Scanning disk /dev/sdc in 65536 byte steps
I: Scan started at: Sat Mar 24 16:19:53 2018

Disk scan |=                                                                                                          | ETA: 0h37m33s
E: Failed to read SMART attributes from device
^Csk scan |=                                                                                                          | ETA: 0h37m12s
E: Failed to read SMART attributes from device
I: Disk scan interrupted
Disk scan |=                                                                                                          | ETA: 0h38m18s

Even worse, the kernel really dislikes whatever diskscan is doing in the meantime, because I get several of these entries in the kernel log:

[ 1729.352476] ata3: drained 512 bytes to clear DRQ
[ 1729.352490] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 1729.352499] ata3.00: BMDMA stat 0x25
[ 1729.352506] ata3.00: failed command: SMART
[ 1729.352517] ata3.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 dma 512 in
         res 58/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x2 (HSM violation)
[ 1729.352523] ata3.00: status: { DRDY DRQ }
[ 1729.352539] ata3: soft resetting link
[ 1729.635735] ata3.00: configured for UDMA/133
[ 1729.635793] ata3: EH complete

Kernel version: Linux 4.9.0-0.bpo.5-686-pae #1 SMP Debian 4.9.65-3+deb9u2~bpo8+1 (2017-01-05) i686 GNU/Linux (fetched from jessie-backports)

I've even tried building and running ata_smart_read_data, but the output doesn't even look like it's talking to the drive:

:/dev$ sudo ata_smart_read_data sdc
CDB: a1 0c 0e d0 01 00 4f c2 00 b0 00 00 
status: 0
masked status: 0
driver status: 0
msg status: 0
host status: 0
sense len: 0
Response Dump:

00  00 00 00 00 
Page checksum read: 00
Page checksum calc: 00
Page checksum matches: true
Page version: 0000h
Unknown page version, only known version is 10h
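To make the CDB line easier to read, here is a small decoding sketch. It assumes the ATA PASS-THROUGH (12) layout from the SCSI/ATA Translation standard as I understand it (protocol in bits 4:1 of byte 1, transfer flags in byte 2, then the ATA task-file registers). The struct and function names are hypothetical and not part of libscsicmd.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields of an ATA PASS-THROUGH (12) CDB. */
struct ata_pt12 {
        uint8_t protocol;     /* byte 1, bits 4:1 */
        uint8_t t_dir;        /* byte 2, bit 3: 1 = device to host */
        uint8_t byt_blok;     /* byte 2, bit 2: 1 = length counted in blocks */
        uint8_t t_length;     /* byte 2, bits 1:0: where the transfer length lives */
        uint8_t features;
        uint8_t sector_count;
        uint8_t lba_low;
        uint8_t lba_mid;
        uint8_t lba_high;
        uint8_t device;
        uint8_t command;
};

/* Returns 0 on success, -1 if the opcode is not ATA PASS-THROUGH (12). */
static int ata_pt12_decode(const uint8_t cdb[12], struct ata_pt12 *out)
{
        if (cdb[0] != 0xA1)
                return -1;

        out->protocol     = (cdb[1] >> 1) & 0x0F;
        out->t_dir        = (cdb[2] >> 3) & 0x01;
        out->byt_blok     = (cdb[2] >> 2) & 0x01;
        out->t_length     = cdb[2] & 0x03;
        out->features     = cdb[3];
        out->sector_count = cdb[4];
        out->lba_low      = cdb[5];
        out->lba_mid      = cdb[6];
        out->lba_high     = cdb[7];
        out->device       = cdb[8];
        out->command      = cdb[9];
        return 0;
}
```

Applied to the CDB above, this gives command b0h with feature d0h and LBA mid/high 4f/c2, matching the "cmd b0/d0:01:00:4f:c2" line in the kernel log, and protocol 6, which is DMA in that layout. SMART READ DATA is normally issued as a PIO Data-In command (protocol 4); whether that mismatch is what triggers the HSM violation here is only a guess and is not confirmed anywhere in this thread.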

I had to call the test program with a relative device name because of what looks to me like a bug in the command-line parsing in test/main.c:

int main(int argc, char **argv)
{
        if (argc != 2 || strstr(argv[1], "/sd") != NULL)
                return usage(argv[0]);

        test(argv[1]);
        return 0;
}

Shouldn't that != NULL be a == NULL instead?
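The effect of the current condition can be checked in isolation. The sketch below copies only the strstr() test into a hypothetical helper, path_rejected(), which is not part of the project; the /dev/sg2 path is just an example name.

```c
#include <string.h>

/* Hypothetical helper mirroring the strstr() part of the condition quoted
 * above: returns 1 when the argument would be sent to usage(). */
static int path_rejected(const char *path)
{
        return strstr(path, "/sd") != NULL;
}
```

As written, "/dev/sdc" is rejected, while "sdc" and "/dev/sg2" are accepted. That would fit either a typo or a deliberate guard meant to steer users away from /dev/sd* block devices and towards SCSI generic nodes; the thread does not say which, so treat the second reading as an assumption.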