Seagate / openSeaChest

Cross platform utilities useful for performing various operations on SATA, SAS, NVMe, and USB storage devices.

tracking "unknown command" #119

Closed theedge456 closed 1 year ago

theedge456 commented 1 year ago

Hello, I don't know if this is the correct place to post this issue. If not, may someone direct me to the correct one?

I have an ST1000DM010 connected to a Gigabyte a320ma-m.2 motherboard. The OS is Devuan Chimaera (based on Debian Bullseye), running kernel 6.1.37 from kernel.org. I'm tracking down a problem that shows up in the logs. The disk is rarely used; it only serves for multicore compilation.

kernel: [  750.902215] sd 5:0:0:0: [sda] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=5s
kernel: [  750.905013] sd 5:0:0:0: [sda] tag#2 Sense Key : Illegal Request [current]
kernel: [  750.907809] sd 5:0:0:0: [sda] tag#2 Add. Sense: Unaligned write command
kernel: [  750.910572] sd 5:0:0:0: [sda] tag#2 CDB: Read(10) 28 00 12 c2 4c 80 00 00 38 00
kernel: [  750.916106] sd 5:0:0:0: [sda] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=5s
kernel: [  750.918889] sd 5:0:0:0: [sda] tag#22 Sense Key : Illegal Request [current]
kernel: [  750.921675] sd 5:0:0:0: [sda] tag#22 Add. Sense: Unaligned write command
kernel: [  750.924462] sd 5:0:0:0: [sda] tag#22 CDB: Read(10) 28 00 17 14 8d 00 00 00 88 00
kernel: [  750.930065] ata6: EH complete
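
For readers following along, the two failing CDBs in the log can be decoded by hand or with a few lines of Python. This is only a sketch: the function name `decode_read10` is made up for illustration, and the alignment check assumes 512-byte logical sectors on 4096-byte physical sectors (8 logical blocks per physical sector), which should be verified against what the drive actually reports.

```python
def decode_read10(cdb_hex, blocks_per_physical=8):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    blocks_per_physical=8 is an assumption (512-byte logical sectors,
    4096-byte physical sectors); adjust it for the actual drive.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    aligned = lba % blocks_per_physical == 0 and blocks % blocks_per_physical == 0
    return {"lba": lba, "blocks": blocks, "aligned": aligned}


for cdb in ("28 00 12 c2 4c 80 00 00 38 00",
            "28 00 17 14 8d 00 00 00 88 00"):
    print(decode_read10(cdb))
```

Under that sector-size assumption, both requests start on an LBA that is a multiple of 8 and transfer a multiple of 8 blocks, so the reads themselves look aligned.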

smartctl shows:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   063   006    Pre-fail  Always       -       811937
  3 Spin_Up_Time            0x0003   099   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1397
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   077   060   045    Pre-fail  Always       -       56586838
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       599
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       1399
183 Runtime_Bad_Block       0x0032   086   086   000    Old_age   Always       -       14
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   057   040    Old_age   Always       -       34 (Min/Max 34/34)
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1398
194 Temperature_Celsius     0x0022   034   007   000    Old_age   Always       -       34 (0 7 0 0 0)
195 Hardware_ECC_Recovered  0x001a   100   001   000    Old_age   Always       -       811937
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       589h+00m+00.656s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       21343524455
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       10832830779
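
A quick way to sanity-check a table like this is to compare each normalized VALUE against its THRESH. The snippet below is a rough sketch, not part of smartctl or openSeaChest: `failing_attributes` is a hypothetical helper, and it assumes the ten-column layout shown above. The two sample rows are copied from the table.

```python
def failing_attributes(table):
    """Return names of attributes whose normalized VALUE is at or below THRESH.

    Assumes the ten-column smartctl attribute layout shown in this post.
    A threshold of 0 is treated as "never fails".
    """
    failing = []
    for line in table.splitlines():
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip the header row and anything that is not an attribute
        name, value, thresh = fields[1], int(fields[3]), int(fields[5])
        if thresh > 0 and value <= thresh:
            failing.append(name)
    return failing


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  7 Seek_Error_Rate         0x000f   077   060   045    Pre-fail  Always       -       56586838
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""
print(failing_attributes(SAMPLE))
```

For the rows quoted here the list comes back empty, which matches the "no errors" impression from the table: every VALUE sits above its THRESH.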

I compiled the latest version of openSeaChest (2.2.1-6_0_1 X86_64).

I started a long test with this command: sudo ./openSeaChest_GenericTests -d /dev/sg1 --longGeneric

The result showed no errors. I only saw messages about "unknown command". Is there a way to display these messages?

I saw on the Seagate website that there is updated firmware for the ST1000DM series (ST1000DM004 or ST1000DM007) but not for the ST1000DM010, which still runs the CC43 firmware. May I try to update the firmware? Any other hints?

- Fabien

theedge456 commented 1 year ago

Forget this message. Everything went back to normal when I connected the disk to other data and power supply ports on the motherboard.

vonericsen commented 1 year ago

Hi @theedge456,

Just to confirm, the drive is working properly with new cabling and this message no longer shows in the system logs?

theedge456 commented 1 year ago

Yes. Lucky me

Fabien

vonericsen commented 1 year ago

Thanks for confirming that! I have seen cabling issues before, but not something that shows up like this. Often attribute 199 (UDMA_CRC_Error_Count) will start to increase when there is a cabling problem, and there are other symptoms, but generally not "unaligned write command", so that is really weird. Cabling issues always have odd symptoms, but this is not one I've seen before.

If it happens again, please update this issue or create a new one and we can see if we can dig a little deeper to find out more information behind the cause, but sometimes it is as simple as replacing a flaky cable. I'm going to mark this closed for now, but please reopen it if you need to.

theedge456 commented 1 year ago

I generated this file at that time. Tell me if it helps.

HD_statistics.zip

vonericsen commented 1 year ago

@theedge456, I took a look at the file and didn't see anything that would be an indication of a cabling issue like I would expect.

If you run into this again, capturing the device statistics and SMART attributes will likely be most helpful. openSeaChest_SMART -d <handle> --smartAttributes analyzed --deviceStatistics > debug.txt

Another thing that might help is dumping the SATA phy event counters. This is not part of openSeaChest yet, but it is present in smartctl. It is possible whatever is happening was logged there as well.
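
As a rough illustration of collecting those counters, the sketch below calls `smartctl -l sataphy` and keeps the non-zero entries. The helper names are hypothetical, and the parser assumes each counter row has the form "ID, size, value, description" (for example `0x0001  2  0  Command failed due to ICRC error`); that layout is an assumption and may differ between smartctl versions.

```python
import re
import subprocess

# Assumed row layout: hex ID, size in bytes, counter value, free-text description.
ROW = re.compile(r"^(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\s+(.+)$")


def parse_phy_counters(text):
    """Map each counter description to its integer value."""
    counters = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            counters[match.group(4).strip()] = int(match.group(3))
    return counters


def read_phy_counters(device):
    """Run smartctl on `device` (e.g. /dev/sda; usually needs root) and parse it."""
    result = subprocess.run(
        ["smartctl", "-l", "sataphy", device],
        capture_output=True, text=True, check=False,
    )
    return parse_phy_counters(result.stdout)


def nonzero(counters):
    """Keep only counters that have recorded at least one event."""
    return {name: value for name, value in counters.items() if value > 0}
```

Saving the output of `nonzero(read_phy_counters("/dev/sda"))` before and after reseating a cable would show whether any link-level counters were moving while the errors occurred.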

theedge456 commented 1 year ago

@vonericsen, This is the result of sg_sat_phy_event --ck_cond --verbose 1>stdout.log 2>stderr.log

Tell me if it helps.

ata_phy_event.zip