Seagate / openSeaChest

Cross platform utilities useful for performing various operations on SATA, SAS, NVMe, and USB storage devices.

Firmware update results in "Unaligned Writes"... #152

Status: Open (opened by reb00tz 3 weeks ago)

reb00tz commented 3 weeks ago

Attempting to update the firmware on a Seagate Exos X18 16TB (ST16000NM000J-2TW103) fails.

Using the firmware file results in an "unaligned writes" error.

I tried several variations of the command, using both the RAID and non-RAID versions.

Attached are the -v3 logs and output, including system and drive information (*_system_drive_info.txt), along with the two RAID/non-RAID scripts used to generate them.

More platform information as follows:

run_fwupgrade_Non-RAID.tar.gz run_fwupgrade_RAID.tar.gz

firmware_unaligned_write.tar.gz

vonericsen commented 3 weeks ago

@reb00tz,

Thank you for reporting this. One issue I have occasionally seen in Linux is that, rather than passing back the real drive status describing what went wrong, the status gets mistranslated to "Unaligned write command", which has no meaning in relation to a firmware update. It should return "ATA Passthrough Information Available" status instead, which would tell me in more detail what went wrong. I have tried to find this bug in the kernel code, but I have not yet located where it happens.
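For reference, the "ATA Passthrough Information Available" status is ASC/ASCQ 00h/1Dh. When it is returned in descriptor-format sense data, the drive's own ATA registers travel in an "ATA Status Return" descriptor. Below is a minimal sketch of pulling those registers out. It assumes descriptor code 09h, ERROR at descriptor byte 3 and STATUS at descriptor byte 13, per my reading of SAT. Verify these offsets against the standard before relying on them. The function name is hypothetical and not part of openSeaChest.

```python
from typing import Optional

ATA_STATUS_RETURN = 0x09  # SAT sense data descriptor code (assumed layout below)


def ata_registers_from_sense(sense: bytes) -> Optional[dict]:
    """Return the ATA ERROR and STATUS registers carried in descriptor-format
    sense data, or None if the sense is not 00h/1Dh with that descriptor."""
    # Only current descriptor-format sense (response code 72h) is handled.
    if len(sense) < 8 or (sense[0] & 0x7F) != 0x72:
        return None
    if (sense[2], sense[3]) != (0x00, 0x1D):  # ASC, ASCQ
        return None
    end = min(len(sense), 8 + sense[7])  # byte 7: additional sense length
    offset = 8
    while offset + 2 <= end:
        code, length = sense[offset], sense[offset + 1]
        if code == ATA_STATUS_RETURN and offset + 14 <= end:
            # Assumed offsets: ERROR at +3, STATUS at +13 within the descriptor.
            return {"error": sense[offset + 3], "status": sense[offset + 13]}
        offset += 2 + length  # advance to the next descriptor
    return None
```

If the kernel returned 00h/1Dh, a tool could read the real ERROR register (for example the ABRT bit) instead of a translated guess.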

Most likely the drive is rejecting the command with an error of some kind, but the code that translates it to sense data sees a bit referred to as "alignment error" set. However, that bit only has meaning on SATA drives for write commands (Write DMA, Write DMA Ext, Write FPDMA, etc.), not for other commands such as the Download Microcode command being issued here.
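The failure mode can be illustrated with a toy translator. It checks a status bit without asking which command failed. This is a hypothetical model, not the libATA code. The bit position of the alignment flag, the table order and the catch-all are assumptions. Only the opcodes and the SCSI sense codes are standard values.

```python
from typing import Tuple

Sense = Tuple[int, int, int]  # (sense key, ASC, ASCQ)

ILLEGAL_REQUEST = 0x05
HARDWARE_ERROR = 0x04
ABORTED_COMMAND = 0x0B

ATA_ERR_ABRT = 0x04           # ERROR register: command aborted
ATA_STAT_ALIGNMENT = 0x04     # STATUS register alignment bit (assumed position)

WRITE_OPCODES = {0xCA, 0x35, 0x61}  # WRITE DMA, WRITE DMA EXT, WRITE FPDMA QUEUED
DOWNLOAD_MICROCODE = 0x92


def translate(opcode: int, status: int, error: int, only_for_writes: bool) -> Sense:
    """Toy ATA-to-sense translation for a command that completed with an error."""
    if status & ATA_STAT_ALIGNMENT:
        # With only_for_writes=False the bit is trusted for every command,
        # which is the failure mode described above.
        if not only_for_writes or opcode in WRITE_OPCODES:
            return (ILLEGAL_REQUEST, 0x21, 0x04)  # UNALIGNED WRITE COMMAND
    if error & ATA_ERR_ABRT:
        return (ABORTED_COMMAND, 0x00, 0x00)  # device aborted the command
    return (HARDWARE_ERROR, 0x44, 0x00)  # catch-all: INTERNAL TARGET FAILURE
```

Take an aborted DOWNLOAD MICROCODE with that status bit set. The unguarded path reports 21h/04h "Unaligned write command", while restricting the check to write opcodes falls through to a plain aborted command.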

One thing you can try is the --downloadMode full option. It may return more information about what is wrong, since it attempts to send the update to the drive in one large transfer. This mode is not usually preferred because modern firmware images can be larger than the OS allows in a single command (Windows only allows transfers of up to 64K per command, though Linux typically does not have this restriction).
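The difference between a segmented and a "full" download comes down to how the image is split against the per-command transfer limit. Below is a minimal sketch, assuming the image size is a multiple of 512 bytes, since DOWNLOAD MICROCODE counts and offsets are expressed in 512-byte blocks. The helper name is hypothetical.

```python
from typing import List, Tuple

SECTOR = 512  # DOWNLOAD MICROCODE sizes/offsets are expressed in 512-byte blocks


def plan_segments(image_size: int, max_transfer: int) -> List[Tuple[int, int]]:
    """Split a firmware image into (offset, length) segments no larger than
    max_transfer bytes, keeping every boundary on a 512-byte block."""
    if image_size <= 0 or image_size % SECTOR:
        raise ValueError("image size must be a positive multiple of 512 bytes")
    if max_transfer < SECTOR:
        raise ValueError("transfer limit must be at least 512 bytes")
    step = max_transfer - (max_transfer % SECTOR)  # round limit down to whole blocks
    return [(off, min(step, image_size - off)) for off in range(0, image_size, step)]
```

Consider a hypothetical 2 MiB image. plan_segments(2 * 1024 * 1024, 64 * 1024) yields 32 segments of 64 KiB each. A limit at least as large as the image yields a single segment, which is effectively what a "full" download sends.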

reb00tz commented 3 weeks ago

Hi!

Noted on that point. Given the issue I raised about using the CFS file, this may be a red herring (although a correct error message would have been better).

Nevertheless, I have attached logs from (i) the SeaChest utility that came packaged with the firmware files, and (ii) extended test runs that include --downloadMode full, with and without the various --forceATA options.

packaged_firmwarefile.zip

vonericsen commented 3 weeks ago

Thank you for that additional set of logs.

What is interesting is that in these new logs, the status comes back as:

Sense Key: 5h = Illegal Request
ASC & ASCQ: 26h - 99h = Vendor specific ascq code
FRU: 0h = No Additional Information
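These three fields come straight out of fixed-format SCSI sense data. Below is a minimal sketch of decoding them. The sample buffer in the usage note is hand-built to match the values above; it is not taken from the attached logs. The function names are hypothetical.

```python
def decode_fixed_sense(sense: bytes) -> dict:
    """Extract sense key, ASC, ASCQ and FRU from fixed-format sense data."""
    # Response code 70h (current) or 71h (deferred); bit 7 is the VALID bit.
    if len(sense) < 15 or (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return {
        "sense_key": sense[2] & 0x0F,  # low nibble of byte 2
        "asc": sense[12],
        "ascq": sense[13],
        "fru": sense[14],  # field replaceable unit code
    }


def format_sense(fields: dict) -> str:
    return (f"Sense Key: {fields['sense_key']:X}h "
            f"ASC/ASCQ: {fields['asc']:02X}h/{fields['ascq']:02X}h "
            f"FRU: {fields['fru']:X}h")
```

Build an 18-byte buffer with byte 0 = 70h, byte 2 = 05h, byte 12 = 26h and byte 13 = 99h. Decoding it gives sense key 5h, ASC/ASCQ 26h/99h and FRU 0h.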

I saw this when I checked the full download attempt. I then went back through the other files and noticed that it also shows up in packaged_firmwarefile_fail.txt, before the final status where the unaligned write status appears.

While we cannot know the exact meaning of this status, I would guess it is trying to say something like "the drive failed this download, so something in the data sent to it was not acceptable", since 5/26/00 means "Invalid field in parameter list" and controllers/drivers sometimes set their own unique codes. However, I have not been able to find anything in the kernel's AHCI driver that sets this unique code.
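The reasoning above leans on two points. First, 26h/99h shares its ASC with 26h/00h. Second, ASCQ values 80h-FFh are reserved for vendor-specific use. Below is a small lookup that makes that inference explicit. The table holds only the codes mentioned in this thread, and the "family" wording is my own framing rather than standard text.

```python
KNOWN_CODES = {
    (0x26, 0x00): "Invalid field in parameter list",
    (0x21, 0x04): "Unaligned write command",
    (0x00, 0x1D): "ATA pass through information available",
}


def describe(asc: int, ascq: int) -> str:
    """Name an ASC/ASCQ pair, falling back to its ASC family for vendor ASCQs."""
    if (asc, ascq) in KNOWN_CODES:
        return KNOWN_CODES[(asc, ascq)]
    if ascq >= 0x80:
        # Vendor-specific qualifier: borrow the meaning of the base (ASCQ 00h) code.
        family = KNOWN_CODES.get((asc, 0x00), f"ASC {asc:02X}h")
        return f"{family} (vendor specific ASCQ {ascq:02X}h)"
    return f"Unknown ASC/ASCQ {asc:02X}h/{ascq:02X}h"
```

For example, describe(0x26, 0x99) returns "Invalid field in parameter list (vendor specific ASCQ 99h)".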

What is also interesting is that this file used the default deferred download. This status appeared when transferring the final data segment of the file to the drive. The "unaligned write" then showed up when it attempted to send an activate firmware command.

This tells me that the drive likely received all the firmware code and validated the data it received. It then reported an error because something did not look right or was not compatible with the drive's current firmware.
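The sequence described above can be mirrored in a toy simulation, under several assumptions. The deferred download sends every segment with DOWNLOAD MICROCODE mode 0Eh (save for future use) and then one activate with mode 0Fh. The drive validates the image only when the last segment arrives. Offsets are in bytes here for simplicity, although the real command uses 512-byte blocks. The ToyDrive class and its behavior are hypothetical.

```python
from typing import List, Tuple

MODE_SAVE_DEFERRED = 0x0E  # download with offsets, save for future use
MODE_ACTIVATE = 0x0F       # activate previously downloaded microcode

Command = Tuple[int, int, int]  # (mode, byte offset, byte length)


def deferred_download_commands(image_size: int, segment: int) -> List[Command]:
    """Deferred update: every segment with mode 0Eh, then one activate."""
    cmds = [(MODE_SAVE_DEFERRED, off, min(segment, image_size - off))
            for off in range(0, image_size, segment)]
    cmds.append((MODE_ACTIVATE, 0, 0))
    return cmds


class ToyDrive:
    """Hypothetical drive that validates the image only once it is complete."""

    def __init__(self, image_size: int, image_is_acceptable: bool) -> None:
        self.image_size = image_size
        self.image_is_acceptable = image_is_acceptable
        self.staged = False  # True once a validated image is waiting for activation

    def download_microcode(self, mode: int, offset: int, length: int) -> bool:
        if mode == MODE_SAVE_DEFERRED:
            if offset + length < self.image_size:
                return True  # intermediate segment: just buffer it
            self.staged = self.image_is_acceptable  # final segment: validate
            return self.staged
        if mode == MODE_ACTIVATE:
            return self.staged  # nothing valid staged, so the command is aborted
        return False  # other modes are not modelled


def run(drive: ToyDrive, cmds: List[Command]) -> List[bool]:
    # Send every command, including the activate after a failure, and record pass/fail.
    return [drive.download_microcode(*cmd) for cmd in cmds]
```

With an image the toy drive rejects, the results are [True, ..., False, False]. The final segment fails and so does the activate, which matches the pattern of the two error statuses in the log.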

I found one place where an ASC of 26h is set, but there the ASCQ is set to zero. I have not yet figured out where the ASCQ gets set to 99h.

Reviewing some of the libATA code, I suspect the "unaligned write" is coming from here. Based on my knowledge of SAT (SCSI to ATA Translation) and the code comments, this does not look like the correct thing to do for an ATA passthrough command. However, this if condition may be reused elsewhere, so I'll try to figure out whether it makes sense in that context.

TL;DR: it seems the drive received the entire firmware, didn't like something about it, and rejected it with an abort status. I think I may have gotten a bit closer to the cause of the "unaligned write command" status. I still need to figure out how that code is used before submitting a patch, to make sure the fix is correct.