Seagate / openSeaChest

Cross platform utilities useful for performing various operations on SATA, SAS, NVMe, and USB storage devices.

Exos X16 doesn't offer setSectorSize #47

Closed danderson closed 1 year ago

danderson commented 3 years ago

Thanks for the OSS tooling to manipulate the advanced features of Seagate drives! I really appreciate it.

I have a bunch of new Exos X16 drives, model ST16000NM001G-2KK103 with firmware revision SN03. According to the datasheet for the drive, it's a 512E drive that should support FastFormat to switch to 4k blocks.

However, with openSeaChest compiled at HEAD on Linux, neither "fast format" nor "set sector size" is listed in the supported features:

$ ./openSeaChest_Configure -d /dev/sda -i
==========================================================================================
 openSeaChest_Configure - openSeaChest drive utilities - NVMe Enabled
 Copyright (c) 2014-2021 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 openSeaChest_Configure Version: 2.0.0-2_1_0 X86_64
 Build Date: Jan 27 2021
 Today: Wed Jan 27 22:15:16 2021    User: root
==========================================================================================

/dev/sda - ST16000NM001G-2KK103 - ZL27EN69 - ATA
    Model Number: ST16000NM001G-2KK103
    Serial Number: ZL27EN69
    Firmware Revision: SN03
    World Wide Name: 5000C500C7556E6E
    Drive Capacity (TB/TiB): 16.00/14.55
    Temperature Data:
        Current Temperature (C): 31
        Highest Temperature (C): 37
        Lowest Temperature (C): 28
    Power On Time:  12 days 10 hours 
    Power On Hours: 298.00
    MaxLBA: 31251759103
    Native MaxLBA: Not Reported
    Logical Sector Size (B): 512
    Physical Sector Size (B): 4096
    Sector Alignment: 0
    Rotation Rate (RPM): 7200
    Form Factor: 3.5"
    Last DST information:
        DST has never been run
    Long Drive Self Test Time:  23 hours 49 minutes 
    Interface speed:
        Max Speed (Gb/s): 6.0
        Negotiated Speed (Gb/s): 6.0
    Annualized Workload Rate (TB/yr): 10.83
    Total Bytes Read (GB): 190.47
    Total Bytes Written (GB): 178.10
    Encryption Support: Not Supported
    Cache Size (MiB): Not Reported
    Read Look-Ahead: Enabled
    Write Cache: Enabled
    Low Current Spinup: Disabled
    SMART Status: Unknown or Not Supported
    ATA Security Information: Supported
    Firmware Download Support: Full, Segmented
    Specifications Supported:
        ACS-4
        ACS-3
        ACS-2
        ATA8-ACS
        ATA/ATAPI-7
        ATA/ATAPI-6
        ATA/ATAPI-5
        SATA 3.2
        SATA 3.1
        SATA 3.0
        SATA 2.6
        SATA 2.5
        SATA II: Extensions
        SATA 1.0a
        ATA8-AST
    Features Supported:
        Sanitize
        SATA NCQ
        SATA Software Settings Preservation [Enabled]
        SATA Device Initiated Power Management
        Power Management
        Security
        SMART [Enabled]
        48bit Address
        PUIS
        GPL
        Streaming
        SMART Self-Test
        SMART Error Logging
        Write-Read-Verify
        DSN
        AMAC
        EPC [Enabled]
        Sense Data Reporting [Enabled]
        SCT Write Same
        SCT Error Recovery Control
        SCT Feature Control
        SCT Data Tables
        Host Logging
        Seagate In Drive Diagnostics (IDD)
    Adapter Information:
        Vendor ID: 1AF4h
        Product ID: 0008h
        Revision: Not available.

As expected, if I try to format anyway, I just get errors that the drive doesn't support those commands. As a result, AFAICT I'm unable to switch to 4k sectors.

Is this something I'm doing wrong, or is there something missing in openSeaChest to support these drives?

danderson commented 3 years ago

In the spirit of curiosity, I also tried the non-OSS SeaChest binary from https://github.com/Seagate/ToolBin, and it reports the same information, and also refuses to change the sector size.

vonericsen commented 3 years ago

Hi @danderson,

This is very strange. The code to do the sector size conversion is in the tool, so there shouldn't be anything missing in software. The bit that indicates support for this feature is in the Identify Device Data log, and I'm wondering if there was an error reading it, which can happen in odd scenarios with certain controllers.

I noticed that the adapter information says that this drive is attached to some kind of Red Hat virtio device, but the product ID doesn't match anything I can find online, so it's possible that this is causing some kind of incompatibility. I have some suspicions, since the SMART status is also reported as "unknown or not supported", which indicates that the return task file registers don't come back properly or at all, but I would not expect this to affect reading this log page.

The verbose output from the tool should help identify if that is the case or not. It will be very verbose outputting command results and data, which I will need to review to see if there is possibly some other issue going on. Can you attach the verbose output of the -i output?

$ ./openSeaChest_Configure -d /dev/sda -i -v 4 | tee verboseIdentify.txt

danderson commented 3 years ago

Thanks for the detailed response!

The drives are attached to a https://zfs.rent VM using qemu's LUN passthrough feature, which theoretically is supposed to let the VM send commands directly to the drives. I don't have full details on the underlying hardware, but what I do know is that the chain of custody is roughly openSeaChest -> virtio_scsi -> JMicron SATA card -> drive. If that looks to be the problem, I can get more details and diagnostics from the provider.

Here is the verbose identity output: verboseIdentify.txt

vonericsen commented 3 years ago

Thanks for the information @danderson!

When I reviewed the log, it seems that this passthrough feature is supporting the A1h SAT passthrough command, which is why some drive information is retrieved in the -i output that looks like it matches the drive you are expecting to see.

Whenever the 85h SAT passthrough command is issued, an error is returned. The difference between the two is that the A1h opcode only allows 28-bit commands (so you can get basic identify data and some SMART data), whereas the 85h opcode allows 48-bit commands. Those are required to read newer drive information from the GPL (General Purpose Logging) logs, which is where additional information, such as support for changing the sector size, is reported. The command that changes the sector size is a 48-bit command as well, so this opcode is required to make the feature work. These 85h commands seem to be blocked or aborted no matter which command is issued: even the non-data command to read this drive's accessible max (or native max) address is blocked, so as far as I can tell the problem isn't limited to reading these other logs.
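To make the A1h versus 85h distinction concrete, here is a minimal Python sketch that builds both CDBs for ATA IDENTIFY DEVICE (ECh). The function names are hypothetical, and the byte layout is an assumption based on the ATA PASS-THROUGH (12) and (16) definitions in the SAT specification; it is not taken from the openSeaChest sources.

```python
# Sketch only: CDB layouts assumed from the SAT ATA PASS-THROUGH (12)/(16)
# definitions. Function names are hypothetical, not part of openSeaChest.

PIO_DATA_IN = 0x4        # SAT protocol value for PIO Data-In
FLAGS_READ_BLOCKS = 0x0E  # T_DIR=1, BYT_BLOK=1, T_LENGTH=2 (length in count field)


def ata_pass_through_12(command, features=0, count=0, lba=0, device=0xA0,
                        protocol=PIO_DATA_IN, flags=FLAGS_READ_BLOCKS):
    """Build the 12-byte A1h CDB. Only 28-bit register values fit."""
    if features > 0xFF or count > 0xFF or lba >= (1 << 28):
        raise ValueError("A1h CDB only carries 28-bit ATA register values")
    return bytes([
        0xA1,
        (protocol << 1) & 0x1E,
        flags,
        features,
        count,
        lba & 0xFF, (lba >> 8) & 0xFF, (lba >> 16) & 0xFF,
        device | ((lba >> 24) & 0x0F),  # LBA bits 27:24 share the device byte
        command,
        0x00,  # reserved
        0x00,  # control
    ])


def ata_pass_through_16(command, features=0, count=0, lba=0, device=0xA0,
                        protocol=PIO_DATA_IN, flags=FLAGS_READ_BLOCKS,
                        extend=False):
    """Build the 16-byte 85h CDB. Set extend=True for 48-bit (EXT) commands."""
    if features > 0xFFFF or count > 0xFFFF or lba >= (1 << 48):
        raise ValueError("register value does not fit a 48-bit ATA command")
    return bytes([
        0x85,
        ((protocol << 1) & 0x1E) | (1 if extend else 0),
        flags,
        (features >> 8) & 0xFF, features & 0xFF,
        (count >> 8) & 0xFF, count & 0xFF,
        (lba >> 24) & 0xFF, lba & 0xFF,   # high/low byte pairs are interleaved
        (lba >> 32) & 0xFF, (lba >> 8) & 0xFF,
        (lba >> 40) & 0xFF, (lba >> 16) & 0xFF,
        device,
        command,
        0x00,  # control
    ])


IDENTIFY_DEVICE = 0xEC
print(ata_pass_through_12(IDENTIFY_DEVICE, count=1).hex(" ").upper())
print(ata_pass_through_16(IDENTIFY_DEVICE, count=1).hex(" ").upper())
```

The 16-byte form has room for the extra high-order feature, count, and LBA bytes plus the extend bit, which is why GPL log reads and the sector size change can only travel in an 85h CDB.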

It is possible that the virtio_scsi implementation is not really set up to support SAT passthrough beyond the basics available through the A1h command, but does translate other SCSI requests instead, since many of those were translated without a problem (only one or two were aborted with the same sense error code). The problem is that libata does not currently implement a translation for switching the sector size via SCSI commands, and the SAT-5 specification that defines these translations is still very new; I don't think it has been finalized either. SAT-5 currently lists this translation as "may be supported", which essentially means "optional", so it may not even be implemented by libata or other translators (like USB bridges or SAS HBAs).

I do not think there is a way for openSeaChest to get these commands through, but if you find any other information, I would be happy to dive deeper and try some additional changes. Right now the only thing I could do is add a "rule" or "hack" that says this HBA (the virtio_scsi that is reported) only allows 28-bit commands, but that doesn't do much other than help the tool understand that it is running in a limited mode. If I add this, I can look into a way to report this and other known limitations (if any) in the -i output, as we don't have a method to report known limitations to the user at this time.

danderson commented 3 years ago

Thanks for the diagnostic! It does indeed look like qemu is preventing 85h commands from reaching the drive. My server host set up a testbench, and captured the following dumps from within a VM, and on the bare host: vm_v4.txt host_v4.txt

Diffing the two, you'll see that on the host, 48-bit commands work fine, and the drive reports additional information and capabilities. And indeed, running on the bare host I was able to switch to a 4k sector size with no issues.
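For anyone repeating this comparison, a small Python sketch along these lines can list the features that appear only in the host capture. It assumes the captures contain the same indented "Features Supported:" block that the plain -i output shows earlier in this thread. The function names are hypothetical, and the verbose dumps may need trimming before this parser fits them.

```python
import re

# Sketch only: assumes the "Features Supported:" layout of the -i output,
# where each feature sits on its own line, indented deeper than the header.


def supported_features(info_text):
    """Return feature names from one -i capture, without [Enabled] suffixes."""
    features = []
    in_section = False
    section_indent = 0
    for line in info_text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if stripped == "Features Supported:":
            in_section = True
            section_indent = indent
            continue
        if in_section:
            # A blank line or a line at header depth ends the feature list.
            if not stripped or indent <= section_indent:
                break
            features.append(re.sub(r"\s*\[[^\]]*\]$", "", stripped))
    return features


def host_only_features(vm_text, host_text):
    """Features reported on the bare host but missing inside the VM."""
    seen_in_vm = set(supported_features(vm_text))
    return [f for f in supported_features(host_text) if f not in seen_in_vm]


# Usage (file names are the attachments mentioned above):
#   with open("vm_v4.txt") as vm, open("host_v4.txt") as host:
#       print(host_only_features(vm.read(), host.read()))
```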

I'm going to try and dig into qemu and see if there's an obvious place where LUN passthrough could be enhanced, but it sounds like the best openSeaChest could do would be to detect this failure case and print a warning about it.

Thanks!

vonericsen commented 3 years ago

@danderson, thanks for testing some more and letting me know! Glad you were able to do it from the host!

I will look into what I can add to inform about these kinds of limitations.

Would you mind testing one more thing for me to make sure we understand the limitations of the virtio SCSI HBA properly? To confirm that the filter is on the 85h opcode rather than on the command encapsulated in it, I want to see whether an ATA identify issued through the 85h command completes the same way or not. For this, you'll need to use sg_raw from the sg3_utils package. Unless your user is in the disk group, you'll need to run these with sudo or as root, as openSeaChest also requires.

First, make sure the A1h goes through:

$ sg_raw -r 512 /dev/YourHandleHere A1 08 0E 00 01 00 00 00 A0 EC 00 00 2>&1 | tee sgRawA1.txt

Now try again with the 85h:

$ sg_raw -r 512 /dev/YourHandleHere 85 08 0E 00 00 00 01 00 00 00 00 00 00 A0 EC 00 2>&1 | tee sgRaw85.txt

Those both send the ATA identify command, one just uses a larger CDB, so if they both work, then that means the LUN passthrough is filtering all but certain encapsulated commands like identify and SMART. If the 85h fails here too, then it's filtering on the opcode, which is useful to know when setting up some of the known limitations in the tool.
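The reasoning behind the two probes can be written down as a small decision function. The Python sketch below is illustrative only: the helper names are hypothetical, and it assumes that sg_raw exits with status 0 when the command completes without error.

```python
import subprocess

# The two IDENTIFY DEVICE CDBs from the commands above.
A1_IDENTIFY = "A1 08 0E 00 01 00 00 00 A0 EC 00 00"
X85_IDENTIFY = "85 08 0E 00 00 00 01 00 00 00 00 00 00 A0 EC 00"


def sg_raw_ok(device, cdb_hex):
    """Run sg_raw with a 512-byte read; assumes exit status 0 means success."""
    result = subprocess.run(
        ["sg_raw", "-r", "512", device, *cdb_hex.split()],
        capture_output=True, text=True,
    )
    return result.returncode == 0


def classify_filter(a1_ok, x85_ok):
    """Interpret the two probe results as described in the paragraph above."""
    if not a1_ok:
        # Without a working baseline, the 85h result tells us nothing.
        return "inconclusive"
    if x85_ok:
        # Same ATA command passes in both CDB sizes, so any blocking
        # must depend on the encapsulated command.
        return "content-filtered"
    # Identical ATA command, only the opcode differs.
    return "opcode-filtered"


# Usage (needs root or disk group membership, and a real device handle):
#   dev = "/dev/YourHandleHere"
#   print(classify_filter(sg_raw_ok(dev, A1_IDENTIFY), sg_raw_ok(dev, X85_IDENTIFY)))
```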

Also, I don't have a lot of experience setting up qemu, let alone this LUN passthrough functionality. If you know of a guide or instructions to configure this, I would be happy to review them so I can do some additional testing without bugging you any further 😃 If you don't know of one, I will poke around until I figure out how to do it. Thanks!

danderson commented 3 years ago

I ran both identify commands you provided. The 85h version fails, the A1h version works. So, it's looking like something in the I/O chain is only passing specific known opcodes, and just doesn't know about 85h. Outputs: sgRawA1.txt sgRaw85.txt

Weirdly, qemu's LUN passthrough feature is very poorly documented: I can only find references to it in Red Hat presentations about the development of virtio_scsi, and in some forum posts from users trying to figure out how to enable it. So, I don't have a good recipe for you to set up that environment. The closest I could find to documentation is https://wiki.qemu.org/images/c/c2/Virtio-scsi.pdf , which explains how to enable LUN passthrough in libvirtd configurations.

I'm happy to run commands for you, or even give you access to this VM once it's back in the datacenter next week. I'll continue exploring qemu to see if I can find the code that handles LUN passthrough. I'm hoping there's a really simple if (opcode != 0xA1) that I can fix :).

I know almost nothing about ATA, so maybe you could help me out as well: is there any risk in allowing all 85h commands to go to a LUN? Or would qemu need to do additional filtering to only allow "safe" commands to leave the VM? My default assumption is that as long as you're only talking to the LUN that you're allowed to, the contents of the requests shouldn't matter. Does that sound right?

vonericsen commented 3 years ago

Hi @danderson, Thanks for testing those additional commands...it seems the 85h opcode is being filtered, not the contents of the command it is trying to issue.

Based on that document you linked, it sounds like it should be possible to issue whatever commands the guest wants to the device, but that is apparently not the case. The design probably only really focused on some basic capabilities for drive information and reading and writing to optimize compatibility without providing full functionality.

> is there any risk in allowing all 85h commands to go to a LUN? Or would qemu need to do additional filtering to only allow "safe" commands to leave the VM?

I had to think about this over the weekend... I think the answer depends on what is considered a risk. It is possible that certain things are filtered to keep the way the host and guest understand and access the drive compatible with each other. It could be because there is concern that the guest could issue a command like sanitize or format, which could erase the whole device, or change a configuration setting on the drive that causes some compatibility issue. This is all speculation at this point, but those are some things that could be considered a risk. A fast format or sector size change would fall into the latter category: changing something about the drive that causes incompatibility between host and guest (if that is a real problem or concern).

Because this is a LUN passthrough instead of a device passthrough, the concern would be larger in the SCSI world than in SATA. In SATA, a physical device only has one logical unit (LUN), which keeps it simple. In SCSI/SAS, a device can have multiple logical units on it and can be accessed through multiple ports. Many SAS drives have 2 ports, although most have 1 LUN. Each LUN on each port gets a device handle in the OS (at least in every OS I've played with so far), so a dual-port, single-LUN drive shows up with 2 handles. It stands to reason that additional ports or LUNs would get their own handles as well. SATA drives are single-port and single-LUN because the SATA specifications don't allow more than this, so they are simple: only a single instance shows up in the system. Multi-port SAS will only show each port if they are all connected; standard off-the-shelf cabling will only pick up a single port. I've only seen dual ports exposed with special backplanes that connect to each port, and not everyone uses this feature.

This is not the case with some implementations of multi-actuator products. Depending on the configuration, a single physical drive may show multiple logical units, one for each actuator. The risk in LUN passthrough is that on a device like this, reads and writes will only affect one LUN as expected, but other commands that can change caching, error recovery, or do something like erase the drive or change the sector size may affect ALL logical units. So if the first LUN is passed to guest 1, and the second LUN is passed to guest 2, and guest 2 decides to reformat the drive, it is possible that all LUNs will get changed and erased depending on what the device's firmware supports. This can destroy guest 1 if these changes were made, which would not be good in this use-case. There are new fields to help describe when this will happen in the T10 specifications, but as far as I know, current implementations affect the whole physical device. So the risk can be much greater when this kind of change is made in configurations like this.
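The concern can be illustrated with a toy Python model. This is purely hypothetical: the class and method names are invented for illustration, and real firmware behavior varies by device. It models a drive where reads and writes are scoped to one LUN while a sector size change is applied device-wide.

```python
# Toy model only: names are invented, and it assumes (as described above)
# that a format-type command issued through one LUN affects every LUN.


class MultiLunDrive:
    def __init__(self, lun_count, sector_size=512):
        self.sector_size = sector_size  # one setting for the physical device
        self.blocks = {lun: {} for lun in range(lun_count)}

    def write(self, lun, lba, payload):
        self.blocks[lun][lba] = payload  # scoped to the addressed LUN

    def read(self, lun, lba):
        return self.blocks[lun].get(lba)

    def set_sector_size(self, lun, new_size):
        # Issued through a single LUN, but modeled as reformatting
        # the whole device, so every LUN loses its data.
        self.sector_size = new_size
        for lun_blocks in self.blocks.values():
            lun_blocks.clear()


drive = MultiLunDrive(lun_count=2)
drive.write(lun=0, lba=100, payload=b"guest 1 data")  # guest 1 owns LUN 0
drive.set_sector_size(lun=1, new_size=4096)           # guest 2 owns LUN 1
print(drive.read(lun=0, lba=100))                     # guest 1's block is gone
```

A hypervisor that passes each LUN to a different guest cannot rely on LUN scoping alone to contain commands like this, which is one plausible reason to filter them.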

Since the drive is presented to the host OS as a "SCSI" device (this is how Linux is designed in general with libata, and not every OS attempts ATA passthrough to determine the "child" device, since that may not be useful for normal day-to-day operations), the host may not differentiate, or may choose not to differentiate, between SCSI and SATA like our tool does. It may therefore just filter all commands that it doesn't want potentially affecting multiple logical units.

danderson commented 1 year ago

I never revisited this issue, sorry for leaving things hanging :( I wasn't able to find any obvious reason why qemu would be preventing these commands from proceeding, and I worked around qemu's weirdness by getting my server host to run the sector reconfiguration from the bare metal, outside of my VM. So, I think this can be closed since "diagnose and fix qemu LUN passthrough" is definitely way outside the scope of this project.

Thank you for all your debugging efforts and insights!

vonericsen commented 1 year ago

Per @danderson, I will close this issue. I did push a small change so that if this "controller" is detected, --llInfo will now report that it is limited to 28-bit ATA passthrough commands. This is far from perfect, but it can help debug this further if we run into it again in the future. We can add additional "hacks" or "workarounds" for this configuration down the road, but I'm not sure what else will really be needed at this point.