Drive-Trust-Alliance / sedutil

DTA sedutil Self encrypting drive software

Locked SED drives and linux errors #454

Open adambmedent opened 9 months ago

adambmedent commented 9 months ago

Hey all, I am using SEDs in a server environment for encryption. I have a process in place to unlock the drives once the server boots up; however, the locked drives seem to cause a number of issues on the Linux server during boot.

Is there any easy or known way to ignore locked SEDs during boot? Has anyone else ever run into this issue? Once the drives are unlocked, all is well.

[Wed Dec 13 06:49:50 2023] sd 8:0:1:0: [sday] tag#1059 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[Wed Dec 13 06:49:50 2023] sd 8:0:1:0: [sday] tag#1059 Sense Key : Illegal Request [current]
[Wed Dec 13 06:49:50 2023] sd 8:0:1:0: [sday] tag#1059 Add. Sense: Security conflict in translated device
[Wed Dec 13 06:49:50 2023] sd 8:0:1:0: [sday] tag#1059 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
[Wed Dec 13 06:49:50 2023] I/O error, dev sdbo, sector 15002931712 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[Wed Dec 13 06:49:50 2023] Buffer I/O error on dev sdbo, logical block 1875366464, async page read
[Wed Dec 13 06:49:51 2023] Buffer I/O error on dev sdbo, logical block 1875366464, async page read
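
For anyone triaging a box with many drives, here is a minimal Python sketch that pulls out the sd device names whose kernel log lines carry the "Security conflict" additional sense shown above. Treating that sense string as a sign of a locked SED is an assumption based on this thread, and the function name `locked_candidates` is made up for illustration.

```python
import re

# Matches lines such as:
#   sd 8:0:1:0: [sday] tag#1059 Add. Sense: Security conflict in translated device
SENSE_RE = re.compile(r"sd \S+ \[(?P<dev>\w+)\].*Add\. Sense: Security conflict")


def locked_candidates(dmesg_text):
    """Return the set of sd device names that logged a 'Security conflict' sense.

    Assumption: on these drives that sense code shows up when a locked
    range is read; it is a hint, not proof that the drive is locked.
    """
    devices = set()
    for line in dmesg_text.splitlines():
        match = SENSE_RE.search(line)
        if match:
            devices.add(match.group("dev"))
    return devices
```

Feeding it the text of `dmesg -T` gives a list of candidates that can then be checked one by one with `sedutil-cli --query`.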

adambmedent commented 9 months ago

Well, looks like I got into a worse situation while testing.

Seems the one drive I was testing with is bricked in some way.

I ran the following, powered down the box, and powered it back up. Once it came back online I could no longer do anything with the drive.

sedutil-cli --initialsetup PASSWORD /dev/sddq
sedutil-cli --enablelockingrange 0 PASSWORD /dev/sddq
sedutil-cli --setlockingrange 0 rw PASSWORD /dev/sddq
sedutil-cli --setmbrenable off PASSWORD /dev/sddq

root@BunkSnapVaultProx:/dev/disk/by-id# ls -ltrh | grep 4441
lrwxrwxrwx 1 root root 10 Dec 13 08:35 ata-SAMSUNG_MZ7L37T6HBLA-00A07_S6EPNN0W504441 -> ../../sddq

root@BunkSnapVaultProx:~# sedutil-cli --query /dev/sddq
Invalid or unsupported disk /dev/sddq

adambmedent commented 9 months ago

Bummer, it seems sedutil bricked my drive in some way. I can't even see information via hdparm anymore.

root@BunkSnapVaultProx:/dev/disk/by-id# hdparm -I /dev/disk/by-id/ata-SAMSUNG_MZ7L37T6HBLA-00A07_S6EPNN0W504441

/dev/disk/by-id/ata-SAMSUNG_MZ7L37T6HBLA-00A07_S6EPNN0W504441:
SG_IO: bad/missing sense data, sb[]: 70 00 0b 00 00 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ATA device, with non-removable media
Standards:
    Likely used: 1
Configuration:
    Logical         max     current
    cylinders       0       0
    heads           0       0
    sectors/track   0       0
    Logical/Physical Sector size:           512 bytes
    device size with M = 1024*1024:           0 MBytes
    device size with M = 1000*1000:           0 MBytes
    cache/buffer size  = unknown
Capabilities:
    IORDY not likely
    Cannot perform double-word IO
    R/W multiple sector transfer: not supported
    DMA: not supported
    PIO: pio0

adambmedent commented 9 months ago

Once in a while I'll see this:

root@BunkSnapVaultProx:~# sedutil-cli --query /dev/sdbu
Properties exchange failed

/dev/sdbu ATA SAMSUNG MZ7L37T6HBLA-00A07 JXTC304Q S6EPNN0W504441
TPer function (0x0001)
    ACKNAK = N, ASYNC = N. BufferManagement = N, comIDManagement = N, Streaming = Y, SYNC = Y
Locking function (0x0002)
    Locked = Y, LockingEnabled = Y, LockingSupported = Y, MBRDone = N, MBREnabled = N, MBRAbsent = N, MediaEncrypt = Y
Geometry function (0x0003)
    Align = Y, Alignment Granularity = 8 (4096), Logical Block size = 512, Lowest Aligned LBA = 0
DataStore function (0x0202)
    Max Tables = 9, Max Size Tables = 10485760, Table size alignment = 1
OPAL 2.0 function (0x0203)
    Base comID = 0x1004, Initial PIN = 0x00, Reverted PIN = 0x00, comIDs = 1
    Locking Admins = 4, Locking Users = 9, Range Crossing = N
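
When the query does succeed, the "Locking function" flags are the part that tells you the drive state. A rough Python sketch for pulling those flags out of captured `--query` text follows; the flag names are copied from the output above, while the function name `parse_locking_flags` is hypothetical.

```python
import re

# Flag names as they appear in the "Locking function (0x0002)" section above.
LOCKING_FLAGS = (
    "Locked", "LockingEnabled", "LockingSupported",
    "MBRDone", "MBREnabled", "MBRAbsent", "MediaEncrypt",
)


def parse_locking_flags(query_output):
    """Map each locking flag found in sedutil-cli --query text to True/False.

    Returns an empty dict when the text has no flags, for example when
    the tool printed 'Invalid or unsupported disk' instead.
    """
    flags = {}
    for name in LOCKING_FLAGS:
        match = re.search(rf"\b{name}\s*=\s*([YN])\b", query_output)
        if match:
            flags[name] = match.group(1) == "Y"
    return flags
```

With the output above this yields `Locked` and `LockingEnabled` as true and `MBREnabled` as false, which matches a drive that is locked with the shadow MBR turned off.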

So I know the drive is locked.

Most of the time I see this.

root@BunkSnapVaultProx:~# sedutil-cli --query /dev/sdbu
Invalid or unsupported disk /dev/sdbu

I have to be missing something major here.

Similar with this command as well.

root@BunkSnapVaultProx:~# sedutil-cli --setlockingrange 0 rw MyPass /dev/sdbu
Invalid or unsupported disk /dev/sdbu
root@BunkSnapVaultProx:~# sedutil-cli --setlockingrange 0 rw t MyPass /dev/sdbu
Invalid or unsupported disk /dev/sdbu
root@BunkSnapVaultProx:~# sedutil-cli --setlockingrange 0 rw MyPass /dev/sdbu
Invalid or unsupported disk /dev/sdbu
root@BunkSnapVaultProx:~# sedutil-cli --setlockingrange 0 rw MyPass /dev/sdbu
Properties exchange failed
unsigned int requested for token is unsupported
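
Since the same command gives different errors on different attempts, it may help to tally the outputs instead of eyeballing the terminal. This is only a sketch: `tally_outputs` and its `run` parameter are names invented here, and it simply repeats whatever command you hand it, so only use it with read-only commands such as `--query`.

```python
import subprocess
from collections import Counter


def tally_outputs(cmd, attempts=5, run=None):
    """Run cmd several times and count each distinct output it produces.

    cmd is an argv list, e.g. ["sedutil-cli", "--query", "/dev/sdbu"].
    run is an optional callable(argv) -> str, mainly so the function can
    be exercised without touching a real drive.
    """
    if run is None:
        def run(argv):
            done = subprocess.run(argv, capture_output=True, text=True)
            return (done.stdout + done.stderr).strip()
    return Counter(run(cmd) for _ in range(attempts))
```

A result dominated by "Invalid or unsupported disk" with an occasional successful query would back up the impression that the drive answers only intermittently.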

adambmedent commented 9 months ago

Testing with another drive, hit the same issue once the drive is locked. This time I manually locked the drive.

root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --setlockingrange 0 ro PASSWORD /dev/sdcn
LockingRange0 set to RO

root@BunkSnapVaultProx:/dev/disk/by-id# mount /dev/sdcn /mnt
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C

This looks good; the drive shouldn't mount when it's RO.

root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --setlockingrange 0 rw PASSWORD /dev/sdcn
LockingRange0 set to RW

root@BunkSnapVaultProx:/dev/disk/by-id# mount /dev/sdcn /mnt
root@BunkSnapVaultProx:/dev/disk/by-id# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                  189G     0  189G   0% /dev
tmpfs                  38G  3.1M   38G   1% /run
/dev/mapper/pve-root   94G   26G   64G  29% /
tmpfs                 189G   46M  189G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
efivarfs              192K   53K  135K  29% /sys/firmware/efi/efivars
/dev/sda2             511M  304K  511M   1% /boot/efi
/dev/fuse             128M   20K  128M   1% /etc/pve
tmpfs                  38G     0   38G   0% /run/user/0
/dev/sdcn             7.0T   28K  6.6T   1% /mnt

Then I locked the drive and I am in the same position as the other drive.

root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --setlockingrange 0 lk PASSWORD /dev/sdcn
LockingRange0 set to LK

root@BunkSnapVaultProx:~# sedutil-cli --setlockingrange 0 rw PASSWORD /dev/sdcn
Properties exchange failed
unsigned int requested for token is unsupported

I am guessing my only option is the PSID?

Blacklands commented 9 months ago

I think this is the second or third issue I've seen here now where someone on Linux had issues with booting because of encrypted non-boot drives. You could read #449, I don't know if that will help you, sorry.

For the record, I think Windows might have problems with this too. The OSes just might not like seeing drives at boot that they cannot access?

A PSID revert should at least bring the drive back (hopefully). You could also try revertnoerase (see /wiki/Command-Syntax) to just disable locking.

adambmedent commented 9 months ago

Appreciate the input. I did try revertNoErase. That works OK as long as I don't power down the drive or put it into an LK state.

root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --revertNoErase PASSWORD /dev/sdcn
Invalid or unsupported disk /dev/sdcn

These drives are remote, so I have to make a 40 minute trip to the data center to get the PSID.

I had these working pretty well with hdparm; I'm not sure what in the world I'm missing with sedutil.

Blacklands commented 9 months ago

> These drives are remote, so I have to make a 40 minute trip to the data center to get the psid.

There's also reverttper, which uses your drive password, I think (I've never tried it). It might have limitations compared to a PSID revert, or it might work the same; I don't know. But it might also not work if revertnoerase doesn't work?

Reading #449 again, maybe the solution for that (or a variation) could work here, too. Basically, having MBREnable on at boot (even though you don't need it because you're not booting from the drive), just so that the OS doesn't freak out about the drive because it can't access it. If that is the problem. shrug

adambmedent commented 9 months ago

I gave revertTPer a try, with no luck either.

root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --revertTPer PASSWORD /dev/sdbu
Invalid or unsupported disk /dev/sdbu

It's almost as if once the disk is locked, it's bricked. I'm not even sold that the PSID is going to help here.

root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --yesIreallywanttoERASEALLmydatausingthePSID TEST /dev/sdbu
Invalid or unsupported disk /dev/sdbu

Obviously the PSID isn't TEST, but I would expect a different failure output, like the one below from a disk that hasn't been locked yet.

root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --yesIreallywanttoERASEALLmydatausingthePSID TEST /dev/sdbd
method status code NOT_AUTHORIZED
Session start failed rc = 1
EndSession Failed

Blacklands commented 9 months ago

You might have to try taking the drive(s) out and doing this on another system. I hope they're not really bricked and it's just something going wrong with the OS? Do you have an external adapter that supports OPAL commands? (Sadly not all USB ones work; apparently those with a SATA controller do, and any Thunderbolt ones should work, I think. I also have an NVMe-to-USB one that works.) Or just a different computer to plug them into?

If they're actually bricked then this is of course a giant issue with sedutil (and sorry for your loss, oof). Wish we could put that into the README like really big on top or something but this repo is basically dead in terms of development and nobody else has control over it.

adambmedent commented 9 months ago

Is there a live CD/DVD OS I could use? I do have access to the IPMI port to mount remote ISOs etc.

Blacklands commented 9 months ago

No idea, sorry. Can't you just try a Linux OS ISO, all the distros can boot as a live OS, right? And I don't know if Windows has anything like that these days. (Would be interesting to see if things are different on Windows, imo...) However, if the issue is connected to the hardware in the system then I would assume that a live OS wouldn't help. But if you can't take the drives out then I guess that's all you can try? :/

marcb1 commented 6 months ago

FWIW, I ran into the exact same issue on the Samsung MZ7L37T6HBLA drives. As soon as the drives are locked, SCSI and I/O errors show up in dmesg and the drives become unusable.

dmesg -w
mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
 sd 4:0:1:0: Power-on or device reset occurred

It seems udev gets into a loop with the kernel's SCSI POWER_ON_RESET_OCCURRED events: udev tries to access the locked device, the drive (being locked) generates a POWER_ON_RESET_OCCURRED event, and that causes udev to try to access the device again.

Can see events via:

sudo udevadm monitor -p
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

KERNEL[79934.766131] change   /devices/pci0000:80/0000:80:03.2/0000:87:00.0/host4/port-4:0/expander-4:0/port-4:0:1/end_device-4:0:1/target4:0:1/4:0:1:0 (scsi)
ACTION=change
DEVPATH=/devices/pci0000:80/0000:80:03.2/0000:87:00.0/host4/port-4:0/expander-4:0/port-4:0:1/end_device-4:0:1/target4:0:1/4:0:1:0
SUBSYSTEM=scsi
SDEV_UA=POWER_ON_RESET_OCCURRED
DEVTYPE=scsi_device
DRIVER=sd
MODALIAS=scsi:t-0x00
SEQNUM=12593
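
To get a feel for how tight the loop is, the captured monitor output can be counted per device. Below is a small Python sketch that assumes the output of `udevadm monitor -p` was saved to text and that events are separated by blank lines, as in the sample above; `count_reset_events` is a made-up name.

```python
from collections import Counter


def count_reset_events(monitor_text):
    """Count SDEV_UA=POWER_ON_RESET_OCCURRED events per DEVPATH.

    Each event block is a header line followed by KEY=VALUE lines;
    blocks are assumed to be separated by blank lines.
    """
    counts = Counter()
    for block in monitor_text.split("\n\n"):
        props = {}
        for line in block.splitlines():
            if "=" in line:
                key, value = line.split("=", 1)
                props[key.strip()] = value.strip()
        if props.get("SDEV_UA") == "POWER_ON_RESET_OCCURRED":
            counts[props.get("DEVPATH", "unknown")] += 1
    return counts
```

A device path that racks up a steadily growing count while the drive is locked would be consistent with the udev loop described above.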

You can also see debug messages from systemd-udevd, specifically Failed to run builtin 'blkid': Input/output error:

$ sudo udevadm control --log-priority=debug

Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: /usr/lib/udev/rules.d/60-persistent-storage.rules:109 Failed to run builtin 'blkid': Input/output error                      
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: /usr/lib/udev/rules.d/60-persistent-storage.rules:119 LINK 'disk/by-id/wwn-0x5002538f02c309dc'                               
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Handling device node '/dev/sdb', devnum=b8:16                                                                                
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Preserve already existing symlink '/dev/block/8:16' to '../sdb'                                                              
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Found 'b8:16' claiming '/run/udev/links/\x2fdisk\x2fby-id\x2fscsi-SATA_SAMSUNG_MZ7L37T6_S6EPNA0TC03590'                      
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Preserve already existing symlink '/dev/disk/by-id/scsi-SATA_SAMSUNG_MZ7L37T6_S6EPNA0TC03590' to '../../sdb'                 
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Found 'b8:16' claiming '/run/udev/links/\x2fdisk\x2fby-path\x2fpci-0000:87:00.0-sas-exp0x500056b31054e8ff-phy1-lun-0'        
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Preserve already existing symlink '/dev/disk/by-path/pci-0000:87:00.0-sas-exp0x500056b31054e8ff-phy1-lun-0' to '../../sdb'   
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Found 'b8:16' claiming '/run/udev/links/\x2fdisk\x2fby-id\x2fscsi-35002538f02c309dc'                                         
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Preserve already existing symlink '/dev/disk/by-id/scsi-35002538f02c309dc' to '../../sdb'                                    
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Found 'b8:16' claiming '/run/udev/links/\x2fdisk\x2fby-id\x2fwwn-0x5002538f02c309dc'                                         
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Preserve already existing symlink '/dev/disk/by-id/wwn-0x5002538f02c309dc' to '../../sdb'                                    
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: sd-device: Created db file '/run/udev/data/b8:16' for '/devices/pci0000:80/0000:80:03.2/0000:87:00.0/host4/port-4:0/expander-4:0/port-4:0:1/end_device-4:0:1/target4:0:1/4:0:1:0/block/sdb'
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Adding watch on '/dev/sdb'
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: sd-device: Created db file '/run/udev/data/b8:16' for '/devices/pci0000:80/0000:80:03.2/0000:87:00.0/host4/port-4:0/expander-4:0/port-4:0:1/end_device-4:0:1/target4:0:1/4:0:1:0/block/sdb'

I couldn't figure out a proper fix. Ideally we'd want udev to ignore locked devices somehow. Or perhaps this is a drive firmware issue, as I have seen other devices still respond to the blkid command even when locked. These devices seem to generate SCSI POWER_ON_RESET_OCCURRED for any command when locked.

The workaround is to stop udev and unlock device:

$ sudo systemctl stop systemd-udevd systemd-udevd-kernel.socket systemd-udevd-control.socket
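
For a box with many drives, the workaround could be scripted roughly as below. This is a sketch under assumptions, not a tested procedure: the unlock command is the `--setlockingrange 0 rw` form used earlier in this thread, starting the udev units again afterwards is my own addition, and `unlock_plan` / `run_plan` are hypothetical names. Note that the password ends up on the command line, and everything has to run as root.

```python
import subprocess

# Units taken from the systemctl stop command above.
UDEV_UNITS = [
    "systemd-udevd",
    "systemd-udevd-kernel.socket",
    "systemd-udevd-control.socket",
]


def unlock_plan(devices, password):
    """Build the command sequence: stop udev, unlock each drive, start udev."""
    plan = [["systemctl", "stop", *UDEV_UNITS]]
    for dev in devices:
        # Unlock form copied from earlier in the thread (locking range 0).
        plan.append(["sedutil-cli", "--setlockingrange", "0", "rw", password, dev])
    # Assumption: udev is brought back once the drives answer reads again.
    plan.append(["systemctl", "start", *UDEV_UNITS])
    return plan


def run_plan(plan, dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for argv in plan:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

Running `run_plan(unlock_plan(["/dev/sdb"], "PASSWORD"))` only prints the three commands, which makes it easy to review the sequence before letting it touch a drive.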