dankamongmen / growlight

notcurses block device manager / system installation tool
https://nick-black.com/dankwiki/index.php/Growlight
GNU General Public License v3.0

device with damaged partition table isn't showing up at all #79

Closed dankamongmen closed 3 years ago

dankamongmen commented 4 years ago

I've got /dev/sdk:

[schwarzgerat](1) $ lsblk /dev/sdk
NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdk    8:160  0 111.8G  0 disk
[schwarzgerat](0) $ 

gpart has some real problems with it, stalling out over many seconds:

[schwarzgerat](1) $ sudo gpart /dev/sdk

Begin scan...
Possible partition(Linux swap), size(244mb), offset(138mb)
Possible partition(Linux LVM2 physical volume), size(18854mb), offset(795mb)
...stall....

growlight-readline does not list it in blockdev output, and loops on the following diagnostic:

[growlight](0)> blockdev
Device     Model             Rev   Bytes PSect Flags Table WWN              PHY
sdh        ST12000NM0007-2A SN02  12.00T 4096B ✔OW⚠. gpt   5000c500b49867e5 SAT3
sdi        ST12000NM0007-2A SN03  12.00T 4096B ✔OW⚠. gpt   5000c500a5c0e61d SAT3
sdj        Hitachi HTS72323 A60W 320.07G  512B ✔OW.B dos   5000cca61dcdfd9d SAT2
sde        ST12000NM0007-2A SN02  12.00T 4096B ✔OW⚠. gpt   5000c500b56a29d4 SAT3
sda        ST12000NM0007-2A SN02  12.00T 4096B ✔OW⚠. gpt   5000c500b4984eca SAT3
sdb        ST12000NM0007-2A SN02  12.00T 4096B ✔OW⚠. gpt   5000c500b4104bf5 SAT3
sdf        ST12000NM0007-2A SN02  12.00T 4096B ✔OW⚠. gpt   5000c500b5685ea4 SAT3
sdg        ST12000NM0007-2A SN02  12.00T 4096B ✔OW⚠. gpt   5000c500b1c2c393 SAT3
sdc        ST12000NM0007-2A SN02  12.00T 4096B ✔OW⚠. gpt   5000c500b56936d2 SAT3
sdd        ST12000NM0007-2A SN02  12.00T 4096B ✔OW⚠. gpt   5000c500b3f4afb4 SAT3
sr0        iHBS112   2      CL0F   1.07G  512B UO... none  n/a              PATA
nvme0n1    WDS100T3X0C-00SJ  n/a   1.00T  512B ✔.... gpt   1908E1805012     NVMe
nvme1n1    WDS100T3X0C-00SJ  n/a   1.00T  512B ✔.... gpt   1908E1801188     NVMe
nvme2n1    INTEL MEMPEK1W01  n/a  14.40G  512B ✔.... none  PHBT729201SR016D NVMe
md127      Linux mdadm       1.2 106.23G  512B VM... none  root             NVMe
pktcdvd0   n/a               n/a    0.00 2048B RO... none  n/a              n/a
zhomez     LLNL ZoL         5000 884.76M 4096B VZ... spa   n/a              ?
chungus    LLNL ZoL         5000 107.99M  512B VZ... spa   n/a              ?
sdl        STORAGE DEVICE   1203    0.00  512B RO... none  n/a              PATA
sdm        STORAGE DEVICE   1203    0.00  512B RO... none  n/a              PATA
sdo        STORAGE DEVICE   1203    0.00  512B RO... none  n/a              PATA
sdp        STORAGE DEVICE   1203    0.00  512B RO... none  n/a              PATA
sdn        STORAGE DEVICE   1203    0.00  512B RO... none  n/a              PATA

    Flags:  (R)emovable, (U)nloaded, (V)irtual, (M)dadm, (Z)pool,
        (D)M, r(O)tational, (r)ead-only, (W)ritecache enabled,
        (B)IOS bootable, v/⚠: Read-Write-Verify, ✓/✗/☠: SMART status
[growlight](0)> Couldn't probe partition table of sdk (Success)
Got stats for unknown device [sdk]
Couldn't probe partition table of sdk (Success)
Got stats for unknown device [sdk]
Couldn't probe partition table of sdk (Success)
Got stats for unknown device [sdk]
Couldn't probe partition table of sdk (Success)
Got stats for unknown device [sdk]
Couldn't probe partition table of sdk (Success)

it doesn't show up at all in growlight. hdparm shows:

[schwarzgerat](127) $ sudo hdparm -i /dev/sdk

/dev/sdk:

 Model=INTEL SSDSA2CW120G3, FwRev=4PC10362, SerialNo=PEPR332400UX120LGN
 Config={ Fixed }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=1
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234441648
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: unknown:  ATA/ATAPI-2,3,4,5,6,7

 * signifies the current active mode

[schwarzgerat](0) $

dankamongmen commented 3 years ago

OK, I was able to reproduce this. It has something to do with how we're creating (at least) GPT partition tables.

I have a Corsair USB device, /dev/sdc:

[grimes](0) $ lsblk /dev/sdc
NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdc    8:32   1 230.6G  0 disk 
[grimes](0) $ 

The entirety of related dmesg output is as follows:

[340809.825600] usb 2-2: new SuperSpeed Gen 1 USB device number 3 using xhci_hcd
[340809.843507] usb 2-2: New USB device found, idVendor=1b1c, idProduct=1a0a, bcdDevice= 1.10
[340809.843513] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[340809.843517] usb 2-2: Product: Survivor 3.0
[340809.843520] usb 2-2: Manufacturer: Corsair
[340809.843523] usb 2-2: SerialNumber: 07089880037D2799
[340809.845566] usb-storage 2-2:1.0: USB Mass Storage device detected
[340809.846198] scsi host4: usb-storage 2-2:1.0
[340810.866617] scsi 4:0:0:0: Direct-Access     Corsair  Survivor 3.0     000A PQ: 0 ANSI: 6
[340810.867672] sd 4:0:0:0: Attached scsi generic sg2 type 0
[340810.868252] sd 4:0:0:0: [sdc] 483655680 512-byte logical blocks: (248 GB/231 GiB)
[340810.868584] sd 4:0:0:0: [sdc] Write Protect is off
[340810.868593] sd 4:0:0:0: [sdc] Mode Sense: 45 00 00 00
[340810.869006] sd 4:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[340810.988389]  sdc: sdc1
[340810.991306] sd 4:0:0:0: [sdc] Attached SCSI removable disk

same behavior with a lengthy stall on gpart.

This is blkid_partlist_get_table returning NULL.

dankamongmen commented 3 years ago

So blkid identifies this as MBR, even though we supposedly just prepared a GPT partition table on it:

[grimes](0) $ sudo blkid  /dev/sdc
/dev/sdc: PTTYPE="PMBR"
[grimes](0) $ 

I pulled the drive and put it back in, to ensure this isn't just a failure to update kernel data structures. Nope! Still PMBR; kernel sees no partitions, just /dev/sdc.

dankamongmen commented 3 years ago

and yet fdisk shows us a GPT entry:

[grimes](130) $ sudo fdisk /dev/sdc

Welcome to fdisk (util-linux 2.36.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): p
Disk /dev/sdc: 230.63 GiB, 247631708160 bytes, 483655680 sectors
Disk model: Survivor 3.0    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device     Boot Start        End    Sectors Size Id Type
/dev/sdc1  *        1 4294967295 4294967295   2T ee GPT

Command (m for help): 

Hrmmm, we see GPT, but check out that "Disklabel type: dos". That doesn't seem right!

dankamongmen commented 3 years ago

wait, is that suggesting that i have a GPT ... something inside of an MSDOS partition?

dankamongmen commented 3 years ago

gparted says:

[grimes](0) $ sudo gparted /dev/sdc
Unit \xe2\x97\x8f.service does not exist, proceeding anyway.
GParted 1.1.0
configuration --enable-libparted-dmraid --enable-online-resize
libparted 3.3
Both the primary and backup GPT tables are corrupt.  Try making a fresh table, and using Parted's rescue feature to recover partitions.

dankamongmen commented 3 years ago

So there are two bugs here: we're generating corrupt GPT tables (this might be due to the zlib change for CRC), and we're not displaying devices whose partition table can't be probed, let alone reporting that the table is damaged (as gparted does).

I'll make another bug for the corrupt GPT tables.
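
For anyone wanting to check a header by hand: per the UEFI spec, the GPT header carries a CRC-32 of itself at byte offset 16, computed over HeaderSize bytes with that field zeroed, and it is the same CRC-32 that zlib implements. A minimal Python sketch of that check follows. It is not growlight's C code, and the helper names (gpt_header_crc_ok, build_test_header) are made up for illustration. It only validates the header checksum, not the partition entry array CRC or the backup header.

```python
import struct
import zlib

GPT_SIGNATURE = b"EFI PART"
MIN_HEADER_SIZE = 92  # size of the fixed GPT header fields


def gpt_header_crc_ok(sector: bytes) -> bool:
    """Return True if `sector` (the contents of LBA 1) holds a GPT header
    whose stored header CRC-32 matches the recomputed one."""
    if len(sector) < MIN_HEADER_SIZE or sector[:8] != GPT_SIGNATURE:
        return False
    # HeaderSize lives at offset 12, HeaderCRC32 at offset 16 (little-endian)
    header_size, stored_crc = struct.unpack_from("<II", sector, 12)
    if header_size < MIN_HEADER_SIZE or header_size > len(sector):
        return False
    header = bytearray(sector[:header_size])
    header[16:20] = b"\x00\x00\x00\x00"  # CRC field must be zero while hashing
    return (zlib.crc32(bytes(header)) & 0xFFFFFFFF) == stored_crc


def build_test_header() -> bytes:
    """Build a synthetic 512-byte sector with a self-consistent header CRC.
    Every field other than signature, revision, size and CRC is left zero."""
    header = bytearray(MIN_HEADER_SIZE)
    header[0:8] = GPT_SIGNATURE
    struct.pack_into("<II", header, 8, 0x00010000, MIN_HEADER_SIZE)
    crc = zlib.crc32(bytes(header)) & 0xFFFFFFFF  # CRC field is still zero here
    struct.pack_into("<I", header, 16, crc)
    return bytes(header) + bytes(512 - MIN_HEADER_SIZE)
```

To try it on a real device you would read 512 bytes at offset 512 on a disk with 512-byte logical sectors; on a 4096-byte-sector disk, LBA 1 starts at offset 4096 instead.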

dankamongmen commented 3 years ago

ok that EE business is just the protective MBR; i'd forgotten about that
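
For reference, here is roughly what "protective MBR" means at the byte level, as a Python sketch rather than anything taken from growlight or libblkid. The MBR keeps four 16-byte partition entries starting at offset 446, with the type byte at offset 4 of each entry and the 0x55 0xAA signature at offset 510. The function names are hypothetical, and the "exactly one non-empty entry of type 0xEE" rule is a strict reading chosen for the sketch; real tools may be more lenient.

```python
import struct
from typing import List

MBR_TABLE_OFFSET = 446
MBR_ENTRY_SIZE = 16
GPT_PROTECTIVE_TYPE = 0xEE


def mbr_partition_types(sector0: bytes) -> List[int]:
    """Return the four partition type bytes from an MBR sector,
    or an empty list if the 0x55AA boot signature is missing."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        return []
    return [sector0[MBR_TABLE_OFFSET + MBR_ENTRY_SIZE * i + 4] for i in range(4)]


def is_protective_mbr(sector0: bytes) -> bool:
    """True if the only non-empty MBR entry has type 0xEE (assumed strict rule)."""
    used = [t for t in mbr_partition_types(sector0) if t != 0]
    return used == [GPT_PROTECTIVE_TYPE]


def build_mbr(part_type: int, start_lba: int, sectors: int, boot: bool) -> bytes:
    """Build a synthetic MBR with a single entry, for experimenting."""
    sector = bytearray(512)
    entry = struct.pack(
        "<B3sB3sII",
        0x80 if boot else 0x00,  # status / boot flag
        b"\x00\x00\x00",         # CHS start (ignored here)
        part_type,
        b"\x00\x00\x00",         # CHS end (ignored here)
        start_lba,
        sectors,
    )
    sector[MBR_TABLE_OFFSET:MBR_TABLE_OFFSET + MBR_ENTRY_SIZE] = entry
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)
```

An entry like the one fdisk printed above (bootable, type ee, start 1, 4294967295 sectors) can be rebuilt with build_mbr(0xEE, 1, 0xFFFFFFFF, True) and passes is_protective_mbr. That says nothing about whether the GPT behind it is valid.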

dankamongmen commented 3 years ago

With a correctly-formed header, we display the device fine (see #117). So yeah, we just need to handle the damaged header case.
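
The shape of the fix, sketched in Python purely for illustration (growlight itself is C, and none of these names come from its source): a failed table probe should mark the device rather than drop it from the listing. TableProbeError, Row, build_rows and the "damaged" label are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


class TableProbeError(Exception):
    """Hypothetical: a partition table is present but cannot be parsed."""


@dataclass
class Row:
    name: str
    table: str  # e.g. "gpt", "dos", "none", or "damaged"


def build_rows(names: Iterable[str],
               probe: Callable[[str], Optional[str]]) -> List[Row]:
    """Produce one row per device, even when probing its table fails."""
    rows = []
    for name in names:
        try:
            # probe() returns a table type, or None when no table is found
            table = probe(name) or "none"
        except TableProbeError:
            # Keep the device visible and flag it instead of skipping it
            table = "damaged"
        rows.append(Row(name, table))
    return rows
```

The point is only that the probe failure is handled per device inside the loop, so one unreadable header cannot remove a disk from the output.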

dankamongmen commented 3 years ago

Got it.