coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

Expose NVMe attached Google Compute Local SSDs the same as SCSI attached Local SSDs #2476

Open negz opened 6 years ago

negz commented 6 years ago

Issue Report

Issue https://github.com/coreos/bugs/issues/238 requested a udev rule so that NVMe attached Google Compute Engine local-ssd disks would appear under /dev/disk/by-id/google-{gcevolumename}, matching how SCSI attached local SSD disks appear.

https://github.com/coreos/init/pull/215 was intended to fix that issue, but I believe it only works for SCSI attached local SSD disks. @lucab asked me to re-raise the issue, so I am doing so. However, the desired behaviour may not actually be possible: according to udevadm info (see below), Google does not expose the volume name for NVMe disks the way it does for SCSI disks.

Container Linux Version

# cat /etc/os-release 
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1745.7.0
VERSION_ID=1745.7.0
BUILD_ID=2018-06-14-0909
PRETTY_NAME="Container Linux by CoreOS 1745.7.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Environment

Google Compute Engine

Expected Behavior

NVMe attached GCE disks are symlinked at /dev/disk/by-id/google-{googlevolumedevicename} to match how SCSI attached GCE disks appear.

Actual Behavior

NVMe attached disks are symlinked at /dev/disk/by-id/nvme-nvme_card_nvme_card.

Reproduction Steps

  1. Create a GCE instance with a disk of type local-ssd and interface NVME.
  2. Boot the GCE instance using CoreOS.
  3. Inspect /dev/disk/by-id.
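
For the inspection step, a small helper along these lines may make the comparison between images easier. It is only a sketch: the function name `list_links` and its two parameters are hypothetical, and it assumes `readlink` is available on the image.

```shell
#!/bin/sh
# Print "name -> target" for each symlink in a directory whose name
# contains the given pattern. list_links is a hypothetical helper name.
list_links() {
    dir=$1
    pattern=$2
    for link in "$dir"/*"$pattern"*; do
        # Skip non-symlinks, including the unexpanded glob when nothing matches.
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink "$link")"
    done
}
```

Running `list_links /dev/disk/by-id nvme` on the instance should list only the NVMe related symlinks and the block devices they point at.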

Other Information

This is what udevadm says about the disk. Note that in this case the disk appears as /dev/disk/by-id/google-varlibdocker due to a fairly naive udev rule I added as a workaround:

# cat /etc/udev/rules.d/91-gce-nvme-varlibdocker.rules
# See https://github.com/coreos/bugs/issues/238
KERNEL=="nvme0n1", ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-id/google-varlibdocker"

# udevadm info /dev/nvme0n1
P: /devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n1
N: nvme0n1
S: disk/by-id/google-varlibdocker
S: disk/by-id/nvme-nvme.1ae0-6e766d655f63617264-6e766d655f63617264-00000001
S: disk/by-id/nvme-nvme_card_nvme_card
S: disk/by-path/pci-0000:00:04.0-nvme-1
S: disk/by-uuid/7dc97564-f38e-451d-9a67-e5e6b2ed9b40
E: DEVLINKS=/dev/disk/by-path/pci-0000:00:04.0-nvme-1 /dev/disk/by-id/nvme-nvme.1ae0-6e766d655f63617264-6e766d655f63617264-00000001 /dev/disk/by-uuid/7dc97564-f38e-451d-9a67-e5e6b2ed9b40 /dev/disk/by-id/google-varlibdocker /dev/disk/by-id/nvme-nvme_card_nvme_card
E: DEVNAME=/dev/nvme0n1
E: DEVPATH=/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n1
E: DEVTYPE=disk
E: ID_FS_TYPE=ext4
E: ID_FS_USAGE=filesystem
E: ID_FS_UUID=7dc97564-f38e-451d-9a67-e5e6b2ed9b40
E: ID_FS_UUID_ENC=7dc97564-f38e-451d-9a67-e5e6b2ed9b40
E: ID_FS_VERSION=1.0
E: ID_MODEL=nvme_card
E: ID_PATH=pci-0000:00:04.0-nvme-1
E: ID_PATH_TAG=pci-0000_00_04_0-nvme-1
E: ID_SERIAL=nvme_card_nvme_card
E: ID_SERIAL_SHORT=nvme_card
E: ID_WWN=nvme.1ae0-6e766d655f63617264-6e766d655f63617264-00000001
E: MAJOR=259
E: MINOR=0
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=8310822
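
As a rough check of the claim that no volume name is exposed, the two hex fields in the middle of ID_WWN can be decoded to text. The sketch below assumes those fields are plain hex-encoded ASCII; `hex_to_ascii` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Decode a hex string (two hex digits per byte) into ASCII text.
# hex_to_ascii is a hypothetical helper name.
hex_to_ascii() {
    rest=$1
    while [ -n "$rest" ]; do
        tail=${rest#??}                  # everything after the first two characters
        [ "$tail" = "$rest" ] && break   # fewer than two characters left: stop
        byte=${rest%"$tail"}             # the first two characters
        # Turn the hex byte into a three-digit octal escape and print it.
        printf "\\$(printf '%03o' "0x$byte")"
        rest=$tail
    done
    printf '\n'
}
```

`hex_to_ascii 6e766d655f63617264` prints `nvme_card`, the same generic string reported in ID_MODEL and ID_SERIAL_SHORT, so these fields do not appear to carry the GCE device name.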

lucab commented 6 years ago

It may be worth checking what Google OS does, as they may be stuffing that information somewhere in the device metadata similarly to AWS EBS (e.g. https://github.com/coreos/init/pull/268).

lucab commented 6 years ago

I did some further investigation on this, but now I'm more confused than before.

On Container-Optimized OS from Google (version 10718.59.0) I didn't see any additional NVMe by-id symlinks (compared to what CL has). On Ubuntu 18.04 I see additional /dev/disk/by-id/google-local-nvme-ssd-X entries.

That is:

# ContainerOS
$ ls -la /dev/disk/by-id/ | grep nvme
lrwxrwxrwx 1 root root  13 Jul 23 14:16 nvme-nvme.1ae0-6e766d655f63617264-6e766d655f63617264-00000001 -> ../../nvme0n1
lrwxrwxrwx 1 root root  13 Jul 23 14:16 nvme-nvme_card_nvme_card -> ../../nvme0n1

# Ubuntu
$ ls -la /dev/disk/by-id/ | grep nvme
lrwxrwxrwx 1 root root  13 Jul 23 14:23 google-local-nvme-ssd-0 -> ../../nvme0n1
lrwxrwxrwx 1 root root  13 Jul 23 14:23 nvme-nvme.1ae0-6e766d655f63617264-6e766d655f63617264-00000001 -> ../../nvme0n1
lrwxrwxrwx 1 root root  13 Jul 23 14:23 nvme-nvme_card_nvme_card -> ../../nvme0n1

Neither of those matches what the documentation describes (i.e. /dev/disk/by-id/google-local-ssd-x). It looks like Container-Optimized OS does not carry any udev rules for its own NVMe disks, while Ubuntu carries a fixed set (up to 7) coming from https://github.com/GoogleCloudPlatform/compute-image-packages/blob/20180611/google_config/udev/65-gce-disk-naming.rules.

I guess we may want to carry 65-gce-disk-naming.rules as part of our OEM GCE bits. This still won't match Google's documentation. @negz, you may want to contact their support to get either the docs or the udev rules fixed so that they agree.
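
To illustrate what a fixed set of rules could look like, here is a sketch that generalizes the workaround rule from the original report to several devices, using the google-local-nvme-ssd-N names observed on Ubuntu. It is not the content of 65-gce-disk-naming.rules. The function name `gen_nvme_rules` is hypothetical, and the sketch assumes that kernel device nvmeNn1 corresponds to local SSD N, which the kernel does not guarantee.

```shell
#!/bin/sh
# Print one udev rule per NVMe device index 0..(count-1).
# Assumption: nvme<N>n1 is the N-th local SSD. gen_nvme_rules is a hypothetical name.
gen_nvme_rules() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'KERNEL=="nvme%sn1", ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-id/google-local-nvme-ssd-%s"\n' "$i" "$i"
        i=$((i + 1))
    done
}
```

The output of `gen_nvme_rules 8` could be written to a rules file (for example a hypothetical /etc/udev/rules.d/92-gce-nvme-local-ssd.rules) and applied with `udevadm control --reload` followed by `udevadm trigger`.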