coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

WARNING: SCSI device dm-0 has no device ID on LUKS device #1712

Open freedge opened 2 months ago

freedge commented 2 months ago

Describe the bug

When booting CoreOS with an extra disk configured with LUKS encryption, I get this warning:

# journalctl -t  55-scsi-sg3_id.rules --no-pager
Apr 23 07:11:48 localhost 55-scsi-sg3_id.rules[1110]: WARNING: SCSI device dm-0 has no device ID, consider changing .SCSI_ID_SERIAL_SRC in 00-scsi-sg3_config.rules
Apr 23 07:11:48 localhost 55-scsi-sg3_id.rules[1141]: WARNING: SCSI device dm-0 has no device ID, consider changing .SCSI_ID_SERIAL_SRC in 00-scsi-sg3_config.rules
Apr 23 07:11:53 localhost 55-scsi-sg3_id.rules[1286]: WARNING: SCSI device dm-0 has no device ID, consider changing .SCSI_ID_SERIAL_SRC in 00-scsi-sg3_config.rules
Apr 23 07:11:53 localhost 55-scsi-sg3_id.rules[1289]: WARNING: SCSI device dm-0 has no device ID, consider changing .SCSI_ID_SERIAL_SRC in 00-scsi-sg3_config.rules
Apr 23 07:11:55 localhost 55-scsi-sg3_id.rules[1381]: WARNING: SCSI device dm-0 has no device ID, consider changing .SCSI_ID_SERIAL_SRC in 00-scsi-sg3_config.rules
Apr 23 07:11:55 localhost 55-scsi-sg3_id.rules[1412]: WARNING: SCSI device dm-0 has no device ID, consider changing .SCSI_ID_SERIAL_SRC in 00-scsi-sg3_config.rules
Apr 23 07:11:57 localhost 55-scsi-sg3_id.rules[1448]: WARNING: SCSI device dm-0 has no device ID, consider changing .SCSI_ID_SERIAL_SRC in 00-scsi-sg3_config.rules
Apr 23 07:11:58 localhost 55-scsi-sg3_id.rules[1478]: WARNING: SCSI device dm-0 has no device ID, consider changing .SCSI_ID_SERIAL_SRC in 00-scsi-sg3_config.rules
Apr 23 07:11:58 localhost 55-scsi-sg3_id.rules[1508]: WARNING: SCSI device dm-0 has no device ID, consider changing .SCSI_ID_SERIAL_SRC in 00-scsi-sg3_config.rules
Apr 23 07:12:02 core 55-scsi-sg3_id.rules[2104]: WARNING: SCSI device dm-0 has no device ID, consider changing .SCSI_ID_SERIAL_SRC in 00-scsi-sg3_config.rules

Reproduction steps

  1. Create a VM with 2 disks
  2. Configure the second disk with LUKS encryption
  3. Get the warning above in the journal

Expected behavior

no warning

Actual behavior

a warning

System details

# rpm-ostree status
State: idle
AutomaticUpdatesDriver: Zincati
  DriverState: active; periodically polling for updates (last checked Tue 2024-04-23 05:43:04 UTC)
Deployments:
● fedora:fedora/x86_64/coreos/next
                  Version: 40.20240416.1.1 (2024-04-19T04:22:07Z)
                   Commit: 0006e2ccac33f9b6e944c34ab5bfc1cecacd505fb514488fcc5cb35c151af90b
             GPGSignature: Valid signature by 115DF9AEF857853EE8445D0A0727707EA15B79CC

This is on Hyper-V, booted with:

Expand-Archive .\Downloads\fedora-coreos-40.20240416.1.1-hyperv.x86_64.vhdx.zip -DestinationPath .\Downloads\ -Force
New-VM -Name core -MemoryStartupBytes 4096MB -Generation 2 -BootDevice VHD -VHDPath .\Downloads\fedora-coreos-40.20240416.1.1-hyperv.x86_64.vhdx -SwitchName "Default Switch"
Set-Vm core -MemoryMinimumBytes 2048MB  -CheckpointType disabled -Notes "Fedora CoreOS 40"
Set-VMFirmware -VMName core -EnableSecureBoot Off
Set-VMKeyProtector -VMName core -NewLocalKeyProtector
Enable-VMTPM -VMName core
Add-VMNetworkAdapter -SwitchName br-mine -VmName core -Name eth1
New-VHD -Path .\Downloads\extra.vhdx -SizeBytes 400MB -Dynamic
Add-VMHardDiskDrive -VMName core -Path .\Downloads\extra.vhdx
scp stream:hyperv.ign .
.\Downloads\kvpctl.exe core add-ign hyperv.ign
Start-VM core

Butane or Ignition config

variant: fcos
version: 1.5.0

passwd:
  users:
    - name: frigo
      ssh_authorized_keys:
      - "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA3EVo0gswYxVGq6MEHUQTVnDpYLnmWnZnRcYGfuLzrULcPX4tgBYpILOlxI0lZXMj+Ebnf2FWohp1MLbC349uwI92CMTwSk6blWW/LX1NJWG5vKDmUx25B7DiPFdIMWp23EJZk/x5NshFwgooKgzDKj8/Y5lQeEI6OsGoOPxGNbUD5/rQDALnEybrXgPckyOlqNa43TB9/rDFa4KABUEHXQDWPzmsWRONVYIRjJ9wUrzgvAdJo6Q6qDkJpMkXpzFdSjcwGbWx4Dx9ylWp5zQSHXTfI3mqbAmW2lCDOEUKGUZVyYeHjFtsPKWKx+5hVUsCiK3V9zhJVlOnGtjr8Obbyw=="
      system: false
      groups:
      - wheel
      - sudo

storage:
  disks:
    - device: /dev/disk/by-id/coreos-boot-disk
      wipe_table: false
      partitions:
        - label: root
          number: 4
          size_mib: 0
          resize: true
    - device: /dev/sdb
      wipe_table: true
      partitions:
        - label: etcd
          number: 1
          size_mib: 0
          resize: true

  filesystems:
    - device: /dev/mapper/etcd
      format: xfs
      label: etcd
      wipe_filesystem: true
      with_mount_unit: true
      path: /var/lib/etcd

  luks:
    - clevis:
        tpm2: true
      device: /dev/disk/by-partlabel/etcd
      label: luks-etcd
      wipe_volume: true
      name: etcd
      options:
      - "--sector-size"
      - "512"
      discard: true
      open_options:
        - --perf-no_read_workqueue
        - --perf-no_write_workqueue

  files:
    - path: /etc/udev/rules.d/01-block.rules
      mode: 0644
      contents:
        inline: |
          KERNEL=="sd*[!0-9]", KERNELS=="0:0:0:1", ATTRS{vendor}=="Msft*", ATTR{size}=="819200", SYMLINK+="disk-etcd"

    - path: /etc/hosts
      contents:
        inline: |
          127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
          ::1       localhost localhost.localdomain localhost6 localhost6.localdomain6
          10.224.123.3 core
      overwrite: true

    - path: /etc/NetworkManager/system-connections/eth0.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=eth0
          type=ethernet
          interface-name=eth0
          [ipv4]
          address1=10.224.123.3/20,10.224.123.1
          dns=192.168.1.254;
          dns-search=
          may-fail=false
          method=manual

    - path: /etc/NetworkManager/system-connections/eth1.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=eth1
          type=ethernet
          interface-name=eth1
          [ipv4]
          address1=172.0.0.3/24
          dns=
          dns-search=
          may-fail=false
          method=manual

    - path: /etc/chrony.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          local stratum 2
          refclock PHC /dev/ptp_hyperv poll 3 dpoll -2 offset 0
          driftfile /var/lib/chrony/drift
          makestep 1.0 -1
          rtcsync
          keyfile /etc/chrony.keys
          ntsdumpdir /var/lib/chrony
          leapsectz right/UTC
          logdir /var/log/chrony

    - path: /etc/hostname
      mode: 0644
      contents:
        inline: core

  links:
    - path: /etc/localtime
      target: ../usr/share/zoneinfo/Europe/Monaco

Additional information

I would also like to not reference "/dev/sdb" but instead rely on "/dev/disk-etcd" as created using my own rules, but this does not seem possible.

$ udevadm info /dev/sdb --no-pager
P: /devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:00/VMBUS:00/01ce8166-b4b1-4bfa-9284-7c9082654950/host0/target0:0:0/0:0:0:1/block/sdb
M: sdb
U: block
T: disk
D: b 8:16
N: sdb
L: 0
S: disk/by-id/scsi-3600224806509d192d362bf93d8cd70ed
S: disk-etcd
S: disk/by-diskseq/2
S: disk/by-id/wwn-0x600224806509d192d362bf93d8cd70ed
S: disk/by-path/acpi-VMBUS:00-vmbus-01ce8166b4b14bfa92847c9082654950-lun-1
Q: 2
E: DEVPATH=/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:00/VMBUS:00/01ce8166-b4b1-4bfa-9284-7c9082654950/host0/target0:0:0/0:0:0:1/block/sdb
E: DEVNAME=/dev/sdb
E: DEVTYPE=disk
E: DISKSEQ=2
E: MAJOR=8
E: MINOR=16
E: SUBSYSTEM=block
E: USEC_INITIALIZED=32528719
E: ID_SCSI=1
E: ID_VENDOR=Msft
E: ID_VENDOR_ENC=Msft\x20\x20\x20\x20
E: ID_MODEL=Virtual_Disk
E: ID_MODEL_ENC=Virtual\x20Disk\x20\x20\x20\x20
E: ID_REVISION=1.0
E: ID_TYPE=disk
E: ID_SERIAL=3600224806509d192d362bf93d8cd70ed
E: ID_SERIAL_SHORT=600224806509d192d362bf93d8cd70ed
E: ID_WWN=0x600224806509d192
E: ID_WWN_VENDOR_EXTENSION=0xd362bf93d8cd70ed
E: ID_WWN_WITH_EXTENSION=0x600224806509d192d362bf93d8cd70ed
E: ID_BUS=scsi
E: ID_PATH=acpi-VMBUS:00-vmbus-01ce8166b4b14bfa92847c9082654950-lun-1
E: ID_PATH_TAG=acpi-VMBUS_00-vmbus-01ce8166b4b14bfa92847c9082654950-lun-1
E: ID_PART_TABLE_UUID=f1a87498-66ac-4482-b48b-c58dbfd4e54c
E: ID_PART_TABLE_TYPE=gpt
E: SCSI_TPGS=0
E: SCSI_TYPE=disk
E: SCSI_VENDOR=Msft
E: SCSI_VENDOR_ENC=Msft\x20\x20\x20\x20
E: ID_SCSI_INQUIRY=1
E: SCSI_IDENT_LUN_T10=4d534654202020206509d192d362b141b21bbf93d8cd70ed
E: SCSI_IDENT_LUN_NAA_REGEXT=600224806509d192d362bf93d8cd70ed
E: DEVLINKS=/dev/disk/by-id/scsi-3600224806509d192d362bf93d8cd70ed /dev/disk-etcd /dev/disk/by-diskseq/2 /dev/disk/by-id/wwn-0x600224806509d192d362bf93d8cd70ed /dev/disk/by-path/acpi-VMBUS:00-vmbus-01ce8166b4b14bfa92847c9082654950-lun-1
E: TAGS=:systemd:
E: CURRENT_TAGS=:systemd:

$ udevadm info /dev/dm-0 --no-pager
P: /devices/virtual/block/dm-0
M: dm-0
R: 0
U: block
T: disk
D: b 253:0
N: dm-0
L: 0
S: mapper/etcd
S: disk/by-label/etcd
S: disk/by-id/dm-uuid-CRYPT-LUKS2-d945a3c3ed1c4814970bfab25b2a32f0-etcd
S: disk/by-uuid/ee4d45ea-1fe7-4420-b03d-281e7dc7315b
S: disk/by-id/dm-name-etcd
Q: 4
E: DEVPATH=/devices/virtual/block/dm-0
E: DEVNAME=/dev/dm-0
E: DEVTYPE=disk
E: DISKSEQ=4
E: MAJOR=253
E: MINOR=0
E: SUBSYSTEM=block
E: USEC_INITIALIZED=25599079
E: DM_UDEV_DISABLE_LIBRARY_FALLBACK_FLAG=1
E: DM_UDEV_PRIMARY_SOURCE_FLAG=1
E: DM_UDEV_RULES_VSN=2
E: DM_ACTIVATION=1
E: DM_NAME=etcd
E: DM_UUID=CRYPT-LUKS2-d945a3c3ed1c4814970bfab25b2a32f0-etcd
E: DM_SUSPENDED=0
E: ID_FS_LABEL=etcd
E: ID_FS_LABEL_ENC=etcd
E: ID_FS_UUID=ee4d45ea-1fe7-4420-b03d-281e7dc7315b
E: ID_FS_UUID_ENC=ee4d45ea-1fe7-4420-b03d-281e7dc7315b
E: ID_FS_SIZE=334475264
E: ID_FS_LASTBLOCK=98043
E: ID_FS_BLOCKSIZE=4096
E: ID_FS_TYPE=xfs
E: ID_FS_USAGE=filesystem
E: SYSTEMD_READY=1
E: DEVLINKS=/dev/mapper/etcd /dev/disk/by-label/etcd /dev/disk/by-id/dm-uuid-CRYPT-LUKS2-d945a3c3ed1c4814970bfab25b2a32f0-etcd /dev/disk/by-uuid/ee4d45ea-1fe7-4420-b03d-281e7dc7315b /dev/disk/by-id/dm-name-etcd
E: TAGS=:systemd:
E: CURRENT_TAGS=:systemd:

jbtrystram commented 2 months ago

sg3_utils got updated recently and caused some trouble. Thanks for reporting.

According to your udevadm info output, however, your udev rule seems to work:

E: DEVLINKS=/dev/disk/by-id/scsi-36..... /dev/disk-etcd .....

Related #1670

freedge commented 2 months ago

The udev rule works, but it takes effect too late: it would need to be present in the initramfs for me to be able to reference it in the disks section.

jbtrystram commented 2 months ago

Can you try booting with the following kernel argument: udev.scsi_symlink_src=S?

edit: you should see the disk at /dev/disk/by-id/disk-etcd

jbtrystram commented 2 months ago

I was trying to replicate your issue, but I see two different issues here:

1 - Your additional disk (the Microsoft virtual disk, /dev/sdb) does not show up as /dev/disk-etcd in time. There isn't much we can do to get your udev rule into the initramfs. You should reference the disk through its WWN in the Ignition config to get an unambiguous pointer: /dev/disk/by-id/wwn-0x600224806509d192d362bf93d8cd70ed. You can probably specify a WWN value at creation time through Hyper-V options.

2 - The decrypted device-mapper device dm-0 generates warnings in the logs. It looks like luksformat does not allow setting custom SCSI serial values on the resulting device. I am investigating.
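To make the difference between the two udevadm dumps earlier in this thread easier to see, here is a minimal Python sketch that parses the "E:" lines and reports which SCSI identification properties a device lacks. The function names and the list of keys in SCSI_ID_KEYS are illustrative assumptions; whether 55-scsi-sg3_id.rules checks exactly these properties is not confirmed here.

```python
from typing import Dict, List

# Assumption: these are the properties of interest; they are present on
# /dev/sdb in the dump above and absent on dm-0.
SCSI_ID_KEYS = ("ID_SERIAL", "ID_WWN", "SCSI_IDENT_LUN_NAA_REGEXT")


def parse_udev_properties(dump: str) -> Dict[str, str]:
    """Collect KEY=value pairs from the 'E:' lines of `udevadm info` output."""
    props: Dict[str, str] = {}
    for line in dump.splitlines():
        if line.startswith("E: ") and "=" in line:
            key, _, value = line[3:].partition("=")
            props[key] = value
    return props


def missing_scsi_ids(props: Dict[str, str]) -> List[str]:
    """Return the identification keys that the device does not carry."""
    return [key for key in SCSI_ID_KEYS if key not in props]
```

Fed with the output of `udevadm info /dev/dm-0 --no-pager`, missing_scsi_ids(parse_udev_properties(...)) would list all three keys for the dump posted above, while the /dev/sdb dump would yield an empty list.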

travier commented 2 months ago

It's been a while since I've looked at LUKS setup but that won't work from my understanding. You are referencing the device by a label that you are asking Ignition to create so Ignition won't be able to find it.

My bad, you're defining this partition before.

I would also like to not reference "/dev/sdb" but instead rely on "/dev/disk-etcd" as created using my own rules, but this does not seem possible.

You would have the issue I wrote above if you did that.

freedge commented 2 months ago

You shared https://github.com/openshift/os/blob/master/docs/faq.md#q-how-do-i-configure-a-secondary-block-device-via-ignitionmc-if-the-name-varies-on-each-node, which seems overly complex: it would need to take care of LUKS too, redoing a lot of what Ignition already does fine.

It's not practical to use the WWN in the Ignition config, as I am planning to do the same on OpenShift, where the same MachineConfig will be applied to multiple servers. Ideally, a kernel argument like wwnsymlinks=etcd:by-id/wwn-0x600224806509d192d362bf93d8cd70ed,root:by-id/wwn-0x600224806509d192d362bf93d8cd70ed, with an appropriate udev rule bundled in the ramdisk, could help. Another idea would be to accept a list of devices under disks.device, so that instead of a single WWN I could list several and the system would iterate and work with whichever one it finds. Meanwhile I'm going to keep using /dev/sdb.
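The two ideas in the previous paragraph can be sketched in Python as follows. This is purely illustrative: the wwnsymlinks argument is only a proposal from this comment, neither Ignition nor udev implements it, and the assumption that each path is relative to /dev/disk is inferred from the example value. The names parse_wwnsymlinks and pick_device are hypothetical.

```python
import os
from typing import Dict, Iterable, Optional


def parse_wwnsymlinks(value: str) -> Dict[str, str]:
    """Parse 'name:by-id/wwn-...,name2:...' into {name: '/dev/disk/by-id/wwn-...'}.

    Assumption: the part after the colon is relative to /dev/disk.
    """
    mapping: Dict[str, str] = {}
    for item in value.split(","):
        name, sep, rel = item.partition(":")
        if not sep or not name or not rel:
            raise ValueError(f"malformed entry: {item!r}")
        mapping[name] = "/dev/disk/" + rel
    return mapping


def pick_device(candidates: Iterable[str]) -> Optional[str]:
    """Return the resolved path of the first candidate that exists, else None."""
    for path in candidates:
        if os.path.exists(path):
            # realpath follows a by-id style symlink to the underlying node
            return os.path.realpath(path)
    return None
```

With a list of per-server WWN paths, pick_device would return whichever disk is actually present on the current machine, which is the behaviour the "list of devices" idea asks for.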

I see some work done for QEMU to specify a WWN, but this works on a single platform only. On my bare-metal servers I can define a name on a virtual disk (owned by a PERC controller), but this name is not retrieved or used by any udev rule (it seems that perccli, a proprietary tool, is needed to extract it).

Meanwhile, the root disk is referenced as "/dev/disk/by-id/coreos-boot-disk", and I am wondering how it is accurately selected on servers that boot from the network and have multiple disks.

freedge commented 2 months ago

I run sg3_utils-1.48-1.fc40.x86_64 (https://sg.danny.cz/sg/p/sg3_utils-1.48.tar.xz), while the udev.scsi_symlink_src option seems to have been introduced in a later commit (I think). This udev option also triggers another warning in systemd:

systemd-udevd[794]: Unknown udev kernel command line option "udev.scsi_symlink_src", ignoring.

The system works fine despite those warnings (I'm more concerned that I still need /dev/sdb).

jlebon commented 2 months ago

There isn't really a good fix for this right now. This boils down to https://github.com/openshift/machine-config-operator/issues/1720 at the OCP level.

If this is UPI, probably the cleanest thing is to have a --pre-install hook that determines what the secondary disk is and does the initial formatting, setting the partition label at least. Then Ignition on first boot of the installed system can set up LUKS and the filesystem on top in a generic way (read, as part of a MachineConfig).
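As a rough illustration of the "determine what the secondary disk is" part of such a hook, the Python sketch below scans sysfs for a disk matching a vendor and a size, mirroring the criteria of the udev rule in the reporter's config (vendor Msft*, size 819200 sectors). The function name find_disks is hypothetical, the LUN match from the udev rule is left out, and the partitioning and labelling step is not shown.

```python
import fnmatch
from pathlib import Path
from typing import List


def find_disks(sysfs_block: Path, vendor_glob: str, size_sectors: int) -> List[str]:
    """Return /dev paths of sd* disks matching vendor and size (512-byte sectors)."""
    matches: List[str] = []
    for entry in sorted(sysfs_block.glob("sd*")):
        try:
            vendor = (entry / "device" / "vendor").read_text().strip()
            size = int((entry / "size").read_text().strip())
        except (OSError, ValueError):
            continue  # attributes missing or unreadable, skip this entry
        if fnmatch.fnmatch(vendor, vendor_glob) and size == size_sectors:
            matches.append(f"/dev/{entry.name}")
    return matches
```

On the VM described in this issue, find_disks(Path("/sys/block"), "Msft*", 819200) would be expected to select the 400MB extra disk, assuming no other disk has the same size. A hook would want to fail loudly unless exactly one disk matches before touching it.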