canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/
Other
3.02k stars 888 forks source link

cc_disk_setup always overwrites filesystem on raw disk #3902

Open ubuntu-server-builder opened 1 year ago

ubuntu-server-builder commented 1 year ago

This bug was originally filed in Launchpad as LP: #1943156

Launchpad details
affected_projects = []
assignee = None
assignee_name = None
date_closed = None
date_created = 2021-09-09T15:21:00.623940+00:00
date_fix_committed = None
date_fix_released = None
id = 1943156
importance = medium
is_complete = False
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1943156
milestone = None
owner = optimum-reflex
owner_name = xstaticxgpx
private = False
status = triaged
submitter = optimum-reflex
submitter_name = xstaticxgpx
tags = []
duplicates = []

Launchpad user xstaticxgpx(optimum-reflex) wrote on 2021-09-09T15:21:00.623940+00:00

See https://github.com/canonical/cloud-init/blob/758acf976f2cb67a85411467fa5fca2ea17a2283/cloudinit/config/cc_disk_setup.py#L1005-L1009

        # File systems that support the -F flag
        if overwrite or device_type(device) == "disk":
            force_flag = lookup_force_flag(fs_type)
            if force_flag:
                fs_cmd.append(force_flag)

Even with overwrite: false when using a raw disk device (ie. /dev/sdb) the filesystem will always be overwritten. This seems unintended, as it's definitely not clarified in the documentation.

I think at the very least the documentation should be clarified that raw disk devices will ALWAYS be overwritten by default.

We ended up overwriting our cmd value to prevent the force flag from being added...

  - label: DATA
    filesystem: xfs
    device: /dev/sdb
    cmd: '/usr/sbin/mkfs.xfs %(device)s -L %(label)s'
ubuntu-server-builder commented 1 year ago

Launchpad user James Falcon(falcojr) wrote on 2021-09-14T14:01:46.414254+00:00

Is it possible for you to post your entire disk_setup/fs_setup in your user data and also attach the tarball from "cloud-init collect-logs"?

I think the mismatch in expectations it that cloud-init will overwrite the FS if you've specified something to be written that doesn't match what it finds on disk. Either that or we have a bug detecting what's on disk. If what you've specified is found, we should be exiting early here: https://github.com/canonical/cloud-init/blob/main/cloudinit/config/cc_disk_setup.py#L941

If partition isn't specified, it defaults to any. This is described in the docs with: "The partition option may also be set to any, in which case any file system that matches type and device will cause this module to skip filesystem creation for the fs_setup entry, regardless of label matching or not."

If you attach your logs, I can look to see if there's a mismatch between what you specified and what cloud-init found.

ubuntu-server-builder commented 1 year ago

Launchpad user xstaticxgpx(optimum-reflex) wrote on 2021-09-14T15:53:59.920123+00:00

Agreed, I initially thought something was wrong with the disk labels and that's why it couldn't detect the existing disk and overwrote it. However, that was not the case. The raw disks were definitely labeled correctly however still got overwritten. I had not tried changing the partition value though, which we originally had as none (see below).

Given the code section I linked above the logic seems pretty clear cut, if the device TYPE is "disk" (as returned by lsblk as seen below) then it will always append the force flag and ultimately overwrite the existing filesystem.

From the lsblk command seen in that same cc_disk_setup module:

# lsblk --pairs --output NAME,TYPE,FSTYPE,LABEL
...
NAME="sdb" TYPE="disk" FSTYPE="btrfs" LABEL="CONTAINERS"
NAME="sdc" TYPE="disk" FSTYPE="xfs" LABEL="DB"

Here's what the original fs_setup looked like:

fs_setup:
  - label: CONTAINERS
    filesystem: btrfs
    device: /dev/disk/by-id/google-data
    partition: none
    overwrite: false
  - label: DB
    filesystem: xfs
    device: /dev/disk/by-id/google-data-db
    partition: none
    overwrite: false

And here's the relevant section of cloud-init.log:

2021-09-07 21:24:03,035 - stages.py[DEBUG]: Running module disk_setup (<module 'cloudinit.config.cc_disk_setup' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_disk_setup.py'>) with frequency once-per-instance
2021-09-07 21:24:03,036 - handlers.py[DEBUG]: start: init-network/config-disk_setup: running config-disk_setup with frequency once-per-instance
2021-09-07 21:24:03,036 - util.py[DEBUG]: Writing to /var/lib/cloud/instances/1857211666653669050/sem/config_disk_setup - wb: [644] 23 bytes
2021-09-07 21:24:03,036 - helpers.py[DEBUG]: Running config-disk_setup using lock (<FileLock using file '/var/lib/cloud/instances/1857211666653669050/sem/config_disk_setup'>)
2021-09-07 21:24:03,036 - cc_disk_setup.py[DEBUG]: setting up filesystems: [{'device': '/dev/disk/by-id/google-data', 'filesystem': 'btrfs', 'label': 'CONTAINERS', 'overwrite': False, 'partition': 'none'}, {'device': '/dev/disk/by-id/google-data-db', 'filesystem': 'xfs', 'label': 'DB', 'overwrite': False, 'partition': 'none'}]
2021-09-07 21:24:03,037 - cc_disk_setup.py[DEBUG]: Creating new filesystem.
2021-09-07 21:24:03,037 - subp.py[DEBUG]: Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=True)
2021-09-07 21:24:03,054 - cc_disk_setup.py[DEBUG]: Checking /dev/sdb against default devices
2021-09-07 21:24:03,054 - cc_disk_setup.py[DEBUG]: Using the raw device to place filesystem CONTAINERS on
2021-09-07 21:24:03,054 - cc_disk_setup.py[DEBUG]: File system type 'btrfs' with label 'CONTAINERS' will be created on /dev/sdb
2021-09-07 21:24:03,054 - subp.py[DEBUG]: Running command ['/usr/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/sdb', '--nodeps'] with allowed return codes [0] (shell=False, capture=True)
2021-09-07 21:24:03,059 - cc_disk_setup.py[DEBUG]: Creating file system CONTAINERS on /dev/sdb
2021-09-07 21:24:03,060 - cc_disk_setup.py[DEBUG]:      Using cmd: ['/usr/sbin/mkfs.btrfs', '/dev/sdb', '-L', 'CONTAINERS', '-f']
2021-09-07 21:24:03,060 - subp.py[DEBUG]: Running command ['/usr/sbin/mkfs.btrfs', '/dev/sdb', '-L', 'CONTAINERS', '-f'] with allowed return codes [0] (shell=False, capture=True)
2021-09-07 21:24:03,099 - util.py[DEBUG]: Creating fs for /dev/disk/by-id/google-data took 0.062 seconds
2021-09-07 21:24:03,099 - cc_disk_setup.py[DEBUG]: Creating new filesystem.
2021-09-07 21:24:03,099 - subp.py[DEBUG]: Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=True)
2021-09-07 21:24:03,134 - cc_disk_setup.py[DEBUG]: Checking /dev/sdc against default devices
2021-09-07 21:24:03,134 - cc_disk_setup.py[DEBUG]: Using the raw device to place filesystem DB on
2021-09-07 21:24:03,134 - cc_disk_setup.py[DEBUG]: File system type 'xfs' with label 'DB' will be created on /dev/sdc
2021-09-07 21:24:03,134 - subp.py[DEBUG]: Running command ['/usr/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/sdc', '--nodeps'] with allowed return codes [0] (shell=False, capture=True)
2021-09-07 21:24:03,137 - cc_disk_setup.py[DEBUG]: Creating file system DB on /dev/sdc
2021-09-07 21:24:03,138 - cc_disk_setup.py[DEBUG]:      Using cmd: ['/usr/sbin/mkfs.xfs', '/dev/sdc', '-L', 'DB', '-f']
2021-09-07 21:24:03,138 - subp.py[DEBUG]: Running command ['/usr/sbin/mkfs.xfs', '/dev/sdc', '-L', 'DB', '-f'] with allowed return codes [0] (shell=False, capture=True)
2021-09-07 21:24:04,759 - util.py[DEBUG]: Creating fs for /dev/disk/by-id/google-data-db took 1.660 seconds
2021-09-07 21:24:04,759 - handlers.py[DEBUG]: finish: init-network/config-disk_setup: SUCCESS: config-disk_setup ran successfully
ubuntu-server-builder commented 1 year ago

Launchpad user James Falcon(falcojr) wrote on 2021-09-14T16:15:07.080025+00:00

Thanks, I see now. Yes, I agree that this behavior doesn't make sense. For now I'll make a doc change, but I think long term we should change the behavior.