canonical / lxd

Resizing root partition with LVM as backend creates broken volume (device-mapper: reached low water mark for data device: sending event.) #10959

Closed: AndersTrier closed this issue 1 year ago

AndersTrier commented 1 year ago

Issue description

Resize the root partition of a container using LVM as the storage backend:

# lxc --debug config device override testcontainer root size=100GB

Output: https://pastebin.com/mhtKu3dp
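
Before applying an override like this, it is worth checking how much headroom the pool actually has: lxc storage info reports the pool's total and used space, and lxc storage show includes its backing source. A quick sketch, assuming the pool is named default as in the rest of this report:

# lxc storage info default
# lxc storage show default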

All seems fine:

# lvdisplay 
  --- Logical volume ---
  LV Path                /dev/MyVolGroup/data
  LV Name                data
  VG Name                MyVolGroup
  LV UUID                d8QAlD-5aRq-xgv3-9dsv-MXXg-13OT-bcTM4F
  LV Write Access        read/write
  LV Creation host, time espressobin, 2019-06-04 11:31:15 +0200
  LV Status              available
  # open                 0
  LV Size                <10.92 TiB
  Current LE             2861487
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     6144
  Block device           253:2

  --- Logical volume ---
  LV Name                LXDThinPool
  VG Name                default
  LV UUID                GcHERE-sxux-bN2T-sVEU-aqAf-2I73-guwvwA
  LV Write Access        read/write (activated read only)
  LV Creation host, time homeserver, 2022-09-17 15:31:38 +0200
  LV Pool metadata       LXDThinPool_tmeta
  LV Pool data           LXDThinPool_tdata
  LV Status              available
  # open                 0
  LV Size                <4.64 GiB
  Allocated pool data    100.00%
  Allocated metadata     30.47%
  Current LE             1187
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:6

  --- Logical volume ---
  LV Path                /dev/default/containers_ubuntucontainer
  LV Name                containers_ubuntucontainer
  VG Name                default
  LV UUID                Da6M2H-n48t-f7B9-4Oy2-SnAL-Obq7-aqMpxB
  LV Write Access        read/write
  LV Creation host, time homeserver, 2022-09-17 17:51:43 +0200
  LV Pool name           LXDThinPool
  LV Status              available
  # open                 1
  LV Size                10.00 GiB
  Mapped size            19.71%
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:7

  --- Logical volume ---
  LV Path                /dev/default/images_4979c8f15a0003b2b72cac34e9676b4107ae330874935284d0ba5d54199c7744
  LV Name                images_4979c8f15a0003b2b72cac34e9676b4107ae330874935284d0ba5d54199c7744
  VG Name                default
  LV UUID                QsDQyS-VAJx-ivBH-Y972-1Tyh-zOgr-9uRs4L
  LV Write Access        read/write
  LV Creation host, time homeserver, 2022-09-24 15:26:27 +0200
  LV Pool name           LXDThinPool
  LV Status              NOT available
  LV Size                10.00 GiB
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

  --- Logical volume ---
  LV Path                /dev/default/images_cae9c968417157b021b14040d3fac6c29edbf0e3b4320e64794c0374b46929fc
  LV Name                images_cae9c968417157b021b14040d3fac6c29edbf0e3b4320e64794c0374b46929fc
  VG Name                default
  LV UUID                HvuvdH-Ohap-6IoA-TIWe-mrC1-fy20-15KXe6
  LV Write Access        read/write
  LV Creation host, time homeserver, 2022-09-27 15:43:16 +0200
  LV Pool name           LXDThinPool
  LV Status              NOT available
  LV Size                10.00 GiB
  Current LE             2560
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

  --- Logical volume ---
  LV Path                /dev/default/containers_testcontainer
  LV Name                containers_testcontainer
  VG Name                default
  LV UUID                b6BNgC-u0g2-klgR-t7Bq-u2HY-v862-oeAO2e
  LV Write Access        read/write
  LV Creation host, time homeserver, 2022-09-27 19:53:23 +0200
  LV Pool name           LXDThinPool
  LV Thin origin name    images_4979c8f15a0003b2b72cac34e9676b4107ae330874935284d0ba5d54199c7744
  LV Status              available
  # open                 1
  LV Size                93.13 GiB
  Mapped size            2.35%
  Current LE             23842
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:8

  --- Logical volume ---
  LV Path                /dev/vg0/os
  LV Name                os
  VG Name                vg0
  LV UUID                IqFUE3-asNO-OidK-QMNR-22kt-Ekrl-WuzxoF
  LV Write Access        read/write
  LV Creation host, time staker, 2022-09-15 22:19:39 +0200
  LV Status              available
  # open                 1
  LV Size                <13.97 GiB
  Current LE             3576
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0

  --- Logical volume ---
  LV Path                /dev/vg0/otherdata
  LV Name                otherdata
  VG Name                vg0
  LV UUID                doebvg-08xd-2AY1-ItRh-0b07-AA29-Cjjq0I
  LV Write Access        read/write
  LV Creation host, time staker, 2022-09-15 23:25:02 +0200
  LV Status              available
  # open                 1
  LV Size                1.50 TiB
  Current LE             393216
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

# dmsetup info /dev/dm-8
Name:              default-containers_testcontainer
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 8
Number of targets: 1
UUID: LVM-rk5AmXKDDc3XoZPzNZP2Sha9y3N1oDqpb6BNgCu0g2klgRt7Bqu2HYv862oeAO2e

If I start using the new container, I soon run into problems:

[252737.608547] audit: type=1400 audit(1664301204.971:447): apparmor="STATUS" operation="profile_load" label="lxd-testcontainer_</var/snap/lxd/common/lxd>//&:lxd-testcontainer_<var-snap-lxd-common-lxd>:unconfined" name="lsb_release" pid=87253 comm="apparmor_parser"
[252737.610947] audit: type=1400 audit(1664301204.975:448): apparmor="STATUS" operation="profile_load" label="lxd-testcontainer_</var/snap/lxd/common/lxd>//&:lxd-testcontainer_<var-snap-lxd-common-lxd>:unconfined" name="nvidia_modprobe" pid=87254 comm="apparmor_parser"
[252737.612025] audit: type=1400 audit(1664301204.975:449): apparmor="STATUS" operation="profile_load" label="lxd-testcontainer_</var/snap/lxd/common/lxd>//&:lxd-testcontainer_<var-snap-lxd-common-lxd>:unconfined" name="nvidia_modprobe//kmod" pid=87254 comm="apparmor_parser"
[252737.619090] audit: type=1400 audit(1664301204.983:450): apparmor="STATUS" operation="profile_load" label="lxd-testcontainer_</var/snap/lxd/common/lxd>//&:lxd-testcontainer_<var-snap-lxd-common-lxd>:unconfined" name="/usr/bin/man" pid=87256 comm="apparmor_parser"
[252737.619740] audit: type=1400 audit(1664301204.983:451): apparmor="STATUS" operation="profile_load" label="lxd-testcontainer_</var/snap/lxd/common/lxd>//&:lxd-testcontainer_<var-snap-lxd-common-lxd>:unconfined" name="man_filter" pid=87256 comm="apparmor_parser"
[252737.620667] audit: type=1400 audit(1664301204.983:452): apparmor="STATUS" operation="profile_load" label="lxd-testcontainer_</var/snap/lxd/common/lxd>//&:lxd-testcontainer_<var-snap-lxd-common-lxd>:unconfined" name="man_groff" pid=87256 comm="apparmor_parser"
[252737.672805] audit: type=1400 audit(1664301205.035:453): apparmor="STATUS" operation="profile_load" label="lxd-testcontainer_</var/snap/lxd/common/lxd>//&:lxd-testcontainer_<var-snap-lxd-common-lxd>:unconfined" name="tcpdump" pid=87257 comm="apparmor_parser"
[252737.704245] audit: type=1400 audit(1664301205.067:454): apparmor="STATUS" operation="profile_load" label="lxd-testcontainer_</var/snap/lxd/common/lxd>//&:lxd-testcontainer_<var-snap-lxd-common-lxd>:unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=87255 comm="apparmor_parser"
[252737.705344] audit: type=1400 audit(1664301205.071:455): apparmor="STATUS" operation="profile_load" label="lxd-testcontainer_</var/snap/lxd/common/lxd>//&:lxd-testcontainer_<var-snap-lxd-common-lxd>:unconfined" name="/usr/lib/NetworkManager/nm-dhcp-helper" pid=87255 comm="apparmor_parser"
[252742.525937] kauditd_printk_skb: 26 callbacks suppressed
[252742.525939] audit: type=1400 audit(1664301209.891:482): apparmor="DENIED" operation="file_inherit" namespace="root//lxd-testcontainer_<var-snap-lxd-common-lxd>" profile="/snap/snapd/16778/usr/lib/snapd/snap-confine" pid=87570 comm="snap-confine" family="netlink" sock_type="raw" protocol=15 requested_mask="send receive" denied_mask="send receive"
[252742.544069] audit: type=1400 audit(1664301209.907:483): apparmor="DENIED" operation="file_inherit" namespace="root//lxd-testcontainer_<var-snap-lxd-common-lxd>" profile="snap-update-ns.lxd" name="/apparmor/.null" pid=87591 comm="6" requested_mask="wr" denied_mask="wr" fsuid=1000000 ouid=0
[252742.813501] audit: type=1400 audit(1664301210.179:484): apparmor="DENIED" operation="file_inherit" namespace="root//lxd-testcontainer_<var-snap-lxd-common-lxd>" profile="snap.lxd.hook.install" name="/apparmor/.null" pid=87570 comm="snap-exec" requested_mask="wr" denied_mask="wr" fsuid=1000000 ouid=0
[252743.966981] audit: type=1400 audit(1664301211.331:485): apparmor="DENIED" operation="file_inherit" namespace="root//lxd-testcontainer_<var-snap-lxd-common-lxd>" profile="/snap/snapd/16778/usr/lib/snapd/snap-confine" pid=87857 comm="snap-confine" family="netlink" sock_type="raw" protocol=15 requested_mask="send receive" denied_mask="send receive"
[252743.969621] audit: type=1400 audit(1664301211.335:486): apparmor="DENIED" operation="file_inherit" namespace="root//lxd-testcontainer_<var-snap-lxd-common-lxd>" profile="snap.lxd.hook.configure" name="/apparmor/.null" pid=87857 comm="snap-exec" requested_mask="wr" denied_mask="wr" fsuid=1000000 ouid=0
[252743.974700] audit: type=1400 audit(1664301211.339:487): apparmor="DENIED" operation="file_inherit" namespace="root//lxd-testcontainer_<var-snap-lxd-common-lxd>" profile="/snap/snapd/16778/usr/lib/snapd/snap-confine" name="/apparmor/.null" pid=87857 comm="aa-exec" requested_mask="wr" denied_mask="wr" fsuid=1000000 ouid=0
[252744.189653] audit: type=1400 audit(1664301211.555:488): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-testcontainer_</var/snap/lxd/common/lxd>" name="/run/systemd/unit-root/proc/" pid=88027 comm="(imedated)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
[252751.177288] dm-8: detected capacity change from 20971520 to 195313664
[252751.278227] EXT4-fs (dm-8): resizing filesystem from 2621440 to 24414208 blocks
[252751.329312] EXT4-fs (dm-8): resized filesystem to 24414208
[252807.688053] device-mapper: thin: 253:5: reached low water mark for data device: sending event.
[252807.696472] device-mapper: thin: 253:5: switching pool to out-of-data-space (queue IO) mode
[252867.947094] device-mapper: thin: 253:5: switching pool to out-of-data-space (error IO) mode
[252874.349607] Aborting journal on device dm-8-8.
[252874.349755] EXT4-fs error (device dm-8): ext4_journal_check_start:83: comm kworker/u64:5: Detected aborted journal
[252874.356143] EXT4-fs (dm-8): Remounting filesystem read-only
[252874.356147] EXT4-fs (dm-8): ext4_writepages: jbd2_start: 13271 pages, ino 403784; err -30

Backing storage is a 2 TB NVMe disk.
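
The "253:5" in those thin-pool messages is the pool's own device-mapper device, not the NVMe disk. A minimal way to map that minor number back to a name and check how full the pool is, assuming the volume group is named default as in the lvdisplay output above:

# dmsetup info -c | awk '$2 == 253 && $3 == 5'
# lvs -a -o lv_name,lv_size,data_percent,metadata_percent default

A Data% of 100 on LXDThinPool means the pool itself is out of space, regardless of how much room is left on the underlying disk.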

Various other dmesg outputs:

[252194.112158] EXT4-fs warning (device dm-8): ext4_end_bio:344: I/O error 3 writing to inode 403909 starting block 48128)
[252194.112166] Buffer I/O error on device dm-8, logical block 48128
[252493.882115] EXT4-fs warning (device dm-8): ext4_end_bio:344: I/O error 3 writing to inode 403909 starting block 48128)
[252493.882123] Buffer I/O error on device dm-8, logical block 48128
[252499.056891] EXT4-fs warning (device dm-8): ext4_end_bio:344: I/O error 3 writing to inode 403909 starting block 11307)
[252499.056901] Buffer I/O error on device dm-8, logical block 11307
[252499.056913] EXT4-fs warning (device dm-8): ext4_end_bio:344: I/O error 3 writing to inode 403909 starting block 11575)
[252499.056916] Buffer I/O error on device dm-8, logical block 11575
[252499.056921] EXT4-fs warning (device dm-8): ext4_end_bio:344: I/O error 3 writing to inode 403909 starting block 11590)
[252499.056923] Buffer I/O error on device dm-8, logical block 11590
[252499.056957] EXT4-fs warning (device dm-8): ext4_end_bio:344: I/O error 3 writing to inode 403909 starting block 11649)
[252499.056959] Buffer I/O error on device dm-8, logical block 11649
[252499.056964] EXT4-fs warning (device dm-8): ext4_end_bio:344: I/O error 3 writing to inode 403909 starting block 11721)
[252499.056966] Buffer I/O error on device dm-8, logical block 11721
[252499.056970] EXT4-fs warning (device dm-8): ext4_end_bio:344: I/O error 3 writing to inode 403909 starting block 11822)
[252499.056972] Buffer I/O error on device dm-8, logical block 11822

Another attempt:

[69735.921939] audit: type=1400 audit(1664118200.494:136): apparmor="STATUS" operation="profile_load" label="lxd-cardano_</var/snap/lxd/common/lxd>//&:lxd-cardano_<var-snap-lxd-common-lxd>:unconfined" name="lsb_release" pid=48198 comm="apparmor_parser"
[69735.922095] audit: type=1400 audit(1664118200.494:137): apparmor="STATUS" operation="profile_load" label="lxd-cardano_</var/snap/lxd/common/lxd>//&:lxd-cardano_<var-snap-lxd-common-lxd>:unconfined" name="nvidia_modprobe" pid=48199 comm="apparmor_parser"
[69735.922098] audit: type=1400 audit(1664118200.494:138): apparmor="STATUS" operation="profile_load" label="lxd-cardano_</var/snap/lxd/common/lxd>//&:lxd-cardano_<var-snap-lxd-common-lxd>:unconfined" name="nvidia_modprobe//kmod" pid=48199 comm="apparmor_parser"
[69735.924432] audit: type=1400 audit(1664118200.494:139): apparmor="STATUS" operation="profile_load" label="lxd-cardano_</var/snap/lxd/common/lxd>//&:lxd-cardano_<var-snap-lxd-common-lxd>:unconfined" name="/usr/lib/snapd/snap-confine" pid=48201 comm="apparmor_parser"
[69735.924434] audit: type=1400 audit(1664118200.494:140): apparmor="STATUS" operation="profile_load" label="lxd-cardano_</var/snap/lxd/common/lxd>//&:lxd-cardano_<var-snap-lxd-common-lxd>:unconfined" name="/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=48201 comm="apparmor_parser"
[69735.929476] audit: type=1400 audit(1664118200.498:141): apparmor="STATUS" operation="profile_load" label="lxd-cardano_</var/snap/lxd/common/lxd>//&:lxd-cardano_<var-snap-lxd-common-lxd>:unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=48200 comm="apparmor_parser"
[69735.929481] audit: type=1400 audit(1664118200.498:142): apparmor="STATUS" operation="profile_load" label="lxd-cardano_</var/snap/lxd/common/lxd>//&:lxd-cardano_<var-snap-lxd-common-lxd>:unconfined" name="/usr/lib/NetworkManager/nm-dhcp-helper" pid=48200 comm="apparmor_parser"
[69735.929484] audit: type=1400 audit(1664118200.498:143): apparmor="STATUS" operation="profile_load" label="lxd-cardano_</var/snap/lxd/common/lxd>//&:lxd-cardano_<var-snap-lxd-common-lxd>:unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=48200 comm="apparmor_parser"
[69735.929486] audit: type=1400 audit(1664118200.498:144): apparmor="STATUS" operation="profile_load" label="lxd-cardano_</var/snap/lxd/common/lxd>//&:lxd-cardano_<var-snap-lxd-common-lxd>:unconfined" name="/{,usr/}sbin/dhclient" pid=48200 comm="apparmor_parser"
[69765.925368] device-mapper: thin: 253:5: reached low water mark for data device: sending event.
[69765.929882] device-mapper: thin: 253:5: switching pool to out-of-data-space (queue IO) mode
[69826.644395] device-mapper: thin: 253:5: switching pool to out-of-data-space (error IO) mode
[69826.648245] Aborting journal on device dm-8-8.
[69826.648296] EXT4-fs (dm-8): Delayed block allocation failed for inode 394496 at logical offset 6144 with max blocks 2048 with error 30
[69826.648308] EXT4-fs (dm-8): This should not happen!! Data will be lost

[69826.648309] EXT4-fs error (device dm-8): ext4_journal_check_start:83: comm kworker/u64:4: Detected aborted journal
[69826.648313] EXT4-fs error (device dm-8) in ext4_writepages:2817: Journal has aborted
[69826.649285] EXT4-fs warning (device dm-8): ext4_end_bio:344: I/O error 3 writing to inode 394496 starting block 495600)
[69826.658441] EXT4-fs (dm-8): Remounting filesystem read-only
[69826.658451] EXT4-fs (dm-8): failed to convert unwritten extents to written extents -- potential data loss!  (inode 402831, error -30)
[69826.658463] EXT4-fs (dm-8): failed to convert unwritten extents to written extents -- potential data loss!  (inode 394496, error -30)
[69826.658469] Buffer I/O error on device dm-8, logical block 495465
[69826.658475] Buffer I/O error on device dm-8, logical block 495466
[69826.658478] Buffer I/O error on device dm-8, logical block 495467
[69826.658481] Buffer I/O error on device dm-8, logical block 495468
[69826.658484] Buffer I/O error on device dm-8, logical block 495469
[69826.658487] Buffer I/O error on device dm-8, logical block 495470
[69826.658490] Buffer I/O error on device dm-8, logical block 495471
[69826.658493] Buffer I/O error on device dm-8, logical block 495472
[69826.658496] Buffer I/O error on device dm-8, logical block 495473
[69826.658499] Buffer I/O error on device dm-8, logical block 495474
[69826.658542] EXT4-fs (dm-8): failed to convert unwritten extents to written extents -- potential data loss!  (inode 394497, error -30)
[70211.034812] physSG64sp: renamed from eth0
[70211.058883] lxdbr0: port 2(veth3a4e5ab6) entered disabled state
[70211.060084] vethe6d0128d: renamed from physSG64sp
[70211.108466] device veth3a4e5ab6 left promiscuous mode
[70211.108487] lxdbr0: port 2(veth3a4e5ab6) entered disabled state
[70320.211652] EXT4-fs warning (device dm-7): ext4_end_bio:344: I/O error 3 writing to inode 403687 starting block 1933744)
[70320.211666] buffer_io_error: 141 callbacks suppressed
[70320.211670] Buffer I/O error on device dm-7, logical block 1933739
[70320.211678] Buffer I/O error on device dm-7, logical block 1933740
[70320.211682] Buffer I/O error on device dm-7, logical block 1933741
[70320.211686] Buffer I/O error on device dm-7, logical block 1933742
[70320.211689] Buffer I/O error on device dm-7, logical block 1933743
[70320.211692] Buffer I/O error on device dm-7, logical block 1933744
[70332.495455] EXT4-fs (dm-8): Inode 394496 (000000007acc764d): i_reserved_data_blocks (2049) not cleared!
[70332.645156] device-mapper: thin: 253:5: switching pool to write mode
[70332.842690] audit: type=1400 audit(1664118797.423:145): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd-cardano_</var/snap/lxd/common/lxd>" pid=50171 comm="apparmor_parser"

# fdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: Samsung SSD 970 EVO Plus 2TB            
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: F1E731C3-3035-45E2-B7F3-3E784065E1CE

Device           Start        End    Sectors   Size Type
/dev/nvme0n1p1    2048    1050623    1048576   512M EFI System
/dev/nvme0n1p2 1050624    3050623    2000000 976.6M Linux filesystem
/dev/nvme0n1p3 3051520 3907028991 3903977472   1.8T Linux LVM
# smartctl -a /dev/nvme0
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-48-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 2TB
Serial Number:                      -
Firmware Version:                   2B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization:            1,541,083,025,408 [1.54 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5621904d95
Local Time is:                      Tue Sep 27 20:28:43 2022 CEST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.50W       -        -    0  0  0  0        0       0
 1 +     5.90W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        57 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    9,770,896 [5.00 TB]
Data Units Written:                 15,602,115 [7.98 TB]
Host Read Commands:                 150,727,819
Host Write Commands:                31,448,723
Controller Busy Time:               626
Power Cycles:                       33
Power On Hours:                     130
Unsafe Shutdowns:                   2
Media and Data Integrity Errors:    0
Error Information Log Entries:      102
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               57 Celsius
Temperature Sensor 2:               71 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        102     0  0x1014  0x4004      -            0     0     -
AndersTrier commented 1 year ago

What is going on? /dev/nvme0n1p3 should have enough free space to handle a 100GB allocation. Am I missing something?

AndersTrier commented 1 year ago

Wait, LXD is using a file as backing storage? I never agreed to that.

# pvs
  PV             VG         Fmt  Attr PSize   PFree   
  /dev/loop6     default    lvm2 a--    4.65g       0 
  /dev/md127     MyVolGroup lvm2 a--  <10.92t       0 
  /dev/nvme0n1p3 vg0        lvm2 a--   <1.82t <311.59g

# losetup --list
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE                                  DIO LOG-SEC
/dev/loop6         0      0         0  0 /var/snap/lxd/common/lxd/disks/default.img   1     512

Never mind. I'll read the documentation and figure out how to make LXD use an existing volume group. It would have been nice to have been asked about that during lxd init.

tomponline commented 1 year ago

This page on the LVM tab shows you how to create a pool from an existing volume group:

https://linuxcontainers.org/lxd/docs/master/howto/storage_pools/

I think you could also do this during lxd init by specifying the volume group as the source when it asks.
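
The non-interactive equivalent is a one-liner; a sketch with placeholder names my-pool and my-vg:

# lxc storage create my-pool lvm source=my-vg

The named volume group is expected to be empty unless lvm.vg.force_reuse is set, as discussed below.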

AndersTrier commented 1 year ago

Hi @tomponline Thank you for all your work on LXC/LXD!

This is what I ended up doing:

lvcreate --thin --size 150GB vg0 -n lxd-thin-pool
lxc storage create nvmepool lvm source=vg0 lvm.vg.force_reuse=true lvm.thinpool_name=lxd-thin-pool

LXD assumes that it has full control over the volume group. Therefore, you should not maintain any file system entities that are not owned by LXD in an LVM volume group, because LXD might delete them. However, if you need to reuse an existing volume group (for example, because your setup has only one volume group), you can do so by setting the lvm.vg.force_reuse configuration.

https://linuxcontainers.org/lxd/docs/master/reference/storage_lvm/
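
To actually put instances on the new pool, something along these lines should work (the instance name c1 and the image are just examples):

# lxc launch ubuntu:22.04 c1 --storage nvmepool
# lxc profile device set default root pool=nvmepool

The first launches a single instance on the pool; the second makes it the default root pool for new instances, which is only sensible on a profile whose existing instances are not relying on the old pool.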

I think using an existing volume group with existing LVs is useful. I get that you worry about accidentally deleting existing LVs, but how about naming LVs managed by LXD something like: LXD_Managed_Do_Not_Touch_<container name>_<UUID>?

tomponline commented 1 year ago

Thank you for all your work on LXC/LXD!

Thanks! :)

I think using an existing volume group with existing LVs is useful.

Indeed, that is why we have lvm.vg.force_reuse=true (and only for LVM pools): it is recognised that some users have systems that can only have one volume group (perhaps pre-provisioned by an ISP).

I get that you worry about accidentally deleting exiting LVs, but how about naming LVs managed by LXD something like: LXD_Managed_Do_Not_Touch_<container name>_<UUID>

We do use names that are somewhat unlikely to occur, such as containers_<instance_name>. However, in theory at least, whichever naming scheme we use runs the risk of overlapping with an existing user's volumes.

It is somewhat academic though, as changing the naming scheme now would be rather complex and disruptive for existing users (potentially breaking existing workflows), and is something we would be unlikely to do.
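
For anyone sharing a volume group with LXD today, a rough way to list which LVs match LXD's current naming scheme (the prefixes below are assumed from the storage driver's volume types, so treat this as a sketch):

# lvs --noheadings -o lv_name vg0 | grep -E '^ *(containers|virtual-machines|images|custom)_'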