canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

LXD / LXC command hangs #5034

Closed: dkruyt closed this issue 5 years ago

dkruyt commented 5 years ago

Required information

Base information

Issue description

LXC and LXD commands just hang; I suspect this happened after a snap refresh 2 days ago.

Steps to reproduce

Just run any LXD or LXC command:

root@ragnarok:~# lxc --verbose --debug list
DBUG[09-14|20:24:31] Connecting to a local LXD over a Unix socket
DBUG[09-14|20:24:31] Sending request to LXD                   etag= method=GET url=http://unix.socket/1.0

stuck forever...
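
(A hedged aside: the same API endpoint the client is querying can be probed directly with curl's standard --unix-socket option, to confirm that the daemon itself, not the client, is what's unresponsive:)

curl --unix-socket /var/snap/lxd/common/lxd/unix.socket http://unix.socket/1.0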

Information to attach

root     49487  0.0  0.0   4504  1760 ?        Ss   19:54   0:00 /bin/sh /snap/lxd/8622/commands/daemon.start
root     49570  104  0.3 750328 56320 ?        Sl   19:54  29:37  \_ lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     49571  0.0  0.1 289768 21104 ?        Sl   19:54   0:00  \_ lxd waitready
root     49572  0.0  0.0   4504  1140 ?        S    19:54   0:00  \_ /bin/sh /snap/lxd/8622/commands/daemon.start
[5949413.852704] audit: type=1400 audit(1536635373.811:381): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.buginfo" pid=51915 comm="apparmor_parser"
[5949413.968300] audit: type=1400 audit(1536635373.927:382): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.check-kernel" pid=51917 comm="apparmor_parser"
[5949414.084889] audit: type=1400 audit(1536635374.043:383): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.daemon" pid=51919 comm="apparmor_parser"
[5949414.225903] audit: type=1400 audit(1536635374.187:384): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.hook.configure" pid=51921 comm="apparmor_parser"
[5949414.330037] audit: type=1400 audit(1536635374.291:385): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.lxd.hook.install" pid=51923 comm="apparmor_parser"
[5949414.447408] audit: type=1400 audit(1536635374.407:386): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.lxc" pid=51925 comm="apparmor_parser"
[5949414.563900] audit: type=1400 audit(1536635374.523:387): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.lxd" pid=51927 comm="apparmor_parser"
[5949414.680774] audit: type=1400 audit(1536635374.639:388): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.migrate" pid=51930 comm="apparmor_parser"
[5949414.685436] audit: type=1400 audit(1536635374.647:389): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap-update-ns.lxd" pid=51932 comm="apparmor_parser"
[6017238.331428] Process accounting resumed
[6090707.338022] audit_printk_skb: 27 callbacks suppressed
[6090707.338026] audit: type=1400 audit(1536776675.724:399): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.benchmark" pid=34651 comm="apparmor_parser"
[6090707.483615] audit: type=1400 audit(1536776675.868:400): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.buginfo" pid=34653 comm="apparmor_parser"
[6090707.612233] audit: type=1400 audit(1536776675.996:401): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.check-kernel" pid=34656 comm="apparmor_parser"
[6090707.733424] audit: type=1400 audit(1536776676.120:402): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.daemon" pid=34658 comm="apparmor_parser"
[6090707.881240] audit: type=1400 audit(1536776676.268:403): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.hook.configure" pid=34660 comm="apparmor_parser"
[6090707.991927] audit: type=1400 audit(1536776676.376:404): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.hook.install" pid=34662 comm="apparmor_parser"
[6090708.118901] audit: type=1400 audit(1536776676.504:405): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.lxc" pid=34664 comm="apparmor_parser"
[6090708.248463] audit: type=1400 audit(1536776676.636:406): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.lxd" pid=34666 comm="apparmor_parser"
[6090708.371271] audit: type=1400 audit(1536776676.756:407): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.migrate" pid=34668 comm="apparmor_parser"
[6090708.376553] audit: type=1400 audit(1536776676.764:408): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap-update-ns.lxd" pid=34670 comm="apparmor_parser"
[6090711.459695] cgroup: new mount options do not match the existing superblock, will be ignored
[6103632.862425] Process accounting resumed
[6190028.402265] Process accounting resumed
[6261671.303014] cgroup: new mount options do not match the existing superblock, will be ignored
[6263696.739084] cgroup: new mount options do not match the existing superblock, will be ignored
cat /var/snap/lxd/common/lxd/logs/lxd.log
lvl=info msg="LXD 3.4 is starting in normal mode" path=/var/snap/lxd/common/lxd t=2018-09-14T19:54:10+0200
lvl=info msg="Kernel uid/gid map:" t=2018-09-14T19:54:10+0200
lvl=info msg=" - u 0 0 4294967295" t=2018-09-14T19:54:10+0200
lvl=info msg=" - g 0 0 4294967295" t=2018-09-14T19:54:10+0200
lvl=info msg="Configured LXD uid/gid map:" t=2018-09-14T19:54:10+0200
lvl=info msg=" - u 0 1000000 1000000000" t=2018-09-14T19:54:10+0200
lvl=info msg=" - g 0 1000000 1000000000" t=2018-09-14T19:54:10+0200
lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2018-09-14T19:54:10+0200
lvl=info msg="Initializing local database" t=2018-09-14T19:54:10+0200
lvl=info msg="Initializing database gateway" t=2018-09-14T19:54:10+0200
address= id=1 lvl=info msg="Start database node" t=2018-09-14T19:54:10+0200
lvl=info msg="Raft: Restored from snapshot 1-1151664-1534795819983" t=2018-09-14T19:54:10+0200
lvl=info msg="Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]" t=2018-09-14T19:54:10+0200
lvl=info msg="Raft: Node at 0 [Leader] entering Leader state" t=2018-09-14T19:54:10+0200
lvl=info msg="Dqlite: starting event loop" t=2018-09-14T19:54:10+0200
lvl=info msg="LXD isn't socket activated" t=2018-09-14T19:54:10+0200
lvl=info msg="Starting /dev/lxd handler:" t=2018-09-14T19:54:10+0200
lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock t=2018-09-14T19:54:10+0200
lvl=info msg="REST API daemon:" t=2018-09-14T19:54:10+0200
lvl=info msg=" - binding Unix socket" socket=/var/snap/lxd/common/lxd/unix.socket t=2018-09-14T19:54:10+0200
lvl=info msg="Initializing global database" t=2018-09-14T19:54:10+0200
lvl=info msg="Dqlite: handling new connection (fd=20)" t=2018-09-14T19:54:10+0200
lvl=info msg="Dqlite: connected address=0 attempt=0" t=2018-09-14T19:54:10+0200
lvl=info msg="Initializing storage pools" t=2018-09-14T19:54:10+0200
lvl=info msg="Initializing networks" t=2018-09-14T19:54:10+0200
lvl=info msg="Pruning leftover image files" t=2018-09-14T19:54:10+0200

Systemd log (last 50 lines)

-- Logs begin at Mon 2018-09-10 08:45:12 CEST, end at Fri 2018-09-14 20:27:55 CEST. --
Sep 14 19:54:09 ragnarok lxd.daemon[4473]:   1: fd:   9: pids
Sep 14 19:54:09 ragnarok lxd.daemon[4473]:   2: fd:  10: memory
Sep 14 19:54:09 ragnarok lxd.daemon[4473]:   3: fd:  11: cpuset
Sep 14 19:54:09 ragnarok lxd.daemon[4473]:   4: fd:  12: perf_event
Sep 14 19:54:09 ragnarok lxd.daemon[4473]:   5: fd:  13: devices
Sep 14 19:54:09 ragnarok lxd.daemon[4473]:   6: fd:  14: hugetlb
Sep 14 19:54:09 ragnarok lxd.daemon[4473]:   7: fd:  15: cpu,cpuacct
Sep 14 19:54:09 ragnarok lxd.daemon[4473]:   8: fd:  16: net_cls,net_prio
Sep 14 19:54:09 ragnarok lxd.daemon[4473]:   9: fd:  17: freezer
Sep 14 19:54:09 ragnarok lxd.daemon[4473]:  10: fd:  18: name=systemd
Sep 14 19:54:09 ragnarok lxd.daemon[4473]: lxcfs.c: 105: do_reload: lxcfs: reloaded
Sep 14 19:54:09 ragnarok lxd.daemon[49487]: => Re-using existing LXCFS
Sep 14 19:54:09 ragnarok lxd.daemon[49487]: => Starting LXD
Sep 14 19:54:10 ragnarok lxd.daemon[49487]: lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2018-09-14T19:54:10+0200
Sep 14 20:17:54 ragnarok systemd[1]: Stopping Service for snap application lxd.daemon...
Sep 14 20:27:54 ragnarok systemd[1]: snap.lxd.daemon.service: Stopping timed out. Terminating.
Sep 14 20:27:54 ragnarok systemd[1]: Stopped Service for snap application lxd.daemon.
Sep 14 20:27:54 ragnarok systemd[1]: snap.lxd.daemon.service: Unit entered failed state.
Sep 14 20:27:54 ragnarok systemd[1]: snap.lxd.daemon.service: Failed with result 'timeout'.
Sep 14 20:27:55 ragnarok systemd[1]: Started Service for snap application lxd.daemon.
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: => Preparing the system
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: ==> Loading snap configuration
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: ==> Setting up mntns symlink (mnt:[4026532330])
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: ==> Setting up kmod wrapper
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: ==> Preparing /boot
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: ==> Preparing a clean copy of /run
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: ==> Preparing a clean copy of /etc
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: ==> Setting up ceph configuration
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: ==> Setting up LVM configuration
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: ==> Rotating logs
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: ==> Setting up ZFS (0.6)
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: ==> Escaping the systemd cgroups
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: ==> Escaping the systemd process resource limits
Sep 14 20:27:55 ragnarok lxd.daemon[4473]: mount namespace: 7
Sep 14 20:27:55 ragnarok lxd.daemon[4473]: hierarchies:
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   0: fd:   8: blkio
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   1: fd:   9: pids
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   2: fd:  10: memory
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   3: fd:  11: cpuset
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   4: fd:  12: perf_event
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   5: fd:  13: devices
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   6: fd:  14: hugetlb
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   7: fd:  15: cpu,cpuacct
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   8: fd:  16: net_cls,net_prio
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   9: fd:  17: freezer
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:  10: fd:  18: name=systemd
Sep 14 20:27:55 ragnarok lxd.daemon[4473]: lxcfs.c: 105: do_reload: lxcfs: reloaded
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: => Re-using existing LXCFS
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: => Killing conflicting LXD (pid=49570)
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: => Starting LXD

Detailed snap information

name:      lxd
summary:   System container manager and API
publisher: Canonical✓
contact:   https://github.com/lxc/lxd/issues
license:   unset
description: |
  LXD is a container manager for system containers.

  It offers a REST API to remotely manage containers over the network, using an image based workflow
  and with support for live migration.

  Images are available for all Ubuntu releases and architectures as well as for a wide number of
  other Linux distributions.

  LXD containers are lightweight, secure by default and a great alternative to virtual machines.
commands:
  - lxd.benchmark
  - lxd.buginfo
  - lxd.check-kernel
  - lxd.lxc
  - lxd
  - lxd.migrate
services:
  lxd.daemon: simple, enabled, inactive
snap-id:      J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking:     stable
refresh-date: 2 days ago, at 20:24 CEST
channels:
  stable:        3.4         (8622) 66MB -
  candidate:     3.5         (8636) 66MB -
  beta:          ↑
  edge:          git-a732506 (8651) 66MB -
  3.0/stable:    3.0.1       (8028) 57MB -
  3.0/candidate: 3.0.2       (8618) 65MB -
  3.0/beta:      ↑
  3.0/edge:      git-e90d8c1 (8646) 65MB -
  2.0/stable:    2.0.11      (8023) 28MB -
  2.0/candidate: 2.0.11      (8023) 28MB -
  2.0/beta:      ↑
  2.0/edge:      git-92a4fdc (8000) 26MB -
installed:       3.4         (8622) 66MB -
stgraber commented 5 years ago

Looks like you may have some kind of leftover LXD process or something going on there.

Can you do:
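
(The exact commands were not preserved in this copy of the thread; judging from the reply below, the request was most likely along these lines:)

systemctl stop snap.lxd.daemon
ps fauxww | grep lx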

So we can see if there's any such leftover process and clean things up.

dkruyt commented 5 years ago

Stopping also hangs..

root@ragnarok:/snap/lxd# ps fauxww | grep lx
root      5197  0.0  0.0 897960  3100 ?        Ssl  Jul04   4:52 /usr/bin/lxcfs /var/lib/lxcfs/
root     39831  0.0  0.0  26168  1396 pts/0    S+   06:20   0:00  |                   \_ systemctl stop snap.lxd.daemon
root     40033  0.0  0.0  14196   988 pts/2    S+   06:21   0:00                      \_ grep --color=auto lx
root      6860  0.3  0.0 833520   668 ?        Sl   Jul04 323:19 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
root      7048  0.0  0.0 123224   144 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers domoticz
root      7256  0.0  0.0 124632    72 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers grafana
root      7375  0.0  0.0 115028   156 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers graylog
root      7784  0.0  0.0 123224   160 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers mediadownloader
root      8490  0.0  0.0 123480   128 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers mediaserver
root      8900  0.0  0.0 124888   140 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers oxidized
root      4688  0.0  0.0 160920   792 ?        Sl   Sep10   0:00 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
root     63673  0.0  0.0   4504  1708 ?        Ss   Sep14   0:00 /bin/sh /snap/lxd/8622/commands/daemon.start
root     63765  103  0.4 684920 73980 ?        Sl   Sep14 616:36  \_ lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     63766  0.0  0.1 224232 20792 ?        Sl   Sep14   0:00  \_ lxd waitready
root     63767  0.0  0.0   4504  1088 ?        S    Sep14   0:06  \_ /bin/sh /snap/lxd/8622/commands/daemon.start
lxd      63881  0.0  0.0  49984   384 ?        S    Sep14   0:00 dnsmasq --strict-order --bind-interfaces --pid-file=/var/snap/lxd/common/lxd/networks/macvlan/dnsmasq.pid --except-interface=lo --interface=macvlan --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.128.108.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/macvlan/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/macvlan/dnsmasq.hosts --dhcp-range 10.128.108.2,10.128.108.254,1h --listen-address=fd42:ef14:30d6:2e2d::1 --enable-ra --dhcp-range ::,constructor:macvlan,ra-stateless,ra-names -s lxd -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/macvlan/dnsmasq.raw -u lxd
root     39833  0.2  0.0   4504   800 ?        Ss   06:20   0:00 /bin/sh /snap/lxd/8622/commands/daemon.stop
root     39857  0.0  0.0 155000 12036 ?        Sl   06:20   0:00  \_ lxc query /

So I killed the processes manually. Or do I need to kill more from this list?
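
(A hypothetical reconstruction of that manual cleanup, using the stuck daemon PIDs from the first listing above; the thread does not show the actual command used:)

kill 63673 63765 63766 63767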

root@ragnarok:/snap/lxd# ps fauxww | grep lx
root      5197  0.0  0.0 897960  3100 ?        Ssl  Jul04   4:52 /usr/bin/lxcfs /var/lib/lxcfs/
root     40375  0.0  0.0  14196   900 pts/2    S+   06:23   0:00                      \_ grep --color=auto lx
root      6860  0.3  0.0 833520   668 ?        Sl   Jul04 323:19 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
root      7048  0.0  0.0 123224   144 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers domoticz
root      7256  0.0  0.0 124632    72 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers grafana
root      7375  0.0  0.0 115028   156 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers graylog
root      7784  0.0  0.0 123224   160 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers mediadownloader
root      8490  0.0  0.0 123480   128 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers mediaserver
root      8900  0.0  0.0 124888   140 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers oxidized
root      4688  0.0  0.0 160920   792 ?        Sl   Sep10   0:00 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
lxd      63881  0.0  0.0  49984   384 ?        S    Sep14   0:00 dnsmasq --strict-order --bind-interfaces --pid-file=/var/snap/lxd/common/lxd/networks/macvlan/dnsmasq.pid --except-interface=lo --interface=macvlan --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.128.108.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/macvlan/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/macvlan/dnsmasq.hosts --dhcp-range 10.128.108.2,10.128.108.254,1h --listen-address=fd42:ef14:30d6:2e2d::1 --enable-ra --dhcp-range ::,constructor:macvlan,ra-stateless,ra-names -s lxd -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/macvlan/dnsmasq.raw -u lxd

And then started LXD:

systemctl start snap.lxd.daemon

root@ragnarok:/snap/lxd# ps fauxww | grep lx
root      5197  0.0  0.0 897960  3100 ?        Ssl  Jul04   4:52 /usr/bin/lxcfs /var/lib/lxcfs/
root     40736  0.0  0.0  14196   972 pts/2    S+   06:24   0:00                      \_ grep --color=auto lx
root      6860  0.3  0.0 833520   668 ?        Sl   Jul04 323:20 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
root      7048  0.0  0.0 123224   144 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers domoticz
root      7256  0.0  0.0 124632    72 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers grafana
root      7375  0.0  0.0 115028   156 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers graylog
root      7784  0.0  0.0 123224   160 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers mediadownloader
root      8490  0.0  0.0 123480   128 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers mediaserver
root      8900  0.0  0.0 124888   140 ?        Ss   Jul04   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers oxidized
root      4688  0.0  0.0 160920   792 ?        Sl   Sep10   0:00 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
root     40476  0.2  0.0   4504  1720 ?        Ss   06:24   0:00 /bin/sh /snap/lxd/8622/commands/daemon.start
root     40563 97.9  0.1 684216 29740 ?        Sl   06:24   0:14  \_ lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     40564  0.1  0.1 297964 20436 ?        Sl   06:24   0:00  \_ lxd waitready
root     40565  0.0  0.0   4504  1096 ?        S    06:24   0:00  \_ /bin/sh /snap/lxd/8622/commands/daemon.start
lxd      40676  0.0  0.0  49984   400 ?        S    06:24   0:00 dnsmasq --strict-order --bind-interfaces --pid-file=/var/snap/lxd/common/lxd/networks/macvlan/dnsmasq.pid --except-interface=lo --interface=macvlan --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.128.108.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/macvlan/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/macvlan/dnsmasq.hosts --dhcp-range 10.128.108.2,10.128.108.254,1h --listen-address=fd42:ef14:30d6:2e2d::1 --enable-ra --dhcp-range ::,constructor:macvlan,ra-stateless,ra-names -s lxd -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/macvlan/dnsmasq.raw -u lxd

After that, lxc ls still hangs...

Systemd log (last 50 lines)

-- Logs begin at Mon 2018-09-10 08:45:12 CEST, end at Sat 2018-09-15 06:32:00 CEST. --
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   2: fd:  10: memory
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   3: fd:  11: cpuset
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   4: fd:  12: perf_event
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   5: fd:  13: devices
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   6: fd:  14: hugetlb
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   7: fd:  15: cpu,cpuacct
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   8: fd:  16: net_cls,net_prio
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:   9: fd:  17: freezer
Sep 14 20:27:55 ragnarok lxd.daemon[4473]:  10: fd:  18: name=systemd
Sep 14 20:27:55 ragnarok lxd.daemon[4473]: lxcfs.c: 105: do_reload: lxcfs: reloaded
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: => Re-using existing LXCFS
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: => Killing conflicting LXD (pid=49570)
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: => Starting LXD
Sep 14 20:27:55 ragnarok lxd.daemon[63673]: lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2018-09-14T20:27:55+0200
Sep 15 06:20:24 ragnarok systemd[1]: Stopping Service for snap application lxd.daemon...
Sep 15 06:22:45 ragnarok lxd.daemon[39833]: => Stop reason is: crashed
Sep 15 06:22:45 ragnarok systemd[1]: Stopped Service for snap application lxd.daemon.
Sep 15 06:23:56 ragnarok systemd[1]: Stopped Service for snap application lxd.daemon.
Sep 15 06:23:59 ragnarok systemd[1]: Stopped Service for snap application lxd.daemon.
Sep 15 06:24:05 ragnarok systemd[1]: Started Service for snap application lxd.daemon.
Sep 15 06:24:06 ragnarok lxd.daemon[40476]: => Preparing the system
Sep 15 06:24:06 ragnarok lxd.daemon[40476]: ==> Loading snap configuration
Sep 15 06:24:06 ragnarok lxd.daemon[40476]: ==> Setting up mntns symlink (mnt:[4026532330])
Sep 15 06:24:06 ragnarok lxd.daemon[40476]: ==> Setting up kmod wrapper
Sep 15 06:24:06 ragnarok lxd.daemon[40476]: ==> Preparing /boot
Sep 15 06:24:06 ragnarok lxd.daemon[40476]: ==> Preparing a clean copy of /run
Sep 15 06:24:06 ragnarok lxd.daemon[40476]: ==> Preparing a clean copy of /etc
Sep 15 06:24:06 ragnarok lxd.daemon[40476]: ==> Setting up ceph configuration
Sep 15 06:24:06 ragnarok lxd.daemon[40476]: ==> Setting up LVM configuration
Sep 15 06:24:06 ragnarok lxd.daemon[40476]: ==> Rotating logs
Sep 15 06:24:06 ragnarok lxd.daemon[40476]: ==> Setting up ZFS (0.6)
Sep 15 06:24:06 ragnarok lxd.daemon[40476]: ==> Escaping the systemd cgroups
Sep 15 06:24:06 ragnarok lxd.daemon[40476]: ==> Escaping the systemd process resource limits
Sep 15 06:24:06 ragnarok lxd.daemon[4473]: mount namespace: 7
Sep 15 06:24:06 ragnarok lxd.daemon[4473]: hierarchies:
Sep 15 06:24:06 ragnarok lxd.daemon[4473]:   0: fd:   8: blkio
Sep 15 06:24:06 ragnarok lxd.daemon[4473]:   1: fd:   9: pids
Sep 15 06:24:06 ragnarok lxd.daemon[4473]:   2: fd:  10: memory
Sep 15 06:24:06 ragnarok lxd.daemon[4473]:   3: fd:  11: cpuset
Sep 15 06:24:06 ragnarok lxd.daemon[4473]:   4: fd:  12: perf_event
Sep 15 06:24:06 ragnarok lxd.daemon[4473]:   5: fd:  13: devices
Sep 15 06:24:06 ragnarok lxd.daemon[4473]:   6: fd:  14: hugetlb
Sep 15 06:24:06 ragnarok lxd.daemon[4473]:   7: fd:  15: cpu,cpuacct
Sep 15 06:24:06 ragnarok lxd.daemon[4473]:   8: fd:  16: net_cls,net_prio
Sep 15 06:24:06 ragnarok lxd.daemon[4473]:   9: fd:  17: freezer
Sep 15 06:24:06 ragnarok lxd.daemon[4473]:  10: fd:  18: name=systemd
Sep 15 06:24:06 ragnarok lxd.daemon[4473]: lxcfs.c: 105: do_reload: lxcfs: reloaded
Sep 15 06:24:06 ragnarok lxd.daemon[40476]: => Re-using existing LXCFS
Sep 15 06:24:06 ragnarok lxd.daemon[40476]: => Starting LXD
Sep 15 06:24:06 ragnarok lxd.daemon[40476]: lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2018-09-15T06:24:06+0200
stgraber commented 5 years ago

Ok, can you stop it again (same way you did it earlier), then manually run:
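
(The command was elided in this copy of the thread; the reply below shows it was run as:)

lxd --debug --group lxd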

Wait for a while and show us how far it goes.

dkruyt commented 5 years ago

Started it with debug; it seems to hang. I'll leave it for a while...

root@ragnarok:~# lxd --debug --group lxd
DBUG[09-15|12:48:33] Connecting to a local LXD over a Unix socket
DBUG[09-15|12:48:33] Sending request to LXD                   etag= method=GET url=http://unix.socket/1.0
INFO[09-15|12:48:33] LXD 3.4 is starting in normal mode       path=/var/snap/lxd/common/lxd
INFO[09-15|12:48:33] Kernel uid/gid map:
INFO[09-15|12:48:33]  - u 0 0 4294967295
INFO[09-15|12:48:33]  - g 0 0 4294967295
INFO[09-15|12:48:33] Configured LXD uid/gid map:
INFO[09-15|12:48:33]  - u 0 1000000 1000000000
INFO[09-15|12:48:33]  - g 0 1000000 1000000000
WARN[09-15|12:48:33] CGroup memory swap accounting is disabled, swap limits will be ignored.
INFO[09-15|12:48:33] Initializing local database
INFO[09-15|12:48:33] Initializing database gateway
INFO[09-15|12:48:33] Start database node                      address= id=1
INFO[09-15|12:48:33] Raft: Restored from snapshot 1-1151664-1534795819983
INFO[09-15|12:48:33] Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]
INFO[09-15|12:48:33] Raft: Node at 0 [Leader] entering Leader state
INFO[09-15|12:48:33] Dqlite: starting event loop
DBUG[09-15|12:48:33] Dqlite: accepting connections
INFO[09-15|12:48:33] LXD isn't socket activated
DBUG[09-15|12:48:33] Connecting to a local LXD over a Unix socket
DBUG[09-15|12:48:33] Sending request to LXD                   etag= method=GET url=http://unix.socket/1.0
DBUG[09-15|12:48:33] Detected stale unix socket, deleting
DBUG[09-15|12:48:33] Detected stale unix socket, deleting
INFO[09-15|12:48:33] Starting /dev/lxd handler:
INFO[09-15|12:48:33]  - binding devlxd socket                 socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[09-15|12:48:33] REST API daemon:
INFO[09-15|12:48:33]  - binding Unix socket                   socket=/var/snap/lxd/common/lxd/unix.socket
INFO[09-15|12:48:33] Initializing global database
INFO[09-15|12:48:33] Dqlite: handling new connection (fd=19)
INFO[09-15|12:48:33] Dqlite: connected address=0 attempt=0
INFO[09-15|12:48:33] Initializing storage pools
DBUG[09-15|12:48:33] Initializing and checking storage pool "zfs-pool01"
DBUG[09-15|12:48:33] Checking ZFS storage pool "zfs-pool01"
DBUG[09-15|12:48:33] Initializing and checking storage pool "zfs-pool02"
DBUG[09-15|12:48:33] Checking ZFS storage pool "zfs-pool02"
INFO[09-15|12:48:33] Initializing networks
DBUG[09-15|12:48:33] Connecting to a remote simplestreams server
INFO[09-15|12:48:33] Pruning leftover image files

I also noticed I have two lxcfs processes running; there should only be one, right?

root@ragnarok:/snap/lxd# cat /var/snap/lxd/common/lxcfs.pid
4688
root@ragnarok:/snap/lxd# ps uax | grep lxcfs
root      4688  0.0  0.0 160920   792 ?        Sl   Sep10   0:00 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
root      5197  0.0  0.0 898396  3824 ?        Ssl  Jul04   4:56 /usr/bin/lxcfs /var/lib/lxcfs/
root      6860  0.3  0.0 833524   696 ?        Sl   Jul04 324:50 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
stgraber commented 5 years ago

Ah, that's interesting if it remains stuck at "Pruning leftover image files". Can you show the output of find /var/snap/lxd/common/lxd/images/?

The multiple lxcfs instances are fine; we added code to the snap to avoid that back in July or August, so I suspect you just have a process that pre-dates that logic and which may actually be in use by some containers (which is why we left those around back then).

dkruyt commented 5 years ago

A complete find is a little too long; too many files to post here. But there are files in there.

root@ragnarok:~# ls -la /var/snap/lxd/common/lxd/images/
total 9
drwx------  3 root root 4096 Jun 20 19:10 .
drwxr-xr-x 14 root root 4096 Sep 15 12:48 ..
drwxr-xr-x  4 root root    5 Dec  7  2016 f4dff2592a62e908c31da4a2fe85f67af044dda46357d2dc26680d1d14564db1.zfs
root@ragnarok:~# ls -la /var/snap/lxd/common/lxd/images/f4dff2592a62e908c31da4a2fe85f67af044dda46357d2dc26680d1d14564db1.zfs
total 35
drwxr-xr-x  4 root root    5 Dec  7  2016 .
drwx------  3 root root 4096 Jun 20 19:10 ..
-rw-r--r--  1 root root 1566 Dec  6  2016 metadata.yaml
drwxr-xr-x 22 root root   22 Dec  7  2016 rootfs
drwxr-xr-x  2 root root    8 Dec  6  2016 templates
root@ragnarok:~# cat /var/snap/lxd/common/lxd/images/f4dff2592a62e908c31da4a2fe85f67af044dda46357d2dc26680d1d14564db1.zfs/metadata.yaml
architecture: "x86_64"
creation_date: 1480986364
properties:
    architecture: "x86_64"
    description: "Ubuntu 16.04 LTS server (20161205)"
    os: "ubuntu"
    release: "xenial"
templates:
    /etc/hostname:
        when:
            - create
            - copy
        template: hostname.tpl
    /var/lib/cloud/seed/nocloud-net/meta-data:
        when:
            - create
            - copy
        template: cloud-init-meta.tpl
    /var/lib/cloud/seed/nocloud-net/network-config:
        when:
            - create
            - copy
        template: cloud-init-network.tpl
    /var/lib/cloud/seed/nocloud-net/user-data:
        when:
            - create
            - copy
        template: cloud-init-user.tpl
        properties:
            default: |
                #cloud-config
                {}
    /var/lib/cloud/seed/nocloud-net/vendor-data:
        when:
            - create
            - copy
        template: cloud-init-vendor.tpl
        properties:
            default: |
                #cloud-config
                {}
    /etc/init/console.override:
        when:
            - create
        template: upstart-override.tpl
    /etc/init/tty1.override:
        when:
            - create
        template: upstart-override.tpl
    /etc/init/tty2.override:
        when:
            - create
        template: upstart-override.tpl
    /etc/init/tty3.override:
        when:
            - create
        template: upstart-override.tpl
    /etc/init/tty4.override:
        when:
            - create
        template: upstart-override.tpl

Did a umount of /var/snap/lxd/common/lxd/images/f4dff2592a62e908c31da4a2fe85f67af044dda46357d2dc26680d1d14564db1.zfs; now lxd is starting.

stgraber commented 5 years ago

Oh, so your images directory looks very very wrong. There should only be files in there, no directories at all. The fact that you do have a very large directory is most likely a ZFS setup issue.

I'd check with zfs list -t all that none of your images have a mountpoint set as they very much should not. Then I'd just remove anything that's not a file from that directory to go back to sanity.
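
(A minimal sketch of that check, assuming standard ZFS tooling; it lists every dataset with its mountpoint so any image dataset with one set stands out:)

zfs list -t all -o name,mountpoint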

stgraber commented 5 years ago

Do you maybe remember having manually set a ZFS dataset to be on /var/snap/lxd/common/lxd/images? That would possibly explain what you're seeing and although not a recommended setup, it's something we'd probably look into fixing if that's indeed the source of the issue.

The zfs list -t all output should let us figure this out either way.

stgraber commented 5 years ago

@dkruyt

dkruyt commented 5 years ago

Do you maybe remember having manually set a ZFS dataset to be on /var/snap/lxd/common/lxd/images?

No, I don't remember, but I had problems upgrading to 3.0 and moving from the deb to the snap lxd packages. Maybe that caused it.

stgraber commented 5 years ago

@dkruyt can you show zfs list -t all?

dkruyt commented 5 years ago

NAME                                                                                                    USED  AVAIL  REFER  MOUNTPOINT
zfs-pool01                                                                                             7.03T   636G   163K  /var/lib/snapd/hostfs/zfs-pool01
zfs-pool01/Backup                                                                                       842G   636G   842G  /zfs-pool01/Backup
zfs-pool01/Containers                                                                                  85.1G   636G  85.1G  /var/lib/snapd/hostfs/zfs-pool01/Containers
zfs-pool01/TBD                                                                                          436G   636G   436G  /zfs-pool01/TBD
zfs-pool01/TBD@2018-09-16-000000                                                                           0      -   436G  -
zfs-pool01/TBD@2018-09-17-000000                                                                           0      -   436G  -
zfs-pool01/TBD@2018-09-18-000000                                                                           0      -   436G  -
zfs-pool01/TBD@2018-09-19-000000                                                                           0      -   436G  -
zfs-pool01/TBD@2018-09-20-000000                                                                           0      -   436G  -
zfs-pool01/TBD@2018-09-21-000000                                                                           0      -   436G  -
zfs-pool01/TBD@2018-09-22-000000                                                                           0      -   436G  -
zfs-pool01/TBD@2018-09-23-000000                                                                           0      -   436G  -
zfs-pool01/TimeMachine                                                                                 1.07T   636G  1.07T  /zfs-pool01/TimeMachine
zfs-pool01/Videos                                                                                       678M   636G   678M  /zfs-pool01/Videos
zfs-pool01/VirtualMachines                                                                             14.3G   636G  14.3G  /var/lib/snapd/hostfs/zfs-pool01/VirtualMachines
zfs-pool01/containers                                                                                  33.2G   636G   140K  /var/lib/snapd/hostfs/zfs-pool01/containers
zfs-pool01/containers/amazon-alexa                                                                      567M   636G  1.13G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/amazon-alexa
zfs-pool01/containers/degiro                                                                            345M   636G  1022M  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/degiro
zfs-pool01/containers/domoticz                                                                         1.41G   636G  1.31G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/domoticz
zfs-pool01/containers/domoticz@copy-7d0dc946-e50b-44e4-8e89-b3c491ec97e7                               84.5M      -   871M  -
zfs-pool01/containers/domoticz-grafana                                                                 1.59G   636G  1.82G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/domoticz-grafana
zfs-pool01/containers/domoticz-tmp                                                                      118M   636G   833M  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/domoticz-tmp
zfs-pool01/containers/emby                                                                              440M   636G  1.06G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/emby
zfs-pool01/containers/ftp                                                                              1.65G   636G  1.05G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/ftp
zfs-pool01/containers/ghostblog                                                                         581M   636G   581M  /var/snap/lxd/common/lxd/storage-pools/lxd/containers/ghostblog
zfs-pool01/containers/ghostblog@snapshot-2017-06-26                                                        0      -   581M  -
zfs-pool01/containers/gpx2influx                                                                        860M   636G  1.51G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/gpx2influx
zfs-pool01/containers/hassio                                                                           6.33G   636G  7.00G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/hassio
zfs-pool01/containers/home-assistant                                                                   1.55G   636G  1.84G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/home-assistant
zfs-pool01/containers/jmeter                                                                            698M   636G  1.29G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/jmeter
zfs-pool01/containers/k6                                                                                175M   636G   867M  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/k6
zfs-pool01/containers/mediadownloader                                                                  2.30G   636G  1.85G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/mediadownloader
zfs-pool01/containers/mediaserver                                                                      8.18G   636G  6.09G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/mediaserver
zfs-pool01/containers/oxidized                                                                         1.10G   636G   978M  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/oxidized
zfs-pool01/containers/snort                                                                            1.23G   636G  1.23G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/snort
zfs-pool01/containers/unms                                                                             4.16G   636G  4.50G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool01/containers/unms
zfs-pool01/deleted                                                                                     9.40G   636G   140K  /var/lib/snapd/hostfs/zfs-pool01/deleted
zfs-pool01/deleted/images                                                                              9.40G   636G   140K  /var/lib/snapd/hostfs/zfs-pool01/deleted/images
zfs-pool01/deleted/images/11fc1b1d39b9f9cd7e9491871f1421ac4278e1d599ecf5d180f2a6e2483bd172              802M   636G   802M  none
zfs-pool01/deleted/images/11fc1b1d39b9f9cd7e9491871f1421ac4278e1d599ecf5d180f2a6e2483bd172@readonly        0      -   802M  -
zfs-pool01/deleted/images/295d53ae6db4b3b0df9565377cb48c62845ab3bee5a241a8effe50ccdfd8bf17              771M   636G   771M  none
zfs-pool01/deleted/images/295d53ae6db4b3b0df9565377cb48c62845ab3bee5a241a8effe50ccdfd8bf17@readonly        0      -   771M  -
zfs-pool01/deleted/images/37a7b2fc60a834b0249df4af35e470c8409c51d48b5b097d32d5ca07a86ccda6              709M   636G   709M  none
zfs-pool01/deleted/images/37a7b2fc60a834b0249df4af35e470c8409c51d48b5b097d32d5ca07a86ccda6@readonly        0      -   709M  -
zfs-pool01/deleted/images/5d335d31c6ef7d9a860dc0a10c89bddcdde46e0553164e444dff8020a83d56e3              763M   636G   763M  none
zfs-pool01/deleted/images/5d335d31c6ef7d9a860dc0a10c89bddcdde46e0553164e444dff8020a83d56e3@readonly        0      -   763M  -
zfs-pool01/deleted/images/61d54418874f2f84e24ddd6934b3bb759ca76cbc49820da7d34f8b5b778e4816              709M   636G   709M  none
zfs-pool01/deleted/images/61d54418874f2f84e24ddd6934b3bb759ca76cbc49820da7d34f8b5b778e4816@readonly        0      -   709M  -
zfs-pool01/deleted/images/725b9d539a4aef7d4405b09e831f741b9f8ef38caa067fffadf479e44b8bd22a              710M   636G   710M  none
zfs-pool01/deleted/images/725b9d539a4aef7d4405b09e831f741b9f8ef38caa067fffadf479e44b8bd22a@readonly        0      -   710M  -
zfs-pool01/deleted/images/8fc2e3ec4809222eb30e0dc3706d0dae1c01284a5897c58dc7046af84bed0c4f              709M   636G   709M  none
zfs-pool01/deleted/images/8fc2e3ec4809222eb30e0dc3706d0dae1c01284a5897c58dc7046af84bed0c4f@readonly        0      -   709M  -
zfs-pool01/deleted/images/b36ec647e374da4816104a98807633a2cc387488083d3776557081c4d0333618              761M   636G   761M  none
zfs-pool01/deleted/images/b36ec647e374da4816104a98807633a2cc387488083d3776557081c4d0333618@readonly        0      -   761M  -
zfs-pool01/deleted/images/c23b37d4885c51f3734117e37a489eec7fbb53db3702b534d956431d13d2d8fd              711M   636G   711M  none
zfs-pool01/deleted/images/c23b37d4885c51f3734117e37a489eec7fbb53db3702b534d956431d13d2d8fd@readonly        0      -   711M  -
zfs-pool01/deleted/images/cd6c6eb9a79de3d7af9163f6ba439844572725829ab059a9bc5f78595fe8b288              782M   636G   782M  none
zfs-pool01/deleted/images/cd6c6eb9a79de3d7af9163f6ba439844572725829ab059a9bc5f78595fe8b288@readonly        0      -   782M  -
zfs-pool01/deleted/images/da5746c04d9302a6804eda5b6fec4b654b78ae8a7adce9e4d1b6e71e4cc5a6b8              710M   636G   710M  none
zfs-pool01/deleted/images/da5746c04d9302a6804eda5b6fec4b654b78ae8a7adce9e4d1b6e71e4cc5a6b8@readonly        0      -   710M  -
zfs-pool01/deleted/images/f4c9feb3e4018ffbd793d4b80d54fb95835bb1461380f2f9b7976c7b10ac48b9              782M   636G   782M  none
zfs-pool01/deleted/images/f4c9feb3e4018ffbd793d4b80d54fb95835bb1461380f2f9b7976c7b10ac48b9@readonly        0      -   782M  -
zfs-pool01/deleted/images/f4eba5df5f88129cc68d969ae0f3762869ec79c9abc8344a74bb5c932d70b2ad              707M   636G   707M  none
zfs-pool01/deleted/images/f4eba5df5f88129cc68d969ae0f3762869ec79c9abc8344a74bb5c932d70b2ad@readonly        0      -   707M  -
zfs-pool01/images                                                                                      2.63G   636G   174K  /var/lib/snapd/hostfs/zfs-pool01/images
zfs-pool01/images/1e59027d1d58873fc7e23f769232cd4846c6d675ea292bc71037fafc1547649d                      697M   636G   697M  none
zfs-pool01/images/1e59027d1d58873fc7e23f769232cd4846c6d675ea292bc71037fafc1547649d@readonly                0      -   697M  -
zfs-pool01/images/270d4baa50c61d7dad71e49f9670fa48b35d5c7dc2b0226709c7fc4abc1bb3d4                      765M   636G   765M  none
zfs-pool01/images/270d4baa50c61d7dad71e49f9670fa48b35d5c7dc2b0226709c7fc4abc1bb3d4@readonly                0      -   765M  -
zfs-pool01/images/b555330e62a470b1638c79190000a72dd4b0bb4809628448f704a984be69a607                      444M   636G   444M  none
zfs-pool01/images/b555330e62a470b1638c79190000a72dd4b0bb4809628448f704a984be69a607@readonly                0      -   444M  -
zfs-pool01/images/f4dff2592a62e908c31da4a2fe85f67af044dda46357d2dc26680d1d14564db1                      782M   636G   782M  /var/snap/lxd/common/lxd/images/f4dff2592a62e908c31da4a2fe85f67af044dda46357d2dc26680d1d14564db1.zfs
zfs-pool01/images/f4dff2592a62e908c31da4a2fe85f67af044dda46357d2dc26680d1d14564db1@readonly                0      -   782M  -
zfs-pool01/test123                                                                                      140K   636G   140K  /zfs-pool01/test123
zfs-pool01/work-storage                                                                                 414G   636G   414G  /zfs-pool01/work-storage
zfs-pool01/work-storage@2018-09-16-000000                                                                  0      -   414G  -
zfs-pool01/work-storage@2018-09-17-000000                                                                  0      -   414G  -
zfs-pool01/work-storage@2018-09-18-000000                                                                  0      -   414G  -
zfs-pool01/work-storage@2018-09-19-000000                                                                  0      -   414G  -
zfs-pool01/work-storage@2018-09-20-000000                                                                  0      -   414G  -
zfs-pool01/work-storage@2018-09-21-000000                                                                  0      -   414G  -
zfs-pool01/work-storage@2018-09-22-000000                                                                  0      -   414G  -
zfs-pool01/work-storage@2018-09-23-000000                                                                  0      -   414G  -
zfs-pool02                                                                                             10.6G  37.6G    19K  none
zfs-pool02/containers                                                                                  9.12G  37.6G    19K  none
zfs-pool02/containers/elk                                                                              1001M  37.6G  1.23G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool02/containers/elk
zfs-pool02/containers/grafana                                                                          1.32G  37.6G  1.45G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool02/containers/grafana
zfs-pool02/containers/graylog                                                                          4.44G  37.6G  4.54G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool02/containers/graylog
zfs-pool02/containers/icinga                                                                            357M  37.6G   560M  /var/snap/lxd/common/lxd/storage-pools/zfs-pool02/containers/icinga
zfs-pool02/containers/librenms                                                                          719M  37.6G   847M  /var/snap/lxd/common/lxd/storage-pools/zfs-pool02/containers/librenms
zfs-pool02/containers/zabbix                                                                           1.34G  37.6G  1.57G  /var/snap/lxd/common/lxd/storage-pools/zfs-pool02/containers/zabbix
zfs-pool02/custom                                                                                        19K  37.6G    19K  none
zfs-pool02/deleted                                                                                     1.40G  37.6G    19K  none
zfs-pool02/deleted/images                                                                              1.40G  37.6G    19K  none
zfs-pool02/deleted/images/2b0eacdb15eaec2bd7861ec0f95477b37521f971804ec3381d8f7e77087461e0              523M  37.6G   523M  none
zfs-pool02/deleted/images/2b0eacdb15eaec2bd7861ec0f95477b37521f971804ec3381d8f7e77087461e0@readonly        0      -   523M  -
zfs-pool02/deleted/images/61d54418874f2f84e24ddd6934b3bb759ca76cbc49820da7d34f8b5b778e4816              305M  37.6G   305M  none
zfs-pool02/deleted/images/61d54418874f2f84e24ddd6934b3bb759ca76cbc49820da7d34f8b5b778e4816@readonly        0      -   305M  -
zfs-pool02/deleted/images/8fc2e3ec4809222eb30e0dc3706d0dae1c01284a5897c58dc7046af84bed0c4f              305M  37.6G   305M  none
zfs-pool02/deleted/images/8fc2e3ec4809222eb30e0dc3706d0dae1c01284a5897c58dc7046af84bed0c4f@readonly        0      -   305M  -
zfs-pool02/deleted/images/c5bbef7f4e1c19f0104fd49b862b2e549095d894765c75c6d72775f1d98185ec              304M  37.6G   304M  none
zfs-pool02/deleted/images/c5bbef7f4e1c19f0104fd49b862b2e549095d894765c75c6d72775f1d98185ec@readonly        0      -   304M  -
zfs-pool02/images                                                                                        19K  37.6G    19K  none
zfs-pool02/snapshots                                                                                     19K  37.6G    19K  none
stgraber commented 5 years ago

@dkruyt thanks, looking into this now

stgraber commented 5 years ago

Hmm, yeah, that zpool isn't configured in a way that LXD will work well with.

So the first issue is that the image indeed has a mountpoint set for some reason; this can be fixed with:
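(The exact command was lost from this copy of the thread; given the dataset shown in the zfs list output above, it was presumably along the lines of:)

zfs set mountpoint=none zfs-pool01/images/f4dff2592a62e908c31da4a2fe85f67af044dda46357d2dc26680d1d14564db1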

Then there is the bigger issue that you seem to have mixed LXD and non-LXD data on the same zpool; this is unsupported by LXD. LXD expects to be the only user of whatever dataset it's provided. Normally for a setup such as yours, you'd have told LXD to use "zfs-pool01/lxd" and everything would then have lived under that. This however cannot easily be retrofitted.
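
(A hedged illustration of that dedicated-dataset layout, using the lxc storage command available in LXD 3.x; the pool name "pool1" here is an example, not from the thread:)

lxc storage create pool1 zfs source=zfs-pool01/lxd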

In your current setup, your best chance to restore some sanity is to do:

The other pool (zfs-pool02) appears to only be managed by LXD and its mountpoints all look correct.

Closing, as this issue was caused by zpool misconfiguration rather than an LXD issue. LXD does check that the zpool it's given at setup time is empty and will refuse to set up the pool if it's not. In this case I suspect the configuration changes and extra datasets were added afterwards.