canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Networking does not work in fresh Bionic container #4510

Closed: Kramerican closed this issue 6 years ago

Kramerican commented 6 years ago

Tried with LXD v2.21 on Ubuntu 16.04 and LXD v3.0.0 on 18.04 (system upgraded from 16.04)

Networking does not come up and the container does not get an IP assigned on my network bridge.

On both my 16.04 and 18.04 host systems, a xenial image comes up just fine.

I have tried provisioning from ubuntu:bionic as well as images:ubuntu/bionic/amd64, with identical results.

/var/log/syslog on the host shows in all cases lines similar to

Apr 29 20:25:15 krellide kernel: [6056886.886248] audit: type=1400 audit(1525026315.592:23530): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-bionic-template-xlemp72_</var/lib/lxd>" name="/sys/fs/cgroup/unified/" pid=19042 comm="systemd" fstype="cgroup2" srcname="cgroup" flags="rw, nosuid, nodev, noexec"
Apr 29 20:25:15 krellide kernel: [6056886.886297] audit: type=1400 audit(1525026315.592:23531): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-bionic-template-xlemp72_</var/lib/lxd>" name="/sys/fs/cgroup/unified/" pid=19042 comm="systemd" fstype="cgroup2" srcname="cgroup" flags="rw, nosuid, nodev, noexec"
Apr 29 20:25:16 krellide kernel: [6056887.323323] audit: type=1400 audit(1525026316.029:23532): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-bionic-template-xlemp72_</var/lib/lxd>" name="/run/systemd/unit-root/var/lib/lxcfs/" pid=19482 comm="(networkd)" flags="ro, nosuid, nodev, remount, bind"

These lines are not present in syslog when provisioning other versions of Ubuntu (Xenial/Zesty). Interestingly, upgrading an existing Xenial container to Bionic does not cause any networking issues.

Without knowing much about apparmor, I am assuming that the DENIED ... networkd line is an indicator of the culprit here. Any assistance would be much appreciated :)
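
For reference, the rough sequence used to reproduce this (hedged sketch; the container name is just an example):

lxc launch ubuntu:bionic bionic-test
lxc list bionic-test            # IPv4/IPv6 columns stay empty on the affected hosts
grep DENIED /var/log/syslog     # shows the apparmor "failed flags match" mount denials above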

stgraber commented 6 years ago

Ok, I wonder if the snap is still trying to spawn LXD in the background, messing with the manually started one, which would explain the socket issue.

Can you:
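
(Presumably something along these lines, hedged; this matches the commands run in the reply below:)

systemctl stop snap.lxd.daemon
rm -fv /var/snap/lxd/common/lxd/unix.socket
lxd --debug --group lxd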

That should have it start again, likely still taking a couple of minutes at the database stage; wait for it to start for real and see if the socket works any better now.

stgraber commented 6 years ago

Having no lxd.db is normal for LXD 3.0.

davidfavor commented 6 years ago

Same behavior.

net10 # systemctl stop snap.lxd.daemon
net10 # echo $?
0
net10 # rm -fv /var/snap/lxd/common/lxd/unix.socket
net10 # lxd --debug --group lxd
INFO[05-01|15:03:22] LXD 3.0.0 is starting in normal mode     path=/var/snap/lxd/common/lxd
INFO[05-01|15:03:22] Kernel uid/gid map: 
INFO[05-01|15:03:22]  - u 0 0 4294967295 
INFO[05-01|15:03:22]  - g 0 0 4294967295 
INFO[05-01|15:03:22] Configured LXD uid/gid map: 
INFO[05-01|15:03:22]  - u 0 1000000 1000000000 
INFO[05-01|15:03:22]  - g 0 1000000 1000000000 
WARN[05-01|15:03:22] CGroup memory swap accounting is disabled, swap limits will be ignored. 
INFO[05-01|15:03:22] Initializing local database 
INFO[05-01|15:03:22] Initializing database gateway 
INFO[05-01|15:03:22] Start database node                      address= id=1
INFO[05-01|15:03:22] Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}] 
INFO[05-01|15:03:22] Raft: Node at 0 [Leader] entering Leader state 
INFO[05-01|15:03:22] LXD isn't socket activated 
INFO[05-01|15:03:22] Starting /dev/lxd handler: 
INFO[05-01|15:03:22]  - binding devlxd socket                 socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[05-01|15:03:22] REST API daemon: 
INFO[05-01|15:03:22]  - binding Unix socket                   socket=/var/snap/lxd/common/lxd/unix.socket
INFO[05-01|15:03:22]  - binding TCP socket                    socket=[::]:8443
INFO[05-01|15:03:22] Initializing global database 
DBUG[05-01|15:04:22] Database error: failed to begin transaction: gRPC BEGIN response error: rpc error: code = Unknown desc = failed to handle BEGIN request: FSM out of sync: timed out enqueuing operation 
EROR[05-01|15:04:22] Failed to start the daemon: failed to open cluster database: failed to ensure schema: failed to begin transaction: gRPC BEGIN response error: rpc error: code = Unknown desc = failed to handle BEGIN request: FSM out of sync: timed out enqueuing operation 
INFO[05-01|15:04:22] Starting shutdown sequence 
INFO[05-01|15:04:22] Stopping REST API handler: 
INFO[05-01|15:04:22]  - closing socket                        socket=[::]:8443
INFO[05-01|15:04:22]  - closing socket                        socket=/var/snap/lxd/common/lxd/unix.socket
INFO[05-01|15:04:22] Stopping /dev/lxd handler 
INFO[05-01|15:04:22]  - closing socket                        socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[05-01|15:04:22] Stop database gateway 
INFO[05-01|15:04:22] Stop raft instance 
INFO[05-01|15:04:22] Stopping REST API handler: 
INFO[05-01|15:04:22] Stopping /dev/lxd handler 
INFO[05-01|15:04:22] Stopping REST API handler: 
INFO[05-01|15:04:22] Stopping /dev/lxd handler 
DBUG[05-01|15:04:22] Not unmounting temporary filesystems (containers are still running) 
INFO[05-01|15:04:22] Saving simplestreams cache 
INFO[05-01|15:04:22] Saved simplestreams cache 
Error: failed to open cluster database: failed to ensure schema: failed to begin transaction: gRPC BEGIN response error: rpc error: code = Unknown desc = failed to handle BEGIN request: FSM out of sync: timed out enqueuing operation
net10 # ll /var/snap/lxd/common/lxd/unix.socket
/bin/ls: cannot access '/var/snap/lxd/common/lxd/unix.socket': No such file or directory
net10 # ps fauxww
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root          2  0.0  0.0      0     0 ?        S    09:28   0:00 [kthreadd]
root          3  0.0  0.0      0     0 ?        R    09:28   0:01  \_ [kworker/0:0]
root          4  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/0:0H]
root          6  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [mm_percpu_wq]
root          7  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ksoftirqd/0]
root          8  0.0  0.0      0     0 ?        I    09:28   0:01  \_ [rcu_sched]
root          9  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [rcu_bh]
root         10  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [migration/0]
root         11  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [watchdog/0]
root         12  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [cpuhp/0]
root         13  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [cpuhp/1]
root         14  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [watchdog/1]
root         15  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [migration/1]
root         16  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ksoftirqd/1]
root         18  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/1:0H]
root         19  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [cpuhp/2]
root         20  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [watchdog/2]
root         21  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [migration/2]
root         22  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ksoftirqd/2]
root         24  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/2:0H]
root         25  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [cpuhp/3]
root         26  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [watchdog/3]
root         27  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [migration/3]
root         28  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ksoftirqd/3]
root         30  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/3:0H]
root         31  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [cpuhp/4]
root         32  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [watchdog/4]
root         33  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [migration/4]
root         34  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ksoftirqd/4]
root         36  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/4:0H]
root         37  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [cpuhp/5]
root         38  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [watchdog/5]
root         39  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [migration/5]
root         40  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ksoftirqd/5]
root         42  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/5:0H]
root         43  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [cpuhp/6]
root         44  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [watchdog/6]
root         45  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [migration/6]
root         46  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ksoftirqd/6]
root         48  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/6:0H]
root         49  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [cpuhp/7]
root         50  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [watchdog/7]
root         51  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [migration/7]
root         52  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ksoftirqd/7]
root         54  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/7:0H]
root         55  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [cpuhp/8]
root         56  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [watchdog/8]
root         57  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [migration/8]
root         58  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ksoftirqd/8]
root         59  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/8:0]
root         60  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/8:0H]
root         61  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [cpuhp/9]
root         62  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [watchdog/9]
root         63  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [migration/9]
root         64  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ksoftirqd/9]
root         66  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/9:0H]
root         67  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [cpuhp/10]
root         68  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [watchdog/10]
root         69  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [migration/10]
root         70  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ksoftirqd/10]
root         71  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/10:0]
root         72  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/10:0H]
root         73  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [cpuhp/11]
root         74  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [watchdog/11]
root         75  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [migration/11]
root         76  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ksoftirqd/11]
root         78  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/11:0H]
root         79  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [cpuhp/12]
root         80  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [watchdog/12]
root         81  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [migration/12]
root         82  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ksoftirqd/12]
root         84  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/12:0H]
root         85  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [cpuhp/13]
root         86  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [watchdog/13]
root         87  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [migration/13]
root         88  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ksoftirqd/13]
root         90  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/13:0H]
root         91  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [cpuhp/14]
root         92  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [watchdog/14]
root         93  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [migration/14]
root         94  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ksoftirqd/14]
root         95  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/14:0]
root         96  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/14:0H]
root         97  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [cpuhp/15]
root         98  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [watchdog/15]
root         99  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [migration/15]
root        100  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ksoftirqd/15]
root        102  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/15:0H]
root        103  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [kdevtmpfs]
root        104  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [netns]
root        105  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [rcu_tasks_kthre]
root        106  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [kauditd]
root        107  0.0  0.0      0     0 ?        I    09:28   0:01  \_ [kworker/0:1]
root        108  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/1:1]
root        109  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/2:1]
root        110  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [khungtaskd]
root        111  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [oom_reaper]
root        112  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [writeback]
root        113  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [kcompactd0]
root        114  0.0  0.0      0     0 ?        SN   09:28   0:00  \_ [ksmd]
root        115  0.0  0.0      0     0 ?        SN   09:28   0:00  \_ [khugepaged]
root        116  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [crypto]
root        117  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kintegrityd]
root        118  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kblockd]
root        119  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [ata_sff]
root        120  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [md]
root        121  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [edac-poller]
root        122  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [devfreq_wq]
root        123  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [watchdogd]
root        125  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/3:1]
root        126  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/4:1]
root        127  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/5:1]
root        128  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/6:1]
root        129  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/7:1]
root        130  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/9:1]
root        131  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/10:1]
root        132  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/11:1]
root        133  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/8:1]
root        134  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/12:1]
root        135  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/14:1]
root        136  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/15:1]
root        137  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/13:1]
root        139  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [kswapd0]
root        140  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [ecryptfs-kthrea]
root        182  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kthrotld]
root        184  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [acpi_thermal_pm]
root        185  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/4:2]
root        187  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/1:2]
root        189  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/3:2]
root        190  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/15:2]
root        191  0.0  0.0      0     0 ?        I    09:28   0:00  \_ [kworker/2:2]
root        195  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [ipv6_addrconf]
root        204  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kstrp]
root        221  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [charger_manager]
root        277  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [scsi_eh_0]
root        278  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [scsi_tmf_0]
root        279  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [scsi_eh_1]
root        280  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [scsi_tmf_1]
root        281  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [scsi_eh_2]
root        282  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [scsi_tmf_2]
root        283  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [scsi_eh_3]
root        284  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [scsi_tmf_3]
root        285  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [scsi_eh_4]
root        286  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [scsi_tmf_4]
root        287  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [scsi_eh_5]
root        288  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [scsi_tmf_5]
root        305  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/5:1H]
root        312  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [md4_raid1]
root        313  0.0  0.0      0     0 ?        S    09:28   0:00  \_ [md2_raid1]
root        316  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/13:1H]
root        317  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/7:1H]
root        318  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [ixgbe]
root        319  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [ttm_swap]
root        321  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/9:1H]
root        322  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/1:1H]
root        323  0.0  0.0      0     0 ?        I<   09:28   0:00  \_ [kworker/11:1H]
root        356  0.0  0.0      0     0 ?        I<   09:30   0:00  \_ [raid5wq]
root        402  0.0  0.0      0     0 ?        I<   09:30   0:00  \_ [kworker/6:1H]
root        406  0.1  0.0      0     0 ?        S    09:30   0:02  \_ [jbd2/md4-8]
root        407  0.0  0.0      0     0 ?        I<   09:30   0:00  \_ [ext4-rsv-conver]
root        424  0.0  0.0      0     0 ?        I<   09:30   0:00  \_ [kworker/14:1H]
root        447  0.0  0.0      0     0 ?        I<   09:30   0:01  \_ [kworker/12:1H]
root        467  0.0  0.0      0     0 ?        I    09:30   0:00  \_ [kworker/9:2]
root        472  0.0  0.0      0     0 ?        I<   09:30   0:00  \_ [kworker/10:1H]
root        479  0.0  0.0      0     0 ?        I    09:30   0:00  \_ [kworker/5:2]
root        566  0.0  0.0      0     0 ?        I<   09:30   0:00  \_ [kworker/8:1H]
root        640  0.0  0.0      0     0 ?        SN   09:30   0:00  \_ [kipmi0]
root        642  0.0  0.0      0     0 ?        S<   09:30   0:00  \_ [loop0]
root        643  0.0  0.0      0     0 ?        I<   09:30   0:00  \_ [kworker/2:1H]
root        645  0.0  0.0      0     0 ?        I<   09:30   0:00  \_ [kworker/0:1H]
root        648  0.0  0.0      0     0 ?        I<   09:30   0:00  \_ [kworker/15:1H]
root        649  0.0  0.0      0     0 ?        I<   09:30   0:00  \_ [kworker/4:1H]
root        656  0.0  0.0      0     0 ?        I<   09:30   0:00  \_ [kworker/3:1H]
root        738  0.0  0.0      0     0 ?        S<   09:30   0:00  \_ [loop1]
root        740  0.0  0.0      0     0 ?        S<   09:30   0:00  \_ [loop2]
root        766  0.0  0.0      0     0 ?        S<   09:30   0:00  \_ [loop3]
root        788  0.0  0.0      0     0 ?        S<   09:30   0:00  \_ [loop4]
root        814  0.0  0.0      0     0 ?        S<   09:30   0:00  \_ [loop5]
root        840  0.0  0.0      0     0 ?        I    09:30   0:00  \_ [kworker/7:2]
root        843  0.0  0.0      0     0 ?        S    09:30   0:00  \_ [jbd2/md2-8]
root        844  0.0  0.0      0     0 ?        I<   09:30   0:00  \_ [ext4-rsv-conver]
root        846  0.0  0.0      0     0 ?        I    09:31   0:00  \_ [kworker/13:2]
root       6945  0.0  0.0      0     0 ?        I    09:40   0:00  \_ [kworker/6:0]
root       7305  0.0  0.0      0     0 ?        I    09:40   0:00  \_ [kworker/11:2]
root       7566  0.0  0.0      0     0 ?        I    09:42   0:00  \_ [kworker/u32:1]
root       8797  0.0  0.0      0     0 ?        I    09:50   0:00  \_ [kworker/u32:2]
root       9746  0.0  0.0      0     0 ?        I    09:55   0:00  \_ [kworker/u32:3]
root      10138  0.0  0.0      0     0 ?        I    09:58   0:00  \_ [kworker/12:0]
root      10889  0.0  0.0      0     0 ?        I    10:04   0:00  \_ [kworker/u32:0]
root          1  2.9  0.0  78068  9144 ?        Ss   09:28   1:06 /sbin/init noquiet nosplash
root        465  0.0  0.0 168504 48736 ?        S<s  09:30   0:00 /lib/systemd/systemd-journald
root        469  0.0  0.0  97708  1832 ?        Ss   09:30   0:00 /sbin/lvmetad -f
root        480  0.0  0.0  46848  5700 ?        Ss   09:30   0:00 /lib/systemd/systemd-udevd
root        820  0.0  0.0   7488  2192 ?        Ss   09:30   0:00 /sbin/mdadm --monitor --scan
systemd+    823  0.0  0.0  71936  5868 ?        Ss   09:30   0:00 /lib/systemd/systemd-networkd
systemd+    858  0.0  0.0  70608  5312 ?        Ss   09:31   0:00 /lib/systemd/systemd-resolved
systemd+    859  0.0  0.0 141908  3264 ?        Ssl  09:31   0:00 /lib/systemd/systemd-timesyncd
root        990  0.0  0.0  26720  5260 ?        Ss   09:31   0:00 /usr/sbin/smartd -n
syslog      991  0.0  0.0 263032  4504 ?        Ssl  09:31   0:00 /usr/sbin/rsyslogd -n
root        996  0.0  0.0  61996  5528 ?        Ss   09:31   0:00 /lib/systemd/systemd-logind
root        998  0.0  0.0 287508  6660 ?        Ssl  09:31   0:00 /usr/lib/accountsservice/accounts-daemon
message+   1002  0.0  0.0  49928  4300 ?        Ss   09:31   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root       1012  0.0  0.0   9916   124 ?        Ss   09:31   0:00 /usr/sbin/rngd -r /dev/hwrng
root       1015  0.0  0.0 110556  3484 ?        Ssl  09:31   0:00 /usr/sbin/irqbalance --foreground
root       1017  0.0  0.0  31320  3200 ?        Ss   09:31   0:00 /usr/sbin/cron -f
root       1018  0.0  0.0 170372 17048 ?        Ssl  09:31   0:00 /usr/bin/python3 /usr/bin/networkd-dispatcher
root       1019  0.0  0.0 2096096 17768 ?       Ssl  09:31   0:00 /usr/lib/snapd/snapd
root       1026  0.0  0.0  72296  6324 ?        Ss   09:31   0:00 /usr/sbin/sshd -D
root       4521  0.0  0.0  74668  6496 ?        Ss   09:36   0:00  \_ sshd: root@pts/1
root       4523  0.0  0.0  19944  4860 pts/1    Ss+  09:36   0:00  |   \_ -bash
root       8468  0.0  0.0  74668  6508 ?        Ss   09:49   0:00  \_ sshd: root@pts/0
root       8470  0.0  0.0  19812  4672 pts/0    Ss   09:49   0:00      \_ -bash
root      11076  0.0  0.0  34712  3256 pts/0    R+   10:05   0:00          \_ ps fauxww
root       1039  0.0  0.0 301024 20648 ?        Ssl  09:31   0:01 /usr/bin/python3 /usr/bin/fail2ban-server -xf start
root       1059  0.1  0.0  29524  8292 ?        S    09:31   0:02 perl /david-favor/tools/route-reviver
root       1190  0.0  0.0  15956  2268 ttyS1    Ss+  09:31   0:00 /sbin/agetty -o -p -- \u --keep-baud 115200,38400,9600 ttyS1 vt220
root       1191  0.0  0.0  15956  2384 ttyS0    Ss+  09:31   0:00 /sbin/agetty -o -p -- \u --keep-baud 115200,38400,9600 ttyS0 vt220
root       1194  0.0  0.0  16180  1992 tty1     Ss+  09:31   0:01 /sbin/agetty -o -p -- \u --noclear tty1 linux
root       1211  0.0  0.0 160920  1236 ?        Sl   09:31   0:00 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
nobody     8801  0.5  0.0  49984   404 ?        S    09:50   0:04 dnsmasq --strict-order --bind-interfaces --pid-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.pid --except-interface=lo --interface=lxdbr0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.245.137.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range 10.245.137.2,10.245.137.254,1h --listen-address=fd42:3e36:490c:fa3e::1 --enable-ra --dhcp-range ::,constructor:lxdbr0,ra-stateless,ra-names -s lxd -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw -u nobody
davidfavor commented 6 years ago

So it looks like all that's left to fix is to generate /var/snap/lxd/common/lxd/lxd.db + maybe all will be well.

stgraber commented 6 years ago

No, you do have a database; as I said, lxd.db isn't a thing anymore in LXD 3.0.

The issue is that the raft database isn't coming online here for some reason. That error suggests that the database couldn't be read within the timeout period.

Can you post the output of find /var/snap/lxd/common/lxd/database and du -sch /var/snap/lxd/common/lxd/database?

davidfavor commented 6 years ago

Ah... so no more lxd.db anymore. Got it.

net10 # find /var/snap/lxd/common/lxd/database
/var/snap/lxd/common/lxd/database
/var/snap/lxd/common/lxd/database/global
/var/snap/lxd/common/lxd/database/global/db.bin-shm
/var/snap/lxd/common/lxd/database/global/logs.db
/var/snap/lxd/common/lxd/database/global/db.bin-wal
/var/snap/lxd/common/lxd/database/global/snapshots
/var/snap/lxd/common/lxd/database/global/db.bin
/var/snap/lxd/common/lxd/database/local.db
net10 # du -sch /var/snap/lxd/common/lxd/database
40M /var/snap/lxd/common/lxd/database
40M total
stgraber commented 6 years ago

Ok, that's not particularly light but also not unusually large.

What version of the snap do you have? snap info lxd

davidfavor commented 6 years ago

Here's something curious...

If I do inotifywait -qmr /var/snap/lxd/common/lxd/database

Then I see a spew of /var/snap/lxd/common/lxd/database/global/ MODIFY db.bin-wal lines from the point of "Initializing global database" until "Initializing storage pools".

So a massive number of writes, which seems odd since local.db only contains 53 entries; something is being written into it over and over.

Maybe this information helps.
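
For reference, the two-terminal setup described above, as a hedged sketch (inotifywait comes from the inotify-tools package):

# terminal 1: watch the database directory for writes
inotifywait -qmr /var/snap/lxd/common/lxd/database

# terminal 2: start the daemon in the foreground and correlate the write spew
# with the "Initializing global database" phase
sudo lxd --debug --group lxd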

davidfavor commented 6 years ago
net10 # snap info lxd
name:      lxd
summary:   System container manager and API
publisher: canonical
contact:   https://github.com/lxc/lxd/issues
license:   unknown
description: |
  LXD is a container manager for system containers.

  It offers a REST API to remotely manage containers over the network, using an image based workflow
  and with support for live migration.

  Images are available for all Ubuntu releases and architectures as well as for a wide number of other
  Linux distributions.

  LXD containers are lightweight, secure by default and a great alternative to virtual machines.
commands:
  - lxd.benchmark
  - lxd.check-kernel
  - lxd.database
  - lxd.lxc
  - lxd
  - lxd.migrate
services:
  lxd.daemon: simple, enabled, inactive
snap-id:   J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking:  stable
refreshed: 2018-04-28T17:49:22-05:00
installed:       3.0.0       (6862) 56MB -
channels:                           
  stable:        3.0.0       (6879) 56MB -
  candidate:     3.0.0       (6879) 56MB -
  beta:          ↑                       
  edge:          git-768e6ea (6891) 56MB -
  2.0/stable:    2.0.11      (6627) 27MB -
  2.0/candidate: 2.0.11      (6627) 27MB -
  2.0/beta:      ↑                       
  2.0/edge:      git-d71807e (6630) 25MB -
  3.0/stable:    3.0.0       (6882) 56MB -
  3.0/candidate: 3.0.0       (6882) 56MB -
  3.0/beta:      ↑                       
  3.0/edge:      git-69217a8 (6897) 56MB -
stgraber commented 6 years ago

Ok, can you run snap refresh lxd? We just published a new stable snap which will hopefully include some more debugging logic from @freeekanayaka.

davidfavor commented 6 years ago

Done.

stgraber commented 6 years ago

What does systemctl status snap.lxd.daemon show now? There's a good chance the refresh is trying to start it back up.

davidfavor commented 6 years ago
net10 # systemctl status snap.lxd.daemon
● snap.lxd.daemon.service - Service for snap application lxd.daemon
   Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2018-05-01 10:24:31 CDT; 1min 7s ago
  Process: 14433 ExecStart=/usr/bin/snap run lxd.daemon (code=exited, status=1/FAILURE)
 Main PID: 14433 (code=exited, status=1/FAILURE)

May 01 10:24:31 net10.bizcooker.com systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
May 01 10:24:31 net10.bizcooker.com systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 6.
May 01 10:24:31 net10.bizcooker.com systemd[1]: Stopped Service for snap application lxd.daemon.
May 01 10:24:31 net10.bizcooker.com systemd[1]: snap.lxd.daemon.service: Start request repeated too quickly.
May 01 10:24:31 net10.bizcooker.com systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
May 01 10:24:31 net10.bizcooker.com systemd[1]: Failed to start Service for snap application lxd.daemon.
stgraber commented 6 years ago

Ok, so it did try to start but failed again. Can you do:

We'll see if we get some more details with the newer snap.

davidfavor commented 6 years ago

Did a reboot first.

net10 # systemctl status snap.lxd.daemon
● snap.lxd.daemon.service - Service for snap application lxd.daemon
   Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2018-05-01 10:33:05 CDT; 1min 40s ago
 Main PID: 1036 (daemon.start)
    Tasks: 0 (limit: 4915)
   CGroup: /system.slice/snap.lxd.daemon.service
           ‣ 1036 /bin/sh /snap/lxd/6879/commands/daemon.start

May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]:   3: fd:   9: hugetlb
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]:   4: fd:  10: devices
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]:   5: fd:  11: freezer
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]:   6: fd:  12: rdma
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]:   7: fd:  13: cpu,cpuacct
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]:   8: fd:  14: net_cls,net_prio
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]:   9: fd:  15: blkio
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]:  10: fd:  16: cpuset
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]:  11: fd:  17: name=systemd
May 01 10:33:10 net10.bizcooker.com lxd.daemon[1036]:  12: fd:  18: unified

net10 # systemctl stop snap.lxd.daemon

net10 # pgrep lxd
# empty - no output

net10 # sudo lxd --debug --group lxd
INFO[05-01|15:35:51] LXD 3.0.0 is starting in normal mode     path=/var/snap/lxd/common/lxd
INFO[05-01|15:35:51] Kernel uid/gid map: 
INFO[05-01|15:35:51]  - u 0 0 4294967295 
INFO[05-01|15:35:51]  - g 0 0 4294967295 
INFO[05-01|15:35:51] Configured LXD uid/gid map: 
INFO[05-01|15:35:51]  - u 0 1000000 1000000000 
INFO[05-01|15:35:51]  - g 0 1000000 1000000000 
WARN[05-01|15:35:51] CGroup memory swap accounting is disabled, swap limits will be ignored. 
INFO[05-01|15:35:51] Initializing local database 
INFO[05-01|15:35:51] Initializing database gateway 
INFO[05-01|15:35:51] Start database node                      address= id=1
INFO[05-01|15:35:52] Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}] 
INFO[05-01|15:35:52] Raft: Node at 0 [Leader] entering Leader state 
INFO[05-01|15:35:52] LXD isn't socket activated 
INFO[05-01|15:35:52] Starting /dev/lxd handler: 
INFO[05-01|15:35:52]  - binding devlxd socket                 socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[05-01|15:35:52] REST API daemon: 
INFO[05-01|15:35:52]  - binding Unix socket                   socket=/var/snap/lxd/common/lxd/unix.socket
INFO[05-01|15:35:52]  - binding TCP socket                    socket=[::]:8443
INFO[05-01|15:35:52] Initializing global database 
INFO[05-01|15:37:03] Initializing storage pools 
DBUG[05-01|15:37:03] Initializing and checking storage pool "default". 
DBUG[05-01|15:37:03] Initializing a DIR driver. 
DBUG[05-01|15:37:03] Checking DIR storage pool "default". 
DBUG[05-01|15:37:03] Initializing a DIR driver. 
INFO[05-01|15:37:03] Initializing networks 
DBUG[05-01|15:37:04] Connecting to a remote simplestreams server 
INFO[05-01|15:37:04] Loading configuration 
DBUG[05-01|15:37:04] Initialized inotify with file descriptor 15 
INFO[05-01|15:37:04] Pruning expired images 
INFO[05-01|15:37:04] Done pruning expired images 
INFO[05-01|15:37:04] Updating instance types 
INFO[05-01|15:37:04] Expiring log files 
INFO[05-01|15:37:04] Updating images 
INFO[05-01|15:37:04] Done expiring log files 
DBUG[05-01|15:37:04] Processing image                         alias=18.04 fp=b36ec647e374da4816104a98807633a2cc387488083d3776557081c4d0333618 protocol=simplestreams server=https://cloud-images.ubuntu.com/releases
DBUG[05-01|15:37:04] Connecting to a remote simplestreams server 
INFO[05-01|15:37:06] Done updating instance types 
DBUG[05-01|15:37:06] Image already exists in the db           image=b36ec647e374da4816104a98807633a2cc387488083d3776557081c4d0333618
DBUG[05-01|15:37:06] Already up to date                       fp=b36ec647e374da4816104a98807633a2cc387488083d3776557081c4d0333618
DBUG[05-01|15:37:06] Processing image                         alias=17.10 fp=f7febb8cbebc6aa8a993eb1ce534963a6b288fde23b9594bb3ba4560704dd65c protocol=simplestreams server=https://cloud-images.ubuntu.com/releases
DBUG[05-01|15:37:06] Using SimpleStreams cache entry          expiry=2018-05-01T16:37:06+0000 server=https://cloud-images.ubuntu.com/releases
DBUG[05-01|15:37:06] Image already exists in the db           image=f7febb8cbebc6aa8a993eb1ce534963a6b288fde23b9594bb3ba4560704dd65c
DBUG[05-01|15:37:06] Already up to date                       fp=f7febb8cbebc6aa8a993eb1ce534963a6b288fde23b9594bb3ba4560704dd65c
INFO[05-01|15:37:06] Done updating images

At this point lxc list works.

davidfavor commented 6 years ago

If I Ctrl-C out of the lxd process + restart, I get the same problem as before...

net10 # sudo lxd --debug --group lxd
INFO[05-01|15:38:21] LXD 3.0.0 is starting in normal mode     path=/var/snap/lxd/common/lxd
INFO[05-01|15:38:21] Kernel uid/gid map: 
INFO[05-01|15:38:21]  - u 0 0 4294967295 
INFO[05-01|15:38:21]  - g 0 0 4294967295 
INFO[05-01|15:38:21] Configured LXD uid/gid map: 
INFO[05-01|15:38:21]  - u 0 1000000 1000000000 
INFO[05-01|15:38:21]  - g 0 1000000 1000000000 
WARN[05-01|15:38:21] CGroup memory swap accounting is disabled, swap limits will be ignored. 
INFO[05-01|15:38:21] Initializing local database 
INFO[05-01|15:38:21] Initializing database gateway 
INFO[05-01|15:38:21] Start database node                      address= id=1
INFO[05-01|15:38:21] Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}] 
INFO[05-01|15:38:21] Raft: Node at 0 [Leader] entering Leader state 
INFO[05-01|15:38:21] LXD isn't socket activated 
INFO[05-01|15:38:21] Starting /dev/lxd handler: 
INFO[05-01|15:38:21]  - binding devlxd socket                 socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[05-01|15:38:21] REST API daemon: 
INFO[05-01|15:38:21]  - binding Unix socket                   socket=/var/snap/lxd/common/lxd/unix.socket
INFO[05-01|15:38:21]  - binding TCP socket                    socket=[::]:8443
INFO[05-01|15:38:21] Initializing global database 
DBUG[05-01|15:39:21] Database error: failed to begin transaction: gRPC BEGIN response error: rpc error: code = Unknown desc = failed to handle BEGIN request: FSM out of sync: timed out enqueuing operation 
EROR[05-01|15:39:21] Failed to start the daemon: failed to open cluster database: failed to ensure schema: failed to begin transaction: gRPC BEGIN response error: rpc error: code = Unknown desc = failed to handle BEGIN request: FSM out of sync: timed out enqueuing operation 
INFO[05-01|15:39:21] Starting shutdown sequence 
INFO[05-01|15:39:21] Stopping REST API handler: 
INFO[05-01|15:39:21]  - closing socket                        socket=[::]:8443
INFO[05-01|15:39:21]  - closing socket                        socket=/var/snap/lxd/common/lxd/unix.socket
INFO[05-01|15:39:21] Stopping /dev/lxd handler 
INFO[05-01|15:39:21]  - closing socket                        socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[05-01|15:39:21] Stop database gateway 
INFO[05-01|15:39:21] Stop raft instance 
INFO[05-01|15:39:21] Stopping REST API handler: 
INFO[05-01|15:39:21] Stopping /dev/lxd handler 
INFO[05-01|15:39:21] Stopping REST API handler: 
INFO[05-01|15:39:21] Stopping /dev/lxd handler 
DBUG[05-01|15:39:21] Not unmounting temporary filesystems (containers are still running) 
INFO[05-01|15:39:21] Saving simplestreams cache 
INFO[05-01|15:39:21] Saved simplestreams cache 
Error: failed to open cluster database: failed to ensure schema: failed to begin transaction: gRPC BEGIN response error: rpc error: code = Unknown desc = failed to handle BEGIN request: FSM out of sync: timed out enqueuing operation
stgraber commented 6 years ago

Ok, so there's clearly something wrong going on with that database; it doesn't make any sense that it takes two minutes to initialize...

Any chance I can have you send a tarball of /var/snap/lxd/common/lxd/database to stgraber@ubuntu.com?

Hopefully I can reproduce the issue here and then pester @freeekanayaka to figure out what's going on. This seems to match another report we had in #4485 which describes LXD hammering the disk for a minute or so and then failing to bring up the database...

stgraber commented 6 years ago

I suspect you're very, very close to the timeout somehow, so retrying a bunch of times will likely eventually get it to start properly again.

stgraber commented 6 years ago

But it makes no sense that it's taking that long to pull anything from the local database, so we need to figure out what's happening.

davidfavor commented 6 years ago

Tarball sent.

stgraber commented 6 years ago

Thanks for the tarball; startup only took 4s here, so that's weird.

stgraber commented 6 years ago

It just takes 4s or so to start here, but I do see heavy disk activity, so it may be similar to yours except I'm on a super fast NVMe SSD.

What filesystem and type of drive do you have behind /var/snap?

davidfavor commented 6 years ago

Likely you're running on a machine with a fresh bionic install, rather than an upgrade.

Upstream Ubuntu reports the same thing... that is... everything works with a fresh install.

Best for you to contact me on Skype + look at my actual machine.

davidfavor commented 6 years ago

Just read your email about restarting lxd multiple times...

Sigh... This is tough, as all containers are down right now.

Can't really have an entire machine down for days to work this bug.

I'll leave the machine as-is for today. If no progress is made, I'll likely do a fresh install of bionic + see if that clears up the problem.

stgraber commented 6 years ago

When I say restarting lxd until it starts, I mean run "lxd --debug --group lxd" until you get past that database timeout. At that point you'll have your containers back until we get to debug more on our side.
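
As a rough sketch of that (not an official procedure), a loop like this keeps retrying the foreground daemon until it gets past the database timeout and stays up:

until sudo lxd --debug --group lxd; do
    echo "lxd exited with an error, retrying..."
    sleep 5
done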

stgraber commented 6 years ago

Your database does take long to load here, but on my system a long time is 4s. I need to know what kind of storage you're using on your server to reproduce the longer delay you see.

davidfavor commented 6 years ago

Ah... I have a "lxd --debug --group lxd" process running now + lxc list returns container listings.

davidfavor commented 6 years ago

My lxd bootstrap sequence...

lxd init --auto --storage-backend=dir
lxc network create lxdbr0
lxc network attach-profile lxdbr0 default eth0

So no ZFS or BTRFS issues.
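
If it helps, the result of that sequence can be sanity-checked with something like (hedged sketch):

lxc storage show default      # driver should be dir
lxc network show lxdbr0
lxc profile show default      # eth0 should be a bridged nic with parent lxdbr0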

davidfavor commented 6 years ago

Running strace on the lxd --debug process, there are 1000s of lines of...

futex(0x17d20c8, FUTEX_WAIT, 0, NULL)   = -1 EAGAIN (Resource temporarily unavailable)

Once EAGAIN clears, lxd seems to get out of its loop.

Maybe this helps.
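
A hedged sketch of how that strace can be attached to the foreground daemon (the futex filter just narrows the output to the calls quoted above):

sudo strace -f -e trace=futex -p "$(pgrep -o -x lxd)"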

stgraber commented 6 years ago

I'm asking about your physical storage: is it SSD or HDD, and what filesystem are you using for your system? The LXD storage config doesn't matter; I'm interested in what's under LXD.
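
Something like the following answers both questions (hedged sketch, nothing LXD-specific):

findmnt -T /var/snap            # filesystem backing /var/snap
lsblk -d -o NAME,ROTA,MODEL     # ROTA=1 means rotational (HDD), 0 means SSD
cat /proc/mdstat                # software RAID layout, if any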

davidfavor commented 6 years ago

Default OVH 2TB disk subsystem, so soft RAID.

net10 # cat /etc/fstab
# <file system> <mount point>   <type>  <options>   <dump>  <pass>
/dev/md4    /   ext4    errors=remount-ro,noatime,dioread_nolock,delalloc,defaults  0   1
/dev/md2    /boot   ext4    errors=remount-ro,noatime   0   1
/dev/sda3   swap    swap    defaults    0   0
/dev/sdb3   swap    swap    defaults    0   0
proc        /proc   proc    defaults        0   0
sysfs       /sys    sysfs   defaults        0   0
devtmpfs    /dev    devtmpfs    rw  0   0
davidfavor commented 6 years ago

Same setup as with Artful. No changes after do-release-upgrade -d finished.

stgraber commented 6 years ago

Ok, cool, I'll try on some slow spindles and see how long it takes there.

davidfavor commented 6 years ago

Let me know if you have any other thoughts.

If you're stumped, I'll just do a bionic install from scratch.

stgraber commented 6 years ago

Ok, I think we figured it out. We'll deal with this in #4485. We believe we have a temporary workaround and then will have a fixed LXD that will prevent this from happening again.

davidfavor commented 6 years ago

This machine runs many hot spares for clients, so I had to do a fresh bionic install.

With a fresh install, all problems are fixed.

So I'd tentatively say Artful -> Bionic upgrades should be avoided.

How I got my system working (a rough command sketch follows the list):

1) lxc stop all containers

2) lxc copy all containers to another LXD machine

3) do fresh install of Bionic

4) lxc copy all containers back to machine

At this point all's well.
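
In command form, roughly (hedged sketch; the container name c1 and remote name spare are placeholders, and the spare machine has to be added first with lxc remote add):

lxc stop c1
lxc copy c1 spare:c1        # repeat for each container
# ...reinstall Bionic and LXD on this host...
lxc copy spare:c1 c1
lxc start c1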

laralar commented 6 years ago

I know it is closed, but there seems to be a bug in netplan for anonymous bridges where the bridge doesn't come up after reboot.

https://bugs.launchpad.net/ubuntu/+source/nplan/+bug/1736975

Only after ip link set up br1 did the container get the IP address from the DHCP server. This is a big issue for LXD/KVM, etc. using bridges; there is a workaround, though. Since I have something like 80 servers running LXD, I hope they fix it.
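
The workaround is just bringing the bridge up by hand after each boot until the netplan bug is fixed, e.g. (hedged; br1 as above):

sudo ip link set dev br1 up
lxc list        # containers on the bridge should now pick up DHCP leases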

Kramerican commented 6 years ago

Just to clarify, for posterity: my issue was simple (stupid) and unrelated to @davidfavor's issue. My containers were not configured using the new Netplan configuration. At the time I simply didn't know about Netplan. As soon as we set up a netplan config file, everything worked for my 18.04 containers.
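
For reference, a minimal DHCP netplan config inside an 18.04 container looks roughly like this (file name and interface name are the usual defaults, adjust as needed), applied with netplan apply:

# /etc/netplan/10-lxc.yaml
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: true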

davidfavor commented 6 years ago

I see the same thing @Kramerican sees.

This seems to be something related to machine-level networking interacting with containers.

If I upgrade either a non-netplan or a netplan machine to Bionic, the same behavior seems to occur.

Seems like a Bionic packaging problem.

I finally gave up, moved all containers to another machine, did a fresh Bionic install, moved all containers back.

At this point all containers worked.