canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

LXD creates lxdbr0 if it does not exist, despite being initialized without it. #11906

Closed ledlamp closed 1 year ago

ledlamp commented 1 year ago

Required information

Issue description

On restart, LXD creates an lxdbr0 interface and reconfigures the default profile if lxdbr0 does not exist, even if it is not wanted and was not asked for during lxd init.

Steps to reproduce

  1. Install lxd: snap install lxd
  2. Run lxd init but use an existing bridge (e.g. lxdbr1) instead of creating a new one.
  3. Check lxc network ls and lxc profile show default. There is no lxdbr0 and the profile is configured as desired.
  4. Restart lxd: snap restart lxd
  5. Check lxc network ls and lxc profile show default again. A new lxdbr0 now exists and the default profile was changed to use it. The user is now very angry.
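
The before/after comparison in steps 3 and 5 can be scripted. This is a minimal sketch, assuming that lxc network list --format csv prints the network name as the first comma-separated field. The helper name new_networks and the snapshot file names are hypothetical, not part of LXD.

```shell
#!/bin/sh
# new_networks BEFORE AFTER
# Print the names of networks that appear in AFTER but not in BEFORE.
# Both files are expected to hold `lxc network list --format csv` output,
# with the network name assumed to be the first comma-separated field.
new_networks() {
    awk -F, '
        FILENAME == ARGV[1] { seen[$1] = 1; next }  # remember names from BEFORE
        !($1 in seen)       { print $1 }            # report names only in AFTER
    ' "$1" "$2"
}

# Usage (on a test host only: snap restart lxd stops running instances):
#   lxc network list --format csv > before.csv
#   snap restart lxd
#   lxc network list --format csv > after.csv
#   new_networks before.csv after.csv
```

On a host affected by this issue, the last command would be expected to list lxdbr0.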

ledlamp commented 1 year ago

So to deal with it (as a workaround), there has to be an interface called lxdbr0: either leave the unneeded one it creates, or name your unmanaged interface lxdbr0 if you can. Otherwise it'll create it and mess up your default profile.

tomponline commented 1 year ago

Confirmed issue. This is really odd.

I've confirmed that killing the lxd process after initializing it with the existing lxdbr1 interface, and then triggering the process to be restarted by running lxc ls, doesn't create lxdbr0. So I think we can rule this out as an actual LXD issue; it looks like an external or packaging issue.

Additionally, doing snap stop lxd and then snap start lxd doesn't trigger it either.

Furthermore, it's even easier to reproduce:

snap install lxd
lxc network ls # No lxdbr0
snap restart lxd
lxc network ls # Shows lxdbr0 managed network

So it's something triggered from snap restart lxd.

I've also confirmed that we can actually see an API request coming into LXD upon snap restart lxd that inspects the existing networks and creates lxdbr0:

# Getting network list
Jun 29 06:58:13 vtest lxd.daemon[9910]: time="2023-06-29T06:58:13Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url="/1.0/networks?recursion=1" username=root
Jun 29 06:58:13 vtest lxd.daemon[9910]: time="2023-06-29T06:58:13Z" level=debug msg="WriteJSON\n\t{\n\t\t\"type\": \"sync\",\n\t\t\"status\": \"Success\",\n\t\t\"status_code\": 200,\n\t\t\"operation\": \"\",\n\t\t\"error_code\": 0,\n\t\t\"error\": \"\",\n\t\t\"metadata\": [\n\t\t\t{\n\t\t\t\t\"config\": {},\n\t\t\t\t\"description\": \"\",\n\t\t\t\t\"name\": \"lo\",\n\t\t\t\t\"type\": \"loopback\",\n\t\t\t\t\"used_by\": [],\n\t\t\t\t\"managed\": false,\n\t\t\t\t\"status\": \"\",\n\t\t\t\t\"locations\": null\n\t\t\t},\n\t\t\t{\n\t\t\t\t\"config\": {},\n\t\t\t\t\"description\": \"\",\n\t\t\t\t\"name\": \"enp5s0\",\n\t\t\t\t\"type\": \"physical\",\n\t\t\t\t\"used_by\": null,\n\t\t\t\t\"managed\": false,\n\t\t\t\t\"status\": \"\",\n\t\t\t\t\"locations\": null\n\t\t\t},\n\t\t\t{\n\t\t\t\t\"config\": {},\n\t\t\t\t\"description\": \"\",\n\t\t\t\t\"name\": \"lxdbr1\",\n\t\t\t\t\"type\": \"bridge\",\n\t\t\t\t\"used_by\": null,\n\t\t\t\t\"managed\": false,\n\t\t\t\t\"status\": \"\",\n\t\t\t\t\"locations\": null\n\t\t\t}\n\t\t]\n\t}" http_code=200

# Create lxdbr0 request
Jun 29 06:58:13 vtest lxd.daemon[9910]: time="2023-06-29T06:58:13Z" level=debug msg="Handling API request" ip=@ method=POST protocol=unix url=/1.0/networks username=root
Jun 29 06:58:13 vtest lxd.daemon[9910]: time="2023-06-29T06:58:13Z" level=debug msg="API Request\n\t{\n\t\t\"config\": {},\n\t\t\"description\": \"\",\n\t\t\"name\": \"lxdbr0\",\n\t\t\"type\": \"bridge\"\n\t}" ip=@ method=POST protocol=unix url=/1.0/networks username=root
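
To spot such a request in a longer journal, the log lines can be filtered for POST requests to /1.0/networks. This is a rough sketch that assumes the log format shown above; the helper name network_create_requests is hypothetical, and it only sees requests when LXD debug logging is enabled.

```shell
#!/bin/sh
# network_create_requests
# Read LXD debug log lines on stdin and print only the "Handling API request"
# entries that are a POST to /1.0/networks, i.e. a network creation request.
network_create_requests() {
    grep 'msg="Handling API request"' |
        grep -E 'method=POST .*url="?/1\.0/networks"?( |$)'
}

# Usage (assumes the daemon logs to the snap.lxd.daemon unit):
#   journalctl -u snap.lxd.daemon | network_create_requests
```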

tomponline commented 1 year ago

FWIW we don't tend to recommend using snap restart lxd because it will stop any running instances. Instead we tend to use:

sudo systemctl reload snap.lxd.daemon

Which just restarts the running LXD daemon and not the instances.

This doesn't appear to trigger the issue either.

tomponline commented 1 year ago

@stgraber any ideas here? I'm at a bit of a loss. My only guess is that it's something to do with either lxd-user or lxd-migrate (the lxd-migrate inside the snap that migrates from the apt package), as that does create a bridge.

I also see this in the logs, though I'm not sure it's relevant:

Jun 29 07:29:44 vtest audit[3869]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.lxd.migrate" pid=3869 comm="apparmor_parser"
Jun 29 07:29:49 vtest audit[3966]: AVC apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.lxd.migrate" pid=3966 comm="apparmor_parser"

This suggests it may be being run.

Something is creating an lxdbr0 bridge on snap restart lxd if no managed networks exist. It then goes on to add or replace an eth0 NIC device in the default profile, connected to that network.
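
A quick way to see which bridge or network the default profile's NICs point at, before and after a restart, is to pull the relevant keys out of the profile YAML. This is a sketch under the assumption that NIC devices in lxc profile show output carry either a parent: key (unmanaged bridge) or a network: key (managed network). The helper name profile_nic_targets is hypothetical.

```shell
#!/bin/sh
# profile_nic_targets
# Read `lxc profile show <name>` YAML on stdin and print the value of every
# "network:" or "parent:" key, i.e. the bridge or network each NIC device
# is attached to. This is a line-based match, not a real YAML parser.
profile_nic_targets() {
    awk '$1 == "network:" || $1 == "parent:" { print $2 }'
}

# Usage:
#   lxc profile show default | profile_nic_targets
```

If the output changes from lxdbr1 to lxdbr0 across a snap restart lxd, the profile has been rewritten as described above.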

tomponline commented 1 year ago

Speaking with @stgraber, he confirmed this is a bug in the snap restart command: it starts sub-units (like the lxd-user process) even if they weren't previously running (because normally they start by socket activation).

@gabrielmougard please can you open a bug with the snapd team for this https://bugs.launchpad.net/snapd/+filebug ?

Thanks

tomponline commented 1 year ago

@ru-fu @gabrielmougard we should change the references in the docs from snap restart --reload lxd to snap restart --reload lxd.daemon so we don't lead users into this external bug in snapd.
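
The two daemon-only restart commands mentioned in this thread can be wrapped in a small helper so that the whole-snap form is not typed by accident. This is only an illustrative sketch: the function name lxd_daemon_restart and the DRY_RUN variable are hypothetical, and the commands themselves are the ones quoted in the comments above.

```shell
#!/bin/sh
# lxd_daemon_restart [snap|systemd]
# Restart only the LXD daemon unit instead of the whole snap.
# With DRY_RUN=1 the command is printed instead of executed.
lxd_daemon_restart() {
    case "${1:-snap}" in
        snap)    cmd="snap restart --reload lxd.daemon" ;;
        systemd) cmd="systemctl reload snap.lxd.daemon" ;;
        *)       echo "usage: lxd_daemon_restart [snap|systemd]" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        # Intentionally unquoted so the command string splits into words.
        sudo $cmd
    fi
}

# Usage:
#   DRY_RUN=1 lxd_daemon_restart          # show the snap form
#   lxd_daemon_restart systemd            # run the systemctl form
```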

gabrielmougard commented 1 year ago

@ru-fu I don't see any mentions of snap restart --reload lxd in our docs, but there is the Install LXD from a package section here. Should we add a Restart a snap LXD deployment sub-title below this section?

tomponline commented 1 year ago

I believe @ru-fu fixed it already

ru-fu commented 1 year ago

I fixed the snap restart occurrences, yes. It might be a good idea to add a section about how to restart LXD. But I'm not sure if the installing page is the best place for it ... Is there a common scenario where you need to restart LXD? Maybe after server config changes?