canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

lxd-agent and cloud-init-local issues #14134

Open smcavoy-vercara opened 1 month ago

smcavoy-vercara commented 1 month ago

Required information

Issue description

When provisioning VMs, cloud-init occasionally does not find the DataSource for LXD.

Steps to reproduce

  1. Run provisioning and watch for instances where cloud-init does not run. Roughly 1-2 out of 10 instance creations fail this way; occasionally several instances fail consecutively.

Information to attach

2024-09-19T12:39:25.044504+00:00 ubuntu lxd-agent[508]: time="2024-09-19T12:37:17Z" level=info msg=Starting
2024-09-19T12:39:25.044526+00:00 ubuntu lxd-agent[508]: time="2024-09-19T12:37:18Z" level=info msg="Loading vsock module"
2024-09-19T12:39:25.044529+00:00 ubuntu lxd-agent[508]: time="2024-09-19T12:37:18Z" level=info msg="Started vsock listener"

in the above syslog shows an entry made by lxd-agent 2s after lxd-agent

A working cloud-init run:

2024-09-19T12:26:55.211488+00:00 testvm3 lxd-agent[517]: time="2024-09-19T12:26:51Z" level=info msg=Starting
2024-09-19T12:26:55.211497+00:00 testvm3 lxd-agent[517]: time="2024-09-19T12:26:51Z" level=info msg="Loading vsock module"
2024-09-19T12:26:55.211501+00:00 testvm3 lxd-agent[517]: time="2024-09-19T12:26:51Z" level=info msg="Started vsock listener"

The above is from an identical job provisioning a VM, but here cloud-init did run successfully (it found the LXD DataSource). Notice the difference between the syslog timestamp and the lxd-agent timestamp in each case.
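To make the timestamp comparison concrete, here is a minimal Python sketch that computes the gap between the syslog timestamp and the `time=` value lxd-agent writes itself. The function name `timestamp_gap_seconds` and the regular expression are illustrative assumptions, not part of LXD or cloud-init; the two sample lines are copied from the logs above.

```python
import re
from datetime import datetime, timezone

# Captures the syslog timestamp at the start of the line and the agent's own
# time= field. Accepts straight or typographic quotes around the value.
LINE_RE = re.compile(
    r'^(?P<syslog>\S+)\s+\S+\s+lxd-agent\[\d+\]:\s+'
    r'time=["\u201c](?P<agent>[^"\u201d]+)["\u201d]'
)


def timestamp_gap_seconds(line: str) -> float:
    """Return syslog timestamp minus lxd-agent timestamp, in seconds."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError("not an lxd-agent syslog line")
    syslog_ts = datetime.fromisoformat(m.group("syslog"))
    agent_ts = datetime.strptime(
        m.group("agent"), "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return (syslog_ts - agent_ts).total_seconds()


failing = ('2024-09-19T12:39:25.044504+00:00 ubuntu lxd-agent[508]: '
           'time="2024-09-19T12:37:17Z" level=info msg=Starting')
working = ('2024-09-19T12:26:55.211488+00:00 testvm3 lxd-agent[517]: '
           'time="2024-09-19T12:26:51Z" level=info msg=Starting')

print("failing run gap:", timestamp_gap_seconds(failing))
print("working run gap:", timestamp_gap_seconds(working))
```

For these two lines the gap is roughly 128 seconds in the failing run and roughly 4 seconds in the working run. The gap by itself does not establish a cause; it only quantifies the difference visible in the logs.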

tomponline commented 1 month ago

in the above syslog shows an entry made by lxd-agent 2s after lxd-agent

Please could you elaborate what you mean by:

by lxd-agent 2s after lxd-agent

Do you mean the 1s gap between log messages?

Or do you mean it's happening after cloud-init?

smcavoy-vercara commented 1 month ago

The log entries are only a symptom of the problem, so they are unimportant. Sorry for the confusion.

I found that lxd-agent starts after cloud-init-local, despite this entry in /usr/lib/systemd/system/lxd-agent.service:

...
Before=multi-user.target cloud-init.target cloud-init.service cloud-init-local.service
...
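As a rough way to compare a custom image against a known-good one, the following Python sketch collects the units named in `Before=` lines of a unit file's text. The helper `units_ordered_before` is hypothetical and only inspects the text it is given; it does not account for drop-in overrides or for whether the unit is enabled and pulled into the boot transaction at all, and systemd ordering only applies when both units are started together.

```python
def units_ordered_before(unit_text: str) -> set[str]:
    """Collect every unit named in Before= lines of a systemd unit file."""
    units: set[str] = set()
    for raw in unit_text.splitlines():
        line = raw.strip()
        if line.startswith("Before="):
            # Unit names on a Before= line are separated by whitespace.
            units.update(line[len("Before="):].split())
    return units


# The Before= line quoted from lxd-agent.service in this thread.
excerpt = (
    "Before=multi-user.target cloud-init.target "
    "cloud-init.service cloud-init-local.service"
)

ordered = units_ordered_before(excerpt)
print("cloud-init-local.service" in ordered)
```

Running the same check against the unit file inside the Packer-built image and inside an image from the ubuntu: remote would show whether the ordering directive itself differs, or whether the difference lies elsewhere.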

So the issue is specific to the image, which is built from Ubuntu cloud images, but via Packer.

Any suggestions as to how this might happen?

tomponline commented 1 month ago

Thanks.

So it doesn't happen with the ubuntu: remote images?

smcavoy-vercara commented 1 month ago

Confirmed. Running an image from ubuntu: shows the correct start order of services.

tomponline commented 1 month ago

Please can you share how you are building the image?

smcavoy-vercara commented 1 month ago

packer -> ansible -> qcow2 -> lxc image import