canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0
4.35k stars 931 forks source link

Noisy logs from refreshing instance types #13964

Closed simondeziel closed 1 month ago

simondeziel commented 1 month ago

On a box without IPv6, LXD routinely logs this:

# No IPv6 default route
$ ip -6 ro
::1 dev lo proto kernel metric 256 pref medium

$ journalctl -n 5 -b0 -u snap.lxd.daemon | cat
Aug 19 14:13:59 jupiter lxd.daemon[930]: => LXD is ready
Aug 19 14:14:06 jupiter lxd.daemon[1075]: time="2024-08-19T14:14:06Z" level=warning msg="Failed to update instance types: Get \"https://images.lxd.canonical.com/meta/instance-types/.yaml\": Unable to connect to: images.lxd.canonical.com:443 ([dial tcp [2620:2d:4000:100c::147]:443: connect: network is unreachable])"
Aug 19 14:14:06 jupiter lxd.daemon[1075]: time="2024-08-19T14:14:06Z" level=error msg="Failed updating instance types" err="Get \"https://images.lxd.canonical.com/meta/instance-types/.yaml\": Unable to connect to: images.lxd.canonical.com:443 ([dial tcp [2620:2d:4000:100c::147]:443: connect: network is unreachable])"
Aug 22 14:14:06 jupiter lxd.daemon[1075]: time="2024-08-22T14:14:06Z" level=warning msg="Failed to update instance types: Get \"https://images.lxd.canonical.com/meta/instance-types/.yaml\": Unable to connect to: images.lxd.canonical.com:443 ([dial tcp [2620:2d:4000:100c::147]:443: connect: network is unreachable])"
Aug 22 14:14:06 jupiter lxd.daemon[1075]: time="2024-08-22T14:14:06Z" level=error msg="Failed updating instance types" err="Get \"https://images.lxd.canonical.com/meta/instance-types/.yaml\": Unable to connect to: images.lxd.canonical.com:443 ([dial tcp [2620:2d:4000:100c::147]:443: connect: network is unreachable])"

According to tcpdump, LXD is able to refresh the instance types definitions using IPv4 so logging 2 warnings and 2 errors seems a bit much for something that is apparently inconsequential.

MggMuggins commented 1 month ago

@simondeziel I've put up #14092 to dedup the log entries but I'm not able to reproduce this:

Set instance types refresh to happen every minute:

diff --git a/lxd/instance_instance_types.go b/lxd/instance_instance_types.go
index c6b3233a0d..87419efbdf 100644
--- a/lxd/instance_instance_types.go
+++ b/lxd/instance_instance_types.go
@@ -8,6 +8,7 @@ import (
        "os"
        "strconv"
        "strings"
+       "time"

        "gopkg.in/yaml.v2"

@@ -97,7 +98,7 @@ func instanceRefreshTypesTask(d *Daemon) (task.Func, task.Schedule) {
                logger.Info("Done updating instance types")
        }

-       return f, task.Daily()
+       return f, task.Every(time.Minute)
 }

 func instanceRefreshTypes(ctx context.Context, s *state.State) error {

Remove all the ipv6 addrs from the VM's only iface via netplan:

...
        enp5s0:
            dhcp6: false
            accept-ra: false
            link-local: []
$ ping 2620:2d:4000:100c::147
ping: connect: Network is unreachable

Observe no failures during instance types updates:

INFO   [2024-09-12T15:25:37-05:00] Updating instance types
INFO   [2024-09-12T15:25:37-05:00] Updating images
INFO   [2024-09-12T15:25:37-05:00] Done expiring log files
INFO   [2024-09-12T15:25:37-05:00] Started instance                              action=start created="2024-08-01 23:57:08.129353886 +0000 UTC" ephemeral=false instance=c1 instanceType=container project=default stateful=false used="2024-09-12 19:41:19.134436943 +0000 UTC"
INFO   [2024-09-12T15:25:37-05:00] Daemon started
INFO   [2024-09-12T15:25:38-05:00] Done updating instance types
INFO   [2024-09-12T15:25:39-05:00] Done updating images

Is your DNS server returning AAAA records by default and lxd is falling back to ipv4 after failing to connect over ipv6?

simondeziel commented 1 month ago

@MggMuggins I should have collected more information at the time the issue happened as I cannot reproduce either :/

In terms of DNS, nothing special is done, systemd-resolved presumably returns both the AAAA and A records:

root@jupiter:~# resolvectl query images.lxd.canonical.com
images.lxd.canonical.com: 2620:2d:4000:100c::147 -- link: enp3s0
                          185.125.189.121      -- link: enp3s0

-- Information acquired via protocol DNS in 11.6ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: yes
-- Data from: cache network

In terms of disabling IPv6, same as what you tried:

root@jupiter:~# ip -6 a; ip -6 ro
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
::1 dev lo proto kernel metric 256 pref medium

I tried reproducing on 5.21/stable and 5.0/stable and in both cases I failed to get the warning/error despite rm /var/snap/lxd/common/lxd/cache/instance_types.yaml between LXD restarts. Speaking of which, restarting LXD doesn't seem to be enough to trigger a download of the instance_types :/

I think your PR to reduce the log duplication is a good start and probably good enough considering we cannot reproduce this reliably ATM.