flannel-io / flannel

flannel is a network fabric for containers, designed for Kubernetes
Apache License 2.0
8.6k stars 2.87k forks

Wireguard Backend logs error message "failed to add route flannel-wg: file exists" but it is not a problem #1963

Closed. maltelehmann closed this issue 1 month ago

maltelehmann commented 1 month ago

Expected Behavior

The wireguard backend should only log error messages for problems that prevent flannel from working correctly.

Current Behavior

If the wireguard flannel route (named flannel-wg) already exists on the host, e.g. because the cluster has multiple nodes, the backend logs an error message even though nothing is wrong.

This behaviour can be seen in the e2e test logs (see the last line):

+ echo '########## logs for flannel-e2e-test-flannel1 container ##########'
+ docker logs flannel-e2e-test-flannel1
I0429 09:11:22.020976       1 main.go:210] CLI flags config: {etcdEndpoints:http://172.17.0.1:2379/ etcdPrefix:/coreos.com/network etcdKeyfile:/certs/client-key.pem etcdCertfile:/certs/client.pem etcdCAFile:/certs/ca.pem etcdUsername: etcdPassword: version:false kubeSubnetMgr:false kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:false ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true}
W0429 09:11:22.021101       1 main.go:504] no subnet found for key: FLANNEL_SUBNET in file: /run/flannel/subnet.env
W0429 09:11:22.021171       1 main.go:539] no subnet found for key: FLANNEL_IPV6_SUBNET in file: /run/flannel/subnet.env
I0429 09:11:22.021805       1 main.go:230] Created subnet manager: Etcd Local Manager with Previous Subnet: None
I0429 09:11:22.021821       1 main.go:233] Installing signal handlers
E0429 09:11:22.024823       1 main.go:447] Couldn't fetch network config: flannel config not found in etcd store. Did you create your config using etcdv3 API?
I0429 09:11:23.025660       1 main.go:451] Found network config - Backend type: wireguard
I0429 09:11:23.025697       1 match.go:210] Determining IP address of default interface
I0429 09:11:23.025951       1 match.go:263] Using interface with name eth0 and address 172.17.0.3
I0429 09:11:23.025979       1 match.go:285] Defaulting external address to interface address (172.17.0.3)
I0429 09:11:23.111169       1 local_manager.go:158] Found lease (ip: 10.10.12.0/24 ipv6: ::/0) for current IP (172.17.0.3), reusing
I0429 09:11:23.112734       1 iptables.go:51] Starting flannel in iptables mode...
I0429 09:11:23.112752       1 iptables.go:226] Changing default FORWARD chain policy to ACCEPT
I0429 09:11:23.120038       1 main.go:395] Wrote subnet file to /run/flannel/subnet.env
I0429 09:11:23.120053       1 main.go:399] Running backend.
I0429 09:11:23.120087       1 wireguard_network.go:81] Watching for new subnet leases
I0429 09:11:23.120956       1 local_manager.go:322] manager.WatchLease: sending reset results...
I0429 09:11:23.121058       1 local_manager.go:399] Waiting for 22h59m58.999891476s to renew lease
I0429 09:11:23.121480       1 registry.go:291] registry: watching subnets starting from rev 38
I0429 09:11:23.121525       1 subnet.go:152] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa0a4100, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xac110004, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x31, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x35, 0x65, 0x3a, 0x66, 0x61, 0x3a, 0x64, 0x35, 0x3a, 0x33, 0x30, 0x3a, 0x64, 0x66, 0x3a, 0x36, 0x32, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(2024, time.April, 30, 9, 11, 16, 121460740, time.Local), Asof:34}} }
W0429 09:11:23.121605       1 wireguard_network.go:144] Ignoring non-wireguard subnet: type=vxlan
I0429 09:11:23.128206       1 iptables.go:365] trying to run iptables-restore < map[filter:[[-A FORWARD -m comment --comment flanneld forward -j FLANNEL-FWD] [-A FLANNEL-FWD -s 10.10.0.0/16 -m comment --comment flanneld forward -j ACCEPT] [-A FLANNEL-FWD -d 10.10.0.0/16 -m comment --comment flanneld forward -j ACCEPT]]]
I0429 09:11:23.128238       1 iptables_restore.go:94] trying to run with payload *filter
-A FORWARD -m comment --comment "flanneld forward" -j FLANNEL-FWD
-A FLANNEL-FWD -s 10.10.0.0/16 -m comment --comment "flanneld forward" -j ACCEPT
-A FLANNEL-FWD -d 10.10.0.0/16 -m comment --comment "flanneld forward" -j ACCEPT
COMMIT
I0429 09:11:23.129402       1 iptables.go:372] bootstrap done
I0429 09:11:23.222041       1 registry.go:309] watchSubnets: got valid subnet event with revision 38
I0429 09:11:23.222081       1 subnet.go:152] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xa0a4100, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xac110004, PublicIPv6:(*ip.IP6)(nil), BackendType:"wireguard", BackendData:json.RawMessage{0x7b, 0x22, 0x50, 0x75, 0x62, 0x6c, 0x69, 0x63, 0x4b, 0x65, 0x79, 0x22, 0x3a, 0x22, 0x69, 0x59, 0x59, 0x31, 0x43, 0x4f, 0x31, 0x67, 0x37, 0x6b, 0x2b, 0x4b, 0x73, 0x50, 0x30, 0x44, 0x43, 0x65, 0x62, 0x4a, 0x75, 0x50, 0x74, 0x6f, 0x75, 0x59, 0x68, 0x36, 0x5a, 0x56, 0x38, 0x47, 0x38, 0x73, 0x73, 0x51, 0x49, 0x50, 0x72, 0x5a, 0x49, 0x6c, 0x4d, 0x3d, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(2024, time.April, 30, 9, 11, 22, 222027325, time.Local), Asof:0}} }
I0429 09:11:23.222163       1 wireguard_network.go:175] Subnet added: 10.10.65.0/24 via 172.17.0.4:51820
E0429 09:11:23.223560       1 wireguard_network.go:188] failed to add ipv4 route to (10.10.0.0/16): failed to add route flannel-wg: file exists

Possible Solution

Use netlink.RouteReplace instead of netlink.RouteAdd. RouteReplace adds the route if it is missing and replaces it if it already exists, so it does not return an error when the route is already present. Since this code path only runs when nodes are added, the possible overhead of replacing the route instead of returning an error should be acceptable.

(I will provide a PR for this).

Steps to Reproduce (for bugs)

See the e2e test logs above. Alternatively, create a cluster with multiple nodes and check the flannel logs.

Context

I had problems with Kubernetes networking after re-installing k3s on a node and stumbled across these error messages, which misled me into thinking they indicated a problem.

Your Environment

See e2e-tests