cybozu-go / coil

CNI plugin for Kubernetes designed for scalability and extensibility
Apache License 2.0

[BUG] When using the following DualStack Pool I cannot create an interface with IPs for the pod. #208

Closed. Cellebyte closed this issue 2 years ago.

Cellebyte commented 2 years ago

This bug could be related to the following issue: https://github.com/vishvananda/netlink/issues/576

coil: ghcr.io/cybozu-go/coil:2.0.14
Kubernetes Version: 1.21.11
Container Runtime: cri-o 1.21.6
Linux OS: AlmaLinux, kernel 4.18.0-348.20.1.el8_5.x86_64

apiVersion: coil.cybozu.com/v2
kind: AddressPool
metadata:
  name: lb
spec:
  blockSizeBits: 6
  subnets:
  - ipv4: 100.126.16.0/20
    ipv6: 2001:7c7:2100:42f:ffff:ffff:ffff:f000/116
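
For context on the sizing, a rough sketch only, assuming each address block holds 2^blockSizeBits addresses per family as coil's AddressPool field is documented: with blockSizeBits: 6, both subnets above split into 64-address blocks, 64 blocks each.

package main

import "fmt"

func main() {
	// Values from the AddressPool manifest above.
	const blockSizeBits = 6

	// Assumption: each address block carved from the pool contains
	// 2^blockSizeBits addresses of each family.
	addrsPerBlock := 1 << blockSizeBits // 64

	// Number of blocks each subnet of the pool can provide.
	ipv4Blocks := (1 << (32 - 20)) / addrsPerBlock   // 100.126.16.0/20 -> 64 blocks
	ipv6Blocks := (1 << (128 - 116)) / addrsPerBlock // the /116 subnet -> 64 blocks

	fmt.Println(addrsPerBlock, ipv4Blocks, ipv6Blocks) // 64 64 64
}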

Namespace with a requirement for dual-stack Pods:

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    coil.cybozu.com/pool: lb
  name: gitlab-runner
  resourceVersion: "407513963"
  uid: a698674e-6c56-4728-bfdf-952bd2248433
spec:
  finalizers:
  - kubernetes
status:
  phase: Active

Warning  FailedCreatePodSandBox  13s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_runner-zaevnzet-project-392-concurrent-0cdzrw_gitlab-runner_abfc57ce-1626-444f-a0f0-635ac3ce9e27_0(372c1a10840209f43cb46fce286f263d0a25da436c6c2157a86517d590f730a4): error adding pod gitlab-runner_runner-zaevnzet-project-392-concurrent-0cdzrw to CNI network "k8s-pod-network": failed to setup pod network; netlink: failed to add a hostIPv4 address: numerical result out of range
Mar 18 22:56:18 wuerfelchen-w-3 kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Mar 18 22:56:18 wuerfelchen-w-3 kernel: IPv6: ADDRCONF(NETDEV_UP): veth88f11a25: link is not ready
Mar 18 22:56:18 wuerfelchen-w-3 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth88f11a25: link becomes ready
Mar 18 22:56:18 wuerfelchen-w-3 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Mar 18 22:56:20 wuerfelchen-w-3 kernel: netlink: 'coild': attribute type 2 has an invalid length.

The code lines that could be causing the issue are here:

https://github.com/cybozu-go/coil/blob/1db9b2d01220d2d86ee43a8ce7e0f9cf35e0b313/v2/pkg/ipam/node.go#L289-L312

{"level":"info","ts":1647643830.4085627,"logger":"node-ipam","msg":"requesting a new block","pool":"lb"}
{"level":"info","ts":1647643830.4251256,"logger":"node-ipam","msg":"waiting for request completion","pool":"lb"}
{"level":"info","ts":1647643830.4395173,"logger":"node-ipam","msg":"adding a new block","pool":"lb","name":"lb-0","block-pool":"lb","block-node":"wuerfelchen-w-3"}
{"level":"info","ts":1647643830.4395475,"logger":"node-ipam","msg":"allocated","pool":"lb","block":"lb-0","ipv4":"100.126.16.0","ipv6":"2001:7c7:2100:42f:ffff:ffff:ffff:f000"}
{"level":"info","ts":1647643830.4438093,"logger":"route-exporter","msg":"synchronizing routing table","table-id":119}
{"level":"info","ts":1647643831.7375743,"logger":"node-ipam","msg":"freeing an empty block","pool":"lb","block":"lb-0"}
{"level":"info","ts":1647643831.768794,"logger":"route-exporter","msg":"synchronizing routing table","table-id":119}
{"level":"error","ts":1647643831.7710552,"logger":"grpc","msg":"failed to setup pod network","grpc.start_time":"2022-03-18T22:50:30Z","grpc.request.deadline":"2022-03-18T22:51:30Z","system":"grpc","span.kind":"server","grpc.service":"pkg.cnirpc.CNI","grpc.method":"Add","grpc.request.pod.namespace":"gitlab-runner","grpc.request.netns":"/var/run/netns/fe9df582-47b9-4b13-844d-88165d87ab7f","grpc.request.ifname":"eth0","grpc.request.container_id":"665dbc5ad3ec9ff5d19f55d860f52939694eaaf48c314a80c4ebe84b3b0a10b8","peer.address":"@","grpc.request.pod.name":"debug-pod","error":"netlink: failed to add a hostIPv4 address: numerical result out of range"}
{"level":"error","ts":1647643831.7711568,"logger":"grpc","msg":"finished unary call with code Internal","grpc.start_time":"2022-03-18T22:50:30Z","grpc.request.deadline":"2022-03-18T22:51:30Z","system":"grpc","span.kind":"server","grpc.service":"pkg.cnirpc.CNI","grpc.method":"Add","grpc.request.pod.name":"debug-pod","grpc.request.pod.namespace":"gitlab-runner","grpc.request.netns":"/var/run/netns/fe9df582-47b9-4b13-844d-88165d87ab7f","grpc.request.ifname":"eth0","grpc.request.container_id":"665dbc5ad3ec9ff5d19f55d860f52939694eaaf48c314a80c4ebe84b3b0a10b8","peer.address":"@","error":"rpc error: code = Internal desc = failed to setup pod network","grpc.code":"Internal","grpc.time_ms":1368.211}
{"level":"info","ts":1647643831.7817273,"logger":"grpc","msg":"waiting before destroying pod network","grpc.start_time":"2022-03-18T22:50:31Z","grpc.request.deadline":"2022-03-18T22:51:31Z","system":"grpc","span.kind":"server","grpc.service":"pkg.cnirpc.CNI","grpc.method":"Del","peer.address":"@","grpc.request.pod.name":"debug-pod","grpc.request.pod.namespace":"gitlab-runner","grpc.request.netns":"/var/run/netns/fe9df582-47b9-4b13-844d-88165d87ab7f","grpc.request.ifname":"eth0","grpc.request.container_id":"665dbc5ad3ec9ff5d19f55d860f52939694eaaf48c314a80c4ebe84b3b0a10b8","duration":"30s"}
{"level":"error","ts":1647643861.784251,"logger":"grpc","msg":"intentionally ignoring error for v1 migration","grpc.start_time":"2022-03-18T22:50:31Z","grpc.request.deadline":"2022-03-18T22:51:31Z","system":"grpc","span.kind":"server","grpc.service":"pkg.cnirpc.CNI","grpc.method":"Del","peer.address":"@","grpc.request.pod.name":"debug-pod","grpc.request.pod.namespace":"gitlab-runner","grpc.request.netns":"/var/run/netns/fe9df582-47b9-4b13-844d-88165d87ab7f","grpc.request.ifname":"eth0","grpc.request.container_id":"665dbc5ad3ec9ff5d19f55d860f52939694eaaf48c314a80c4ebe84b3b0a10b8","error":"link not found"}
{"level":"info","ts":1647643861.7843103,"logger":"grpc","msg":"finished unary call with code OK","grpc.start_time":"2022-03-18T22:50:31Z","grpc.request.deadline":"2022-03-18T22:51:31Z","system":"grpc","span.kind":"server","grpc.service":"pkg.cnirpc.CNI","grpc.method":"Del","peer.address":"@","grpc.request.pod.name":"debug-pod","grpc.request.pod.namespace":"gitlab-runner","grpc.request.netns":"/var/run/netns/fe9df582-47b9-4b13-844d-88165d87ab7f","grpc.request.ifname":"eth0","grpc.request.container_id":"665dbc5ad3ec9ff5d19f55d860f52939694eaaf48c314a80c4ebe84b3b0a10b8","grpc.code":"OK","grpc.time_ms":30002.637}
Cellebyte commented 2 years ago

@ysksuzuki are you able to reproduce it?

ysksuzuki commented 2 years ago

@Cellebyte I'm sorry for keeping you waiting. I will check this next week.

Cellebyte commented 2 years ago

@ysksuzuki No problem ^^ I'm currently running with the quick fix from PR #209.

Cellebyte commented 2 years ago

@ysksuzuki are you now able to reproduce it? :D

ysksuzuki commented 2 years ago

@Cellebyte Actually, I can't. The dual-stack address pool is working fine in my kind cluster where ipFamily is dual.

apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
networking:
  ipFamily: dual
  disableDefaultCNI: true
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker
ysksuzuki commented 2 years ago

Could you elaborate on why https://github.com/vishvananda/netlink/issues/576 could be related to this dual-stack problem, and how the PR you sent us would solve it?

Cellebyte commented 2 years ago

@ysksuzuki As the issue creator elaborates, not explicitly setting the CIDR mask can lead to issues when assigning dual-stack IPs via netlink, because it guesses the subnet mask wrongly when you change the order of the IP addresses to IPv6 first. It tries to squeeze v6 into v4 because we only pass the IP without a CIDR mask to netlink, and netlink, at least on EL8-based distros, fails with the error messages above because the attribute size can get too big.

ysksuzuki commented 2 years ago

@Cellebyte

not explicitly setting the CIDR mask can lead to issues when assigning dual-stack IPs via netlink, because it guesses the subnet mask wrongly when you change the order of the IP addresses to IPv6 first.

The author of https://github.com/vishvananda/netlink/issues/576 reports that the mask passed along with an address in 16-byte format derived from ParseIP causes the problem, and that users can work around it with Mask: net.IPMask(net.ParseIP("255.255.255.0").To4()). He doesn't mention what you are describing, if I understand it correctly.
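
For illustration only, a minimal sketch of that workaround applied to a netlink.AddrAdd call; the link name and the address below are placeholders, not values taken from this issue.

package main

import (
	"net"

	"github.com/vishvananda/netlink"
)

func main() {
	// Placeholder: operate on "lo" just for illustration; a CNI plugin would
	// use the veth it created. Running this requires root/CAP_NET_ADMIN.
	link, err := netlink.LinkByName("lo")
	if err != nil {
		panic(err)
	}

	// Workaround quoted from the netlink issue: keep both the IP and the mask
	// in 4-byte form so the prefix length is derived correctly (a 16-byte mask
	// straight from ParseIP is what that issue reports as problematic).
	addr := &netlink.Addr{IPNet: &net.IPNet{
		IP:   net.ParseIP("192.0.2.10").To4(),
		Mask: net.IPMask(net.ParseIP("255.255.255.0").To4()),
	}}
	if err := netlink.AddrAdd(link, addr); err != nil {
		panic(err)
	}
}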

Even if that were the cause, netlink.NewIPNet, which coil calls when it adds an IP to a link device, sets the CIDR mask accordingly, so I don't think that can be the problem.

https://github.com/cybozu-go/coil/blob/4766966b5fbdd603e4be607c7db94047e05eac69/v2/pkg/nodenet/pod.go#L331
https://github.com/vishvananda/netlink/blob/5e915e0149386ce3d02379ff93f4c0a5601779d5/netlink.go#L35
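
For readers following the links, this is roughly what netlink.NewIPNet does, paraphrased rather than copied from the library source: it attaches a full-length host mask (/32 or /128) to the given IP.

package main

import (
	"fmt"
	"net"
)

// newIPNetSketch mirrors what netlink.NewIPNet is understood to do (paraphrased,
// not the exact library code): attach a full-length host mask to the address,
// /32 for IPv4 and /128 for IPv6.
func newIPNetSketch(ip net.IP) *net.IPNet {
	if ip.To4() != nil {
		return &net.IPNet{IP: ip, Mask: net.CIDRMask(32, 32)}
	}
	return &net.IPNet{IP: ip, Mask: net.CIDRMask(128, 128)}
}

func main() {
	fmt.Println(newIPNetSketch(net.ParseIP("100.126.16.0")))                          // 100.126.16.0/32
	fmt.Println(newIPNetSketch(net.ParseIP("2001:7c7:2100:42f:ffff:ffff:ffff:f000"))) // .../128
}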

As for the PR you sent, I think your patch doesn't have any effect on the IPs coil allocates. netutil.IPAdd returns an IP in 4-byte representation for IPv4 and in 16-byte representation for IPv6, so ipv4.Mask(net.CIDRMask(32, 32)) and ipv6.Mask(net.CIDRMask(128, 128)) won't modify those IPs.

https://github.com/cybozu-go/netutil/blob/7b3ee6fd8aa95d6611bb03d0ef175cac2287f15f/calc.go#L11

Cellebyte commented 2 years ago

It is not modifying the IPs but setting the correct mask on older netlink implementations. Try a dual-stack pool on an AlmaLinux system and you will see what I mean.

ysksuzuki commented 2 years ago

I mean that ipv4.Mask(net.CIDRMask(32, 32)) and ipv6.Mask(net.CIDRMask(128, 128)) won't have any effect on the IPs after netutil.IPAdd; they always return the same slice of bytes. Please elaborate on why/how this patch sets the correct mask on older netlink implementations, e.g. show us example code.

https://go.dev/play/p/MNcBHSK0s-m
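
A minimal standalone check along the same lines as that playground link (not a copy of its contents), assuming the IPs are already in 4-byte/16-byte form as described above; the concrete addresses are example values within the pool's ranges.

package main

import (
	"bytes"
	"fmt"
	"net"
)

func main() {
	// IPv4 in 4-byte form and IPv6 in 16-byte form, matching how netutil.IPAdd
	// is described above.
	ipv4 := net.ParseIP("100.126.16.1").To4()
	ipv6 := net.ParseIP("2001:7c7:2100:42f:ffff:ffff:ffff:f001")

	// Masking with a full-length all-ones mask yields the same bytes back.
	fmt.Println(bytes.Equal(ipv4, ipv4.Mask(net.CIDRMask(32, 32))))   // true
	fmt.Println(bytes.Equal(ipv6, ipv6.Mask(net.CIDRMask(128, 128)))) // true
}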

Cellebyte commented 2 years ago

@ysksuzuki The byte array length is not the issue. The mask itself is the problem because, as the error message points out, netlink thinks it is a smaller byte array than it actually is, so it breaks in the C API. I will check whether I can find the corresponding code line in netlink.

ysksuzuki commented 2 years ago

Just to be clear, you could confirm that your patch fixed this problem in your environment, right? As I mentioned above, ipv4.Mask(net.CIDRMask(32, 32)) and ipv6.Mask(net.CIDRMask(128, 128)) won't have any effect on the IPs; they always return the exact same slices of bytes. How can they affect the masks?

Cellebyte commented 2 years ago

@ysksuzuki Yep, I can confirm that it works now. I do not know why; I think it is something outside the Go code itself.

ysksuzuki commented 2 years ago

Does the problem occur without the patch? I can't imagine that something outside the Go code would fix the problem.

Cellebyte commented 2 years ago

@ysksuzuki Without the patch I am not able to create any veth interfaces for my Pod. The error message is: netlink: 'coild': attribute type 2 has an invalid length.

ysksuzuki commented 2 years ago

That's weird...

ymmt2005 commented 2 years ago

@Cellebyte Are you running on a non-x86 system such as ARM64?

Cellebyte commented 2 years ago

@ymmt2005 Nope, an x86_64 VM running AlmaLinux 8.

ymmt2005 commented 2 years ago

Hmm. I have no clue. The proposed patch seems to be a no-op, as Yusuke says.

ymmt2005 commented 2 years ago

My guess was that the problem appeared to be fixed simply because you rebuilt the Coil image.

Cellebyte commented 2 years ago

@ymmt2005 As the issue in the netlink library shows, it can happen that an IP address passed without a mask is treated as /0, which could install wrong routes.

ymmt2005 commented 2 years ago

@Cellebyte That issue does not apply to Coil. I'm aware of it, and Coil gives a proper netmask to each invocation of netlink.AddrAdd or netlink.RouteAdd.

Cellebyte commented 2 years ago

@ymmt2005 The question still remains: why was I not able to use the current upstream version of coil on my AlmaLinux 8 servers? ^^

ysksuzuki commented 2 years ago

@Cellebyte All I can say for now is that this has nothing to do with https://github.com/vishvananda/netlink/issues/576, and that your patch itself doesn't fix this problem. You might be able to confirm that yourself, e.g. by trying a rebuilt coil image without the patch.

Can you investigate what is actually happening in your environment? It would be helpful if you could do that since we don't normally use Alma Linux.

Cellebyte commented 2 years ago

@ysksuzuki yeah I could try rebuilding the image without the patch.

Cellebyte commented 2 years ago

@ysksuzuki You were right. I now had time to debug this further, and it seems that after I disabled CGO I am able to deploy Pods successfully without any issue.

Cellebyte commented 2 years ago

@ysksuzuki I found the issue in coil. Can we merge the pull request?

ysksuzuki commented 2 years ago

@Cellebyte Thank you for investigating it. Could you recreate the PR or squash and tidy up the commits? And I would appreciate it if you could share why disabling CGO fixed this problem.

Cellebyte commented 2 years ago

@ysksuzuki I added a little comment on the PR.