containernetworking / plugins

Some reference and example networking plugins, maintained by the CNI team.
Apache License 2.0
2.23k stars 788 forks source link

macvlan: error killing pod: Link not found #953

Closed cyclinder closed 1 year ago

cyclinder commented 1 year ago
Events:
  Type     Reason          Age                   From               Message
  ----     ------          ----                  ----               -------
  Normal   Scheduled       5m8s                  default-scheduler  Successfully assigned ns4740-192363920/ds-4bfa5c3fb5-8pwcf to spiderpool1008202421-control-plane
  Normal   AddedInterface  5m8s                  multus             Add eth0 [fc00:f853:ccd:e793:f::c7/64 172.18.40.6/16] from kube-system/macvlan-vlan0
  Normal   AddedInterface  5m6s                  multus             Add net1 [fd00:95b2::3/120 10.185.192.3/24] from ns4740-192363920/test-multus-6d8e8be07a
  Normal   Pulled          5m5s                  kubelet            Container image "alpine" already present on machine
  Normal   Created         5m5s                  kubelet            Created container samplepod
  Normal   Started         5m5s                  kubelet            Started container samplepod
  Normal   Killing         5m3s                  kubelet            Stopping container samplepod
  Warning  FailedKillPod   32s (x21 over 4m33s)  kubelet            error killing pod: failed to "KillPodSandbox" for "5c8fe27b-afa4-4ff2-9399-93773a4e1f5e" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"402c1fae47b140592e12cd11f3f5f1927788e6196e6f8d313ab09eee9cd4fedc\": delegateDel: error invoking ConflistDel - \"test-multus-6d8e8be07a\": conflistDel: error in getting result from DelNetworkList: Link not found"

I checked the code, this may have happened in https://github.com/containernetworking/plugins/blob/f95505231a8209b1b41eac2d5c2399c6ab444785/plugins/main/macvlan/macvlan.go#L407

Link not found != link not found?

cyclinder commented 1 year ago

But I tested the code, It works well.

root@cyclinder3:~/cyclinder/test/linknofound# cat main.go
package main

import (

    "fmt"
    "log"

    "github.com/containernetworking/plugins/pkg/ip"

)

func main() {

    if err := ip.DelLinkByName("eth0"); err != nil {
        fmt.Println("del err: ",err)
        if err != ip.ErrLinkNotFound {
            log.Fatalln(err)
        }
    }

    fmt.Println("Del")
}
root@cyclinder3:~/cyclinder/test/linknofound# go run main.go
del err:  link not found
Del
mlguerrero12 commented 1 year ago

I think the error is coming from the loadConf function of the del command. Probably when trying to get the MTU of the master. The "Link not found" with capital L is an error string returned by netlink. I'll have a look.

cyclinder commented 1 year ago

Yeah, Agree, It looks like only this is where errors can happen. I would like to fix this, May I?

mlguerrero12 commented 1 year ago

sure, thanks

cyclinder commented 1 year ago

It looks like we can't change the code of loadConf function directly, If we ignore the LinkNotFound error at getMTU, which can affect the cmdAdd, the best option is that we check the LinkNotFound first in cmdDel, If true, we return directly.

mlguerrero12 commented 1 year ago

what happens with the macvlan interfaces after the master is deleted? Are they removed as well?

cyclinder commented 1 year ago

yes.

root@cyclinder3:~# ip l show ens192.100
754: ens192.100@ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:50:56:b4:27:ca brd ff:ff:ff:ff:ff:ff
root@cyclinder3:~# ip netns exec net1 ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
755: macvlan1@if754: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 86:8f:ec:8e:bd:e9 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::848f:ecff:fe8e:bde9/64 scope link
       valid_lft forever preferred_lft forever
root@cyclinder3:~# ip l d ens192.100
root@cyclinder3:~# ip netns exec net1 ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
mlguerrero12 commented 1 year ago

cool

Another thing, what's the purpose of calling the loadConf function in the del command. It seems to me that we only need to access the n.IPAM parameter. If that's the case, we only need this

n := &NetConf{}
if err := json.Unmarshal(args.StdinData, n); err != nil {
    return nil, "", fmt.Errorf("failed to load netconf: %v", err)
}

We could replace loadConf with this in the del command (changing the error message "failed to unmarshall config" or something like that).

Just had a quick look. Please confirm this.

cyclinder commented 1 year ago

Yeah, I just checked the code, We only need n.IPAM, we can replace loadConf. But I'm not sure if additional parameters will be needed in the future.

mlguerrero12 commented 1 year ago

and they will be there, what I'm trying to say is that we don't need the other parts of loadConf

cyclinder commented 1 year ago

I mean we are not sure If we need the "other parts" of loadConf in the future (although most likely it is not needed😄)