CentaurusInfra / mizar

Mizar – Experimental, High Scale and High Performance Cloud Network https://mizar.readthedocs.io
GNU General Public License v2.0

Mizar cni panic with vmdefault.yaml where bridge handles it fine #608

Closed yb01 closed 2 years ago

yb01 commented 2 years ago

What happened: creating a VM pod with the pod definition below makes the Mizar CNI panic. The same definition works fine with the bridge CNI.

apiVersion: v1
kind: Pod
metadata:
  name: vmdefault
  annotations:
    VirtletCPUModel: "host-model"
    VirtletSSHKeys: |
     ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCaJEcFDXEK2ZbX0ZLS1EIYFZRbDAcRfuVjpstSc0De8+sV1aiu+dePxdkuDRwqFtCyk6dEZkssjOkBXtri00MECLkir6FcH3kKOJtbJ6vy3uaJc9w1ERo+wyl6SkAh/+JTJkp7QRXj8oylW5E20LsbnA/dIwWzAF51PPwF7A7FtNg9DnwPqMkxFo1Th/buOMKbP5ZA1mmNNtmzbMpMfJATvVyiv3ccsSJKOiyQr6UG+j7sc/7jMVz5Xk34Vd0l8GwcB0334MchHckmqDB142h/NCWTr8oLakDNvkfC1YneAfAO41hDkUbxPtVBG5M/o7P4fxoqiHEX+ZLfRxDtHB53 me@localhost
spec:
  virtualMachine:
    #publicKey: "ssh-rsa AAA"
    keyPairName: "foobar"
    name: vm
    image: download.cirros-cloud.net/0.5.1/cirros-0.5.1-x86_64-disk.img
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: "1"
        memory: "1Gi"
      requests:
        cpu: "1"
        memory: "1Gi"
2022-02-02T04:33:49.12826348Z stderr F I0202 04:33:49.128143   11140 client.go:140] CNI config: name: "mizarcni" type: "mizarcni"
2022-02-02T04:33:49.128337332Z stderr F I0202 04:33:49.128284   11140 client.go:218] AddSandboxToNetwork: PodID "9d85f320-dca1-4ca3-baff-981fde17188f", PodName "vmdefault", PodNs "default", PodTenant "system",  VPC "", NICs "", runtime config:
2022-02-02T04:33:49.128350016Z stderr F (*libcni.RuntimeConf)(0xc42050ea50)({
2022-02-02T04:33:49.128356793Z stderr F  ContainerID: (string) (len=36) "9d85f320-dca1-4ca3-baff-981fde17188f",
2022-02-02T04:33:49.128363172Z stderr F  NetNS: (string) (len=51) "/var/run/netns/9d85f320-dca1-4ca3-baff-981fde17188f",
2022-02-02T04:33:49.128369906Z stderr F  IfName: (string) (len=12) "virtlet-eth0",
2022-02-02T04:33:49.128375565Z stderr F  Args: ([][2]string) (len=8 cap=14) {
2022-02-02T04:33:49.128380657Z stderr F   ([2]string) (len=2 cap=2) {
2022-02-02T04:33:49.128385704Z stderr F    (string) (len=13) "IgnoreUnknown",
2022-02-02T04:33:49.128391003Z stderr F    (string) (len=1) "1"
2022-02-02T04:33:49.128396971Z stderr F   },
2022-02-02T04:33:49.128404865Z stderr F   ([2]string) (len=2 cap=2) {
2022-02-02T04:33:49.128418693Z stderr F    (string) (len=14) "K8S_POD_TENANT",
2022-02-02T04:33:49.128424014Z stderr F    (string) (len=6) "system"
2022-02-02T04:33:49.128428773Z stderr F   },
2022-02-02T04:33:49.128433903Z stderr F   ([2]string) (len=2 cap=2) {
2022-02-02T04:33:49.128438787Z stderr F    (string) (len=17) "K8S_POD_NAMESPACE",
2022-02-02T04:33:49.128443722Z stderr F    (string) (len=7) "default"
2022-02-02T04:33:49.128448358Z stderr F   },
2022-02-02T04:33:49.128453422Z stderr F   ([2]string) (len=2 cap=2) {
2022-02-02T04:33:49.12845872Z stderr F    (string) (len=12) "K8S_POD_NAME",
2022-02-02T04:33:49.128463856Z stderr F    (string) (len=9) "vmdefault"
2022-02-02T04:33:49.128468456Z stderr F   },
2022-02-02T04:33:49.12847349Z stderr F   ([2]string) (len=2 cap=2) {
2022-02-02T04:33:49.128478116Z stderr F    (string) (len=26) "K8S_POD_INFRA_CONTAINER_ID",
2022-02-02T04:33:49.128483809Z stderr F    (string) (len=36) "9d85f320-dca1-4ca3-baff-981fde17188f"
2022-02-02T04:33:49.128488544Z stderr F   },
2022-02-02T04:33:49.128492974Z stderr F   ([2]string) (len=2 cap=2) {
2022-02-02T04:33:49.128497571Z stderr F    (string) (len=3) "VPC",
2022-02-02T04:33:49.128502123Z stderr F    (string) ""
2022-02-02T04:33:49.12850672Z stderr F   },
2022-02-02T04:33:49.128511286Z stderr F   ([2]string) (len=2 cap=2) {
2022-02-02T04:33:49.128515777Z stderr F    (string) (len=4) "NICs",
2022-02-02T04:33:49.128520371Z stderr F    (string) ""
2022-02-02T04:33:49.128525008Z stderr F   },
2022-02-02T04:33:49.128529545Z stderr F   ([2]string) (len=2 cap=2) {
2022-02-02T04:33:49.128534231Z stderr F    (string) (len=9) "K8S_ANNOT",
2022-02-02T04:33:49.128538982Z stderr F    (string) (len=17) "{\"cni\": \"calico\"}"
2022-02-02T04:33:49.128543474Z stderr F   }
2022-02-02T04:33:49.128548006Z stderr F  },
2022-02-02T04:33:49.128552479Z stderr F  CapabilityArgs: (map[string]interface {}) <nil>
2022-02-02T04:33:49.12855733Z stderr F })

2022-02-02T04:33:49.138379738Z stderr F panic: runtime error: invalid memory address or nil pointer dereference
2022-02-02T04:33:49.138395801Z stderr F [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x95007d]
2022-02-02T04:33:49.138400984Z stderr F
2022-02-02T04:33:49.138406097Z stderr F goroutine 1 [running, locked to thread]:
2022-02-02T04:33:49.13841161Z stderr F centaurusinfra.io/mizar/pkg/util/netutil.ActivateInterface(0xc00002428b, 0xc, 0xc00002a0ca, 0x33, 0xc000246130, 0xc, 0xc00024619a, 0x2, 0xc000246190, 0x9, ...)
2022-02-02T04:33:49.138428757Z stderr F         /home/ubuntu/master_mizar/pkg/util/netutil/netutil.go:57 +0x1fd
2022-02-02T04:33:49.138434347Z stderr F centaurusinfra.io/mizar/cmd/mizarcni/app.DoCmdAdd(0xfcc5a0, 0xc0000ec600, 0x3a, 0x600, 0xc0000cf8c0, 0x9b8420, 0xc000021bd0, 0x9590b1)
2022-02-02T04:33:49.138440494Z stderr F         /home/ubuntu/master_mizar/cmd/mizarcni/app/worker.go:59 +0x5e8
2022-02-02T04:33:49.138445621Z stderr F main.cmdAdd(0xc000128380, 0x0, 0x0)
2022-02-02T04:33:49.138450475Z stderr F         /home/ubuntu/master_mizar/cmd/mizarcni/mizarcni.go:55 +0x97
2022-02-02T04:33:49.138455739Z stderr F github.com/containernetworking/cni/pkg/skel.(*dispatcher).checkVersionAndCall(0xc000133e88, 0xc000128380, 0xba1140, 0xc000124de0, 0xaf4988, 0xc00000ef40)
2022-02-02T04:33:49.138460701Z stderr F         /home/ubuntu/go/pkg/mod/github.com/containernetworking/cni@v0.8.1/pkg/skel/skel.go:166 +0x30d
2022-02-02T04:33:49.138469951Z stderr F github.com/containernetworking/cni/pkg/skel.(*dispatcher).pluginMain(0xc000133e88, 0xaf4988, 0x0, 0xaf4990, 0xba1140, 0xc000124de0, 0xad2814, 0x10, 0xfcd0a0)
2022-02-02T04:33:49.138474796Z stderr F         /home/ubuntu/go/pkg/mod/github.com/containernetworking/cni@v0.8.1/pkg/skel/skel.go:218 +0x439
2022-02-02T04:33:49.138479933Z stderr F github.com/containernetworking/cni/pkg/skel.PluginMainWithError(...)
2022-02-02T04:33:49.138485013Z stderr F         /home/ubuntu/go/pkg/mod/github.com/containernetworking/cni@v0.8.1/pkg/skel/skel.go:275
2022-02-02T04:33:49.138490549Z stderr F github.com/containernetworking/cni/pkg/skel.PluginMain(0xaf4988, 0x0, 0xaf4990, 0xba1140, 0xc000124de0, 0xad2814, 0x10)
2022-02-02T04:33:49.138509372Z stderr F         /home/ubuntu/go/pkg/mod/github.com/containernetworking/cni@v0.8.1/pkg/skel/skel.go:290 +0x128
2022-02-02T04:33:49.138514771Z stderr F main.main()
2022-02-02T04:33:49.138519834Z stderr F         /home/ubuntu/master_mizar/cmd/mizarcni/mizarcni.go:86 +0x165
2022-02-02T04:33:49.139277406Z stderr F E0202 04:33:49.139189   11140 client.go:225] AddSandboxToNetwork: PodID "9d85f320-dca1-4ca3-baff-981fde17188f", PodName "vmdefault", PodTenant "default", PodNs "system", VPC "", NICs "": error: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input

What you expected to happen:

The CNI may return an error if it encounters a configuration it does not support or does not expect, but it should not panic; i.e., it should handle the failure gracefully.
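
For illustration only, one generic way to keep a Go plugin from crashing is to recover from the panic at the top of the ADD handler and return it as an ordinary error, so that the runtime receives a diagnostic instead of the empty message seen in the log above. This is a minimal sketch, not Mizar's code: `safeAdd` and its `handler` parameter are hypothetical names, and a real plugin would still have to report the error through the CNI library.

```go
package main

import "fmt"

// safeAdd runs a hypothetical ADD handler and converts any panic raised
// inside it into an ordinary error value.
// (Both the name and the signature are assumptions made for this sketch.)
func safeAdd(handler func() error) (err error) {
	defer func() {
		// recover() returns nil unless the handler panicked.
		if r := recover(); r != nil {
			err = fmt.Errorf("mizarcni ADD failed: recovered from panic: %v", r)
		}
	}()
	return handler()
}
```

Recovering only keeps the process alive long enough to report the failure. The nil dereference itself still needs a proper check at the point where it happens.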

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

yb01 commented 2 years ago

Here is the parameter diff between a netpod and a VM pod. The only meaningful difference is IfName: for the VM pod, it is virtlet-eth0.

 mizarcni.go:57] Network variables: {"Command":"ADD","ContainerID":"9fd465eb-8d8a-4fe2-bcea-c98f82db48fc","NetNS":"/var/run/netns/9fd465eb-8d8a-4fe2-bcea-c98f82db48fc","IfName":"virtlet-eth0","CniPath":"/opt/cni/bin","K8sPodNamespace":"default","K8sPodName":"vmdefault","K8sPodTenant":"system","CniVersion":"0.3.1","NetworkName":"mizarcni","Plugin":"mizarcni"}

 mizarcni.go:57] Network variables: {"Command":"ADD","ContainerID":"d682b478b7dfdb479de6989e68e9b2bb5d5c38221d3e3590484f9957e98e5718","NetNS":"/var/run/netns/cni-9169dd3f-c78b-5abf-b35c-6a3926a5cf21","IfName":"eth0","CniPath":"/opt/cni/bin","K8sPodNamespace":"kube-system","K8sPodName":"netpod1-1","K8sPodTenant":"system","CniVersion":"0.3.1","NetworkName":"mizarcni","Plugin":"mizarcni"}
yb01 commented 2 years ago

In this code, it should at least check err and/or the link object before using link:


// moves the interface to the CNI netns, renames it, sets the IP address and the gateway.
func ActivateInterface(
    ifName string,
    netNSName string,
    vethName string,
    ipPrefix string,
    ipAddress string,
    gatewayIp string) (string, error) {

    link, err := netlink.LinkByName(vethName)
    if err == nil {
        if link.Attrs().OperState == netlink.OperUp {
            return fmt.Sprintf("Interface %s has already been UP.", vethName), nil
        }
    }
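
The snippet above is cut off, so the rest of the function is not shown here. As a rough sketch of the check being asked for, the lookup could return early on an error or a nil link instead of carrying on. To stay self-contained this does not use the netlink package: `Link`, `lookupFunc`, `errNotFound` and `activate` are hypothetical stand-ins for `netlink.Link`, `netlink.LinkByName`, the lookup failure and `ActivateInterface`.

```go
package main

import (
	"errors"
	"fmt"
)

// Link is a hypothetical stand-in for netlink.Link, reduced to one field.
type Link struct {
	Up bool
}

// errNotFound is a hypothetical lookup failure used by this sketch.
var errNotFound = errors.New("link not found")

// lookupFunc is a hypothetical stand-in for netlink.LinkByName.
type lookupFunc func(name string) (*Link, error)

// activate returns an error instead of dereferencing a missing link.
func activate(lookup lookupFunc, vethName string) (string, error) {
	link, err := lookup(vethName)
	if err != nil {
		// Stop here: link must not be used when the lookup failed.
		return "", fmt.Errorf("lookup of interface %q failed: %w", vethName, err)
	}
	if link == nil {
		return "", fmt.Errorf("lookup of interface %q returned no link", vethName)
	}
	if link.Up {
		return fmt.Sprintf("Interface %s has already been UP.", vethName), nil
	}
	// The remaining activation steps are outside the scope of this sketch.
	return fmt.Sprintf("Interface %s found.", vethName), nil
}
```

With this shape, a failed lookup reaches the caller as a normal error (still matchable with `errors.Is`), which the plugin can then report instead of crashing.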