containernetworking / cni

Container Network Interface - networking for Linux containers
https://cni.dev
Apache License 2.0
5.54k stars 1.08k forks source link

AddNetworkList will cause something leak #797

Open hongli-my opened 3 years ago

hongli-my commented 3 years ago

https://github.com/containernetworking/cni/blob/master/libcni/api.go#L428

func (c *CNIConfig) AddNetworkList(ctx context.Context, list *NetworkConfigList, rt *RuntimeConf) (types.Result, error) {
    var err error
    var result types.Result
    for _, net := range list.Plugins {
        result, err = c.addNetwork(ctx, list.Name, list.CNIVersion, net, result, rt)
        if err != nil {
            return nil, err
        }
    }

    if err = c.cacheAdd(result, list.Bytes, list.Name, rt); err != nil {
        return nil, fmt.Errorf("failed to set network %q cached result: %v", list.Name, err)
    }

    return result, nil
}

for .conflist cni conf like: if flannel cni run success, but tunning failed, will not be cleaned for flannel.

{
  "cniVersion": "0.3.1",
  "name": "flannel",
  "plugins": [
  {
    "cniVersion": "0.3.1",
    "name": "flannel",
    "type": "flannel",
    "subnetFile": "/run/flannel/subnet.env",
    "delegate": {
      "bridge": "cbr0",
      "ipMasq": false,
      "mtu": 1450,
      "isDefaultGateway": true
    }
  },
  {
    "cniVersion": "0.3.1",
    "name": "tuning",
    "type": "tuning",
    "ethtoolConf": {
      "ifName": "eth0",
      "offloads": {
        "tso": false
      }
    }
  }
  ]
}
bboreham commented 3 years ago

I believe this is covered in the spec:

If an ADD action fails, when the runtime decides to handle the failure it should execute the DEL action (in reverse order from the ADD as specified above) for all plugins in the list, even if some were not called during the ADD action.

squeed commented 3 years ago

We're considering changing the specification (and libcni) to

squeed commented 3 years ago

This is definitely not a spec change, so removing the milestone.

Because libcni itself might crash during ADD (or the subsequent DEL), the runtime (e.g. podman, containerd) should ALWAYS call DEL on a failed ADD. So, while we might paper this over in libcni, the behavior doesn't change: you should always DEL.

bboreham commented 3 years ago

Do you mean crash as in "panic", or some more general concept of "really bad error where we can't recover enough to call DEL, but haven't actually killed the process" ?