containers / virtcontainers

A Go package for building hardware virtualized container runtimes
Apache License 2.0
139 stars 43 forks source link

Route table not setup properly for PTP CNI plugin #500

Closed mcastelino closed 6 years ago

mcastelino commented 6 years ago

When using the CNI PTP plugin the routes are not setup correct.

With a CNI configuration as follows

{
  "name": "mynet",
  "type": "ptp",
  "ipam": {
    "type": "host-local",
    "subnet": "10.248.246.144/28",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}

With runc container the route table is as follows

default via 10.248.246.145 dev eth0
10.248.246.144/28 via 10.248.246.145 dev eth0  src 10.248.246.146
10.248.246.145 dev eth0  src 10.248.246.146

With cc-runtime the routes are as follows

default via 10.248.246.145 dev eth0
10.248.246.144/28 dev eth0  src 10.248.246.146
10.248.246.145 dev eth0

The gateway information is missing in the case of cc-runtime version

cc-runtime  : 3.0.8
   commit   : 9c69a8a
   OCI specs: 1.0.0-dev

This can be reproduced by using crioctl standalone.

mcastelino commented 6 years ago

Directly running this plugin via CNI scripts shows

sudo CNI_PATH=$CNI_PATH ./priv-net-run.sh bash
{
  "cniVersion": "0.2.0",
  "ip4": {
    "ip": "10.248.246.146/28",
    "gateway": "10.248.246.145",
    "routes": [
      {
        "dst": "0.0.0.0/0",
        "gw": "10.248.246.145"
      }
    ]
  },
  "dns": {}
}
{
  "dns": {}
}

# ip route
default via 10.248.246.145 dev eth0
10.248.246.144/28 via 10.248.246.145 dev eth0  src 10.248.246.146
10.248.246.145 dev eth0  scope link  src 10.248.246.146

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if26: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 0a:15:5d:ec:63:c1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.248.246.146/28 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::815:5dff:feec:63c1/64 scope link
       valid_lft forever preferred_lft forever
mcastelino commented 6 years ago

The routes send into the agent via hyperstart are

cc-runtime[3530]: time="2017-11-30T03:15:07Z" level=warning msg="route list" routes="[{ 10.248.246.145 eth0} {10.248.246.144/28 10.248.246.145 eth0} {10.248.246.145/32  eth0}]" source=virtcontainers subsystem=hyper

which indicates a potential bug in the agent

mcastelino commented 6 years ago

/cc @amshinde this is a bug in the agent due to how we detect existing routes https://github.com/clearcontainers/agent/blob/master/network.go#L196

Here we check only for the destination. We should check destination and gateway For example take this case

ip link add dummy0 type dummy
ip addr add 19.248.245.144/28 dev dummy0
ip link set up dummy0

the

ip route

will show

19.248.245.144/28 dev dummy0  proto kernel  scope link  src 19.248.245.144

Which is the interface route, which in the case of PTP is over-ridden by the more specific gateway route.

In our case we need to wipe the interface routes and setup all the routes afresh to ensure we handle these cases.

mcastelino commented 6 years ago

/cc @sameo @WeiZhang555 @gnawux @sboeuf this is an example of the case where we need the update routes command as the add/delete will not be able to handle this case. Unless we make assumptions about what happens within the VM.

https://github.com/kata-containers/agent/pull/4

WeiZhang555 commented 6 years ago

Unless we make assumptions about what happens within the VM.

@mcastelino I see your points. So we can't make any assumptions for interface routes inside VM, instead you want an UpdateRoute interface to wipe all route rules out inside VM and pass a whole set of new customized route rules as final route table, is that right?

WeiZhang555 commented 6 years ago

If so, the function interface we made in https://github.com/kata-containers/agent/pull/4 is wrong, we should remove AddRoute/RemoveRoute interface, and add the UpdateRoute which accepts an array of route rules instead.

mcastelino commented 6 years ago

@WeiZhang555 yes. that would be ideal. This makes the route setup declarative vs imperative. In general we should use declarative API's vs imperative @sameo @bergwolf what do you think. This means we should look at our gRPC implementation where we can go to declarative APIs as far as possible.

So for example in the case of interface, all of its properties should be done in one shot. So all IPs should be presented vs Add/Remove IP. So we should avoid imperative APIs.

mcastelino commented 6 years ago

@amshinde the agent issue https://github.com/clearcontainers/agent/issues/175

WeiZhang555 commented 6 years ago

@mcastelino Following your though, the UpdateInterface function should be removed too, because as "Add/Remove IP", "SetName" and "SetMTU" can also be done in one shot.

Let me just talk about Add/Remove IP part. Actually when we are using "runv interface" command, the "Add/Remove IP" had not been used, so for me, remove these are fine. I was hoping to expose more detailed APIs to user and let upper level to do more flexible works.

In clearcontainer/runv CNM/CNI support solution(they are similar: https://github.com/clearcontainers/runtime/blob/master/docs/architecture/arch-images/CNM_detailed_diagram.png), user was doing network setup step by step, for example, a CNI bridge plugin (https://github.com/containernetworking/plugins/blob/master/plugins/main/bridge/bridge.go#L325) 1. will setup bridge, 2. create veth, 3. add ip one by one, in such usage, we can't know when the interface is finished setting, and we have to use UpdateInterface to add configurations to interface one by one. That's why the Update function is needed though RemoveIP isn't been used till now.

A declarative API sounds good, it's very clear and easy to use. but it seals details of interface/route operations and makes agent interface less flexible. And often, convenience is conflict with flexibility, just as declarative APIs vs imperative APIs.

WeiZhang555 commented 6 years ago

So my another thought is, instead of UpdateRoute(<a rule list>), another choice is adding query function.

With Add/Remove/Update/Query functions, everything can be done, but maybe we go too far in another direction...

BTW, we (Huawei) are using runv interface/route command for CNI plugins, it's direct and very clear than implicit "netns and nslistener" stuff, we only need to write a customized network plugin for vm-based-container adaption, which is not so hard.

sboeuf commented 6 years ago

@WeiZhang555 one comment about UpdateInterface(), we really need this one since we need to update the interface created by the VM, referring to the tap interface. We rely on the hardware address of the tap interface created in order to find the interface inside the VM because we don't know about the name, and then we update it.

WeiZhang555 commented 6 years ago

@sboeuf Using Mac address as identifier in UpdateInterface is fine for me :-)

egernst commented 6 years ago

This issue is only relevant for Clear Containers when using cc-agent. This is fixed for kata-agent.

Closing this, as I believe the original issue is no longer observable. If it is, I'd recommend marking as "will not fix" given our focus on Kata. Please reopen if you disagree.