Closed mcastelino closed 6 years ago
Directly running this plugin via CNI scripts shows
sudo CNI_PATH=$CNI_PATH ./priv-net-run.sh bash
{
"cniVersion": "0.2.0",
"ip4": {
"ip": "10.248.246.146/28",
"gateway": "10.248.246.145",
"routes": [
{
"dst": "0.0.0.0/0",
"gw": "10.248.246.145"
}
]
},
"dns": {}
}
{
"dns": {}
}
# ip route
default via 10.248.246.145 dev eth0
10.248.246.144/28 via 10.248.246.145 dev eth0 src 10.248.246.146
10.248.246.145 dev eth0 scope link src 10.248.246.146
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
3: eth0@if26: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 0a:15:5d:ec:63:c1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.248.246.146/28 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::815:5dff:feec:63c1/64 scope link
valid_lft forever preferred_lft forever
The routes send into the agent via hyperstart are
cc-runtime[3530]: time="2017-11-30T03:15:07Z" level=warning msg="route list" routes="[{ 10.248.246.145 eth0} {10.248.246.144/28 10.248.246.145 eth0} {10.248.246.145/32 eth0}]" source=virtcontainers subsystem=hyper
which indicates a potential bug in the agent
/cc @amshinde this is a bug in the agent due to how we detect existing routes https://github.com/clearcontainers/agent/blob/master/network.go#L196
Here we check only for the destination. We should check destination and gateway For example take this case
ip link add dummy0 type dummy
ip addr add 19.248.245.144/28 dev dummy0
ip link set up dummy0
the
ip route
will show
19.248.245.144/28 dev dummy0 proto kernel scope link src 19.248.245.144
Which is the interface route, which in the case of PTP is over-ridden by the more specific gateway route.
In our case we need to wipe the interface routes and setup all the routes afresh to ensure we handle these cases.
/cc @sameo @WeiZhang555 @gnawux @sboeuf this is an example of the case where we need the update routes command as the add/delete will not be able to handle this case. Unless we make assumptions about what happens within the VM.
Unless we make assumptions about what happens within the VM.
@mcastelino I see your points. So we can't make any assumptions for interface routes inside VM, instead you want an UpdateRoute
interface to wipe all route rules out inside VM and pass a whole set of new customized route rules as final route table, is that right?
If so, the function interface we made in https://github.com/kata-containers/agent/pull/4 is wrong, we should remove AddRoute
/RemoveRoute
interface, and add the UpdateRoute
which accepts an array of route rules instead.
@WeiZhang555 yes. that would be ideal. This makes the route setup declarative vs imperative. In general we should use declarative API's vs imperative @sameo @bergwolf what do you think. This means we should look at our gRPC implementation where we can go to declarative APIs as far as possible.
So for example in the case of interface, all of its properties should be done in one shot. So all IPs should be presented vs Add/Remove IP. So we should avoid imperative APIs.
@amshinde the agent issue https://github.com/clearcontainers/agent/issues/175
@mcastelino Following your though, the UpdateInterface
function should be removed too, because as "Add/Remove IP", "SetName" and "SetMTU" can also be done in one shot.
Let me just talk about Add/Remove IP part. Actually when we are using "runv interface" command, the "Add/Remove IP" had not been used, so for me, remove these are fine. I was hoping to expose more detailed APIs to user and let upper level to do more flexible works.
In clearcontainer/runv CNM/CNI support solution(they are similar: https://github.com/clearcontainers/runtime/blob/master/docs/architecture/arch-images/CNM_detailed_diagram.png), user was doing network setup step by step,
for example, a CNI bridge plugin (https://github.com/containernetworking/plugins/blob/master/plugins/main/bridge/bridge.go#L325) 1. will setup bridge, 2. create veth, 3. add ip one by one, in such usage, we can't know when the interface is finished setting, and we have to use UpdateInterface
to add configurations to interface one by one. That's why the Update
function is needed though RemoveIP
isn't been used till now.
A declarative API sounds good, it's very clear and easy to use. but it seals details of interface/route operations and makes agent interface less flexible. And often, convenience is conflict with flexibility, just as declarative APIs vs imperative APIs.
So my another thought is, instead of UpdateRoute(<a rule list>)
, another choice is adding query
function.
With Add
/Remove
/Update
/Query
functions, everything can be done, but maybe we go too far in another direction...
BTW, we (Huawei) are using runv interface/route
command for CNI plugins, it's direct and very clear than implicit "netns and nslistener" stuff, we only need to write a customized network plugin for vm-based-container adaption, which is not so hard.
@WeiZhang555 one comment about UpdateInterface()
, we really need this one since we need to update the interface created by the VM, referring to the tap interface. We rely on the hardware address of the tap interface created in order to find the interface inside the VM because we don't know about the name, and then we update it.
@sboeuf Using Mac address
as identifier in UpdateInterface
is fine for me :-)
This issue is only relevant for Clear Containers when using cc-agent. This is fixed for kata-agent.
Closing this, as I believe the original issue is no longer observable. If it is, I'd recommend marking as "will not fix" given our focus on Kata. Please reopen if you disagree.
When using the CNI PTP plugin the routes are not setup correct.
With a CNI configuration as follows
With runc container the route table is as follows
With cc-runtime the routes are as follows
The gateway information is missing in the case of cc-runtime version
This can be reproduced by using crioctl standalone.