contiv / netplugin

Container networking for various use cases

TestTwoHostsMultipleTenants_regress fails #39

Closed - mapuri closed this issue 9 years ago

mapuri commented 9 years ago

@jainvipin's analysis:

This started showing up recently after your changes, but the main symptom existed earlier too; by some fluke it appeared to work (the pings were criss-crossing, and without MAC verification we couldn't tell who was pinging whom). The problem is that the FreeLocalVlan allocated for the two tenants is the same, because the LocalVlanBitset is per tenant; it should have been either global (across all tenants) or per host (but still unique across all tenants).

possible solution proposed by @mapuri:

Right now I derive the LocalVlan bitmap for a tenant by looking at the vlan configuration for just that tenant; taking the local-vlans and vlans of the other tenants into consideration should fix this. I don't think the LocalVlanBitset can be per host in a mixed vlan-vxlan environment, which we support by default.
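A minimal sketch of that idea (the type and function names here are hypothetical, not netplugin's actual API): build the free local-vlan pool from the vlan configuration of every tenant, so two tenants can never be handed the same free local vlan.

    package main

    import "fmt"

    // TenantCfg is a hypothetical stand-in for a tenant's vlan configuration.
    type TenantCfg struct {
        Name  string
        Vlans []uint // externally visible vlans configured for this tenant
    }

    // deriveFreeLocalVlans returns the vlan tags usable as local (host-internal) tags:
    // every tag in 1..4094 that no tenant has claimed as an external vlan.
    func deriveFreeLocalVlans(tenants []TenantCfg) map[uint]bool {
        used := map[uint]bool{}
        for _, t := range tenants {
            for _, v := range t.Vlans {
                used[v] = true
            }
        }
        free := map[uint]bool{}
        for v := uint(1); v <= 4094; v++ {
            if !used[v] {
                free[v] = true
            }
        }
        return free
    }

    func main() {
        tenants := []TenantCfg{
            {Name: "orange", Vlans: []uint{1, 2}},
            {Name: "purple", Vlans: []uint{3}},
        }
        free := deriveFreeLocalVlans(tenants)
        fmt.Println(free[1], free[3], free[4]) // false false true
    }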

Failure logs from the test:

    === RUN TestTwoHostsMultipleTenants_regress
    2015/04/03 19:13:34 ovs-vsctl on node netplugin-node2:
    2bfd5ce9-a922-48b3-9a76-65e929639b9d
        Manager "ptcp:6640"
            is_connected: true
        Bridge contivBridge
            Port "port4"
                tag: 1
                Interface "port4"
                    type: internal
            Port "vxifpurple192168210"
                tag: 1
                Interface "vxifpurple192168210"
                    type: vxlan
                    options: {key="15001", remote_ip="192.168.2.10"}
            Port "vxiforange192168210"
                tag: 1
                Interface "vxiforange192168210"
                    type: vxlan
                    options: {key="10001", remote_ip="192.168.2.10"}
            Port "port2"
                tag: 1
                Interface "port2"
                    type: internal
        ovs_version: "2.0.2"

    --- FAIL: TestTwoHostsMultipleTenants_regress (75.17s)
        docker.go:51: Error 'exit status 1' launching container 'myContainer3', Output:
            PING 11.1.0.1 (11.1.0.1) 56(84) bytes of data.
            From 11.1.0.2 icmp_seq=2 Destination Host Unreachable
            From 11.1.0.2 icmp_seq=3 Destination Host Unreachable
            From 11.1.0.2 icmp_seq=4 Destination Host Unreachable

            --- 11.1.0.1 ping statistics ---
            5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4002ms
            pipe 4
jainvipin commented 9 years ago

@mapuri

I don't think the LocalVlanBitset can be per host in a mixed vlan-vxlan environment, which we support

FreeLocalVlans are the set of vlans not used by any externally visible/available vlan network, even in a mixed vlan-vxlan environment. Therefore, managing them per host would be the right thing to do, instead of managing them globally while still consuming them per host.

mapuri commented 9 years ago

So in a mixed environment where, say, I have a vlan network for vlans 200-300 and then some vxlan networks, are you saying we can use those tags as LocalVlans on certain hosts? I thought we couldn't, because usually each network can potentially reside on all hosts.

jainvipin commented 9 years ago

No, we shouldn't reuse vlans 200-300 that are externally visible - let's not complicate things. My point was that these would be globally managed/reserved. However, the remaining ones can be managed locally per host because they are not expected to show up on the wire, and thus can be repurposed for different networks on different hosts.
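A rough illustration of that split, with hypothetical names (not the actual netplugin code): the externally visible vlans are reserved once globally, and each host hands out the remaining tags independently, so the same local tag can back different networks on different hosts.

    package main

    import "fmt"

    // HostVlanPool is a hypothetical per-host pool of local vlan tags.
    type HostVlanPool struct {
        free []uint // tags not reserved for any externally visible vlan network
    }

    // NewHostVlanPool builds a host's pool from the globally reserved (on-the-wire) vlans.
    func NewHostVlanPool(globalReserved map[uint]bool) *HostVlanPool {
        p := &HostVlanPool{}
        for v := uint(1); v <= 4094; v++ {
            if !globalReserved[v] {
                p.free = append(p.free, v)
            }
        }
        return p
    }

    // Allocate hands out the next free local tag on this host, or false when exhausted.
    func (p *HostVlanPool) Allocate() (uint, bool) {
        if len(p.free) == 0 {
            return 0, false
        }
        v := p.free[0]
        p.free = p.free[1:]
        return v, true
    }

    func main() {
        reserved := map[uint]bool{}
        for v := uint(200); v <= 300; v++ { // vlans 200-300 are visible on the wire
            reserved[v] = true
        }
        host1, host2 := NewHostVlanPool(reserved), NewHostVlanPool(reserved)
        a, _ := host1.Allocate() // used for one vxlan network on host1
        b, _ := host2.Allocate() // the same tag may back a different network on host2
        fmt.Println(a, b)        // 1 1
    }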

mapuri commented 9 years ago

However, the remaining ones can be managed locally per host because they are not expected to show up on the wire, and thus can be repurposed for different networks on different hosts.

ok, I see. It does seem like an alternative, but I am not sure that local management wouldn't still entail a global view, especially in an environment where vlan and vxlan networks co-exist.

Let's say I have following networks and there are two hosts:

Network 1 -> vlan network with tags 1-1000
Network 2 -> vxlan network with tags 20000-30000
Network 3 -> vxlan network with tags 30000-40000

Vlan tags 1-1000 can be globally managed, i.e. they mean the same thing on both hosts. Looks like we agree on this.

For networks 2 and 3, the local-vlans need to be derived. A global algorithm that works is that network 2 gets local-vlans 1001-2000 and network 3 gets local-vlans 2001-3000 across the hosts, i.e. they are globally managed across hosts. This is simple to implement since the decision can be made at a central entity.

But say we were to manage these locally; at best the host-level allocation could look different, i.e. host1 might give vlan tags 1001-2000 to network 2 and tags 2001-3000 to network 3, while host2 might decide to give tags 2001-3000 to network 2 and tags 1001-2000 to network 3. In the end each host still can't save or optimize much as long as the networks are global, right? Also, doing it locally entails keeping track of (consistent state for) the global vlans on each host, which will probably again require being solved in some central way.
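To make the global algorithm concrete, here is a small sketch under the same assumptions as the example above (names and constants are illustrative, not netplugin's actual code): a central entity assigns each vxlan network the next contiguous block of local vlans after the external range, and every host uses the same mapping.

    package main

    import "fmt"

    const (
        externalVlanMax = 1000 // vlan tags 1-1000 belong to the external vlan network
        blockSize       = 1000 // local vlans handed to each vxlan network
    )

    // assignLocalVlanBlocks maps each vxlan network name to its [start, end] local-vlan range.
    // The result is computed once centrally and is identical on every host.
    func assignLocalVlanBlocks(networks []string) map[string][2]uint {
        blocks := map[string][2]uint{}
        next := uint(externalVlanMax + 1)
        for _, n := range networks {
            blocks[n] = [2]uint{next, next + blockSize - 1}
            next += blockSize
        }
        return blocks
    }

    func main() {
        blocks := assignLocalVlanBlocks([]string{"network2", "network3"})
        fmt.Println(blocks["network2"], blocks["network3"]) // [1001 2000] [2001 3000]
    }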

jainvipin commented 9 years ago

@mapuri - in the above example, we'd be limited to 4K vxlans globally. Assuming we want to support more vxlans, local management would not turn the host-level limitation into a cluster-wide limitation.

mapuri commented 9 years ago

I think the 4K limitation is independent of where the allocation happens (globally or locally). The limitation is due to the fact that we expect all networks to exist on all hosts (and we use a vlan-based approach, and vlans are 4K in number), requiring us to reserve resources on all of them.

To support more than 4K vxlans with a vlan-based approach, we can scale by knowing where those networks are placed, i.e. we can't assume vxlan networks exist on all hosts. But that changes the configuration model (i.e. it introduces some sort of host-to-network binding), and I'm not sure that is what we want. Alternatively, we can implement vxlans using a no-vlan approach, in which case we don't need to reserve vlans at all.

Is that correct? Is there another way of allocation that you have in mind? Maybe take an example?

jainvipin commented 9 years ago

The 4K limit is independent of global or local allocation. The allocation would need to be on a per-host basis either way. Since it is happening at the host level, why maintain all the state globally and track host-level resources globally?

Of course, if we organize the allocation at the per-host level, we can move toward breaking the assumption that all vxlan segments are present on all hosts...

If your changes are fixing the current sanity issue, we can keep this issue open while you check in your changes.

jainvipin commented 9 years ago

Is there another way of allocation that you have in mind

The other way is to manage individual flows, in which case we don't assign vlan tags to segments. Alternatively, we can modify the OVS code to support larger pools of virtual bridges. Let's bring up a scale test bed to learn more about where the limitations are.

mapuri commented 9 years ago

Since it is happening at the host level, why maintain all the state globally and track host-level resources globally?

As I explained, even if we do the allocation at the host level, we don't get any benefit; rather, we still need a consistent global view locally, which brings the problem back to doing it centrally/globally.

If your changes are fixing the current sanity issue, we can keep this issue open while you check in your changes.

My current changes assume a vlan-based scheme for vxlan networks, with vxlan networks being global as we have today. Are you asking to keep this issue open after I commit my changes, or shall we work on alternate approaches for vxlan networks to fix this issue?

jainvipin commented 9 years ago

My current changes assume a vlan-based scheme for vxlan networks, with vxlan networks being global as we have today. Are you asking to keep this issue open after I commit my changes, or shall we work on alternate approaches for vxlan networks to fix this issue?

Your fix is in line with the current thinking of vxlans spread across all hosts in a cluster. We should fix that. I want to be able to track the issue where we don't assume that VXLAN networks are present on all hosts. If we open another issue for that and close this one against your changes, that would be fine too.

mapuri commented 9 years ago

ok, filed issue #42 to support more than 4K vxlans.

Will close this issue with fixes from PR #40