apache / cloudstack

Apache CloudStack is an open source Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

cloud0 subnet should not be hardcoded to 169.254.0.1/16 #3488

Closed wido closed 5 years ago

wido commented 5 years ago

We are building a BGP+EVPN+VXLAN environment using Frr and BGP Unnumbered, and this conflicts with the cloud0 bridge created by the CloudStack agent.

Although the global setting control.cidr can be modified, the KVM Agent will still create this bridge with a hardcoded subnet:

com/cloud/hypervisor/kvm/resource/BridgeVifDriver.java

        if (!foundLinkLocalBr) {
            Script.runSimpleBashScript("ip address add 169.254.0.1/16 dev " + linkLocalBr + ";" + "ip route add " + NetUtils.getLinkLocalCIDR() + " dev " + linkLocalBr + " src " +
                    NetUtils.getLinkLocalGateway());
        }

When using BGP Unnumbered, Frr will try to create routes pointing to 169.254.0.1.

This works until the CloudStack Agent is started:

10.255.255.1 via 169.254.0.1 dev enp81s0f1 proto bgp metric 20 onlink 
10.255.255.2 via 169.254.0.1 dev enp81s0f1 proto bgp metric 20 onlink 
10.255.255.3 via 169.254.0.1 dev enp81s0f1 proto bgp metric 20 onlink 
10.255.255.4 via 169.254.0.1 dev enp81s0f1 proto bgp metric 20 onlink 
10.255.255.6 via 169.254.0.1 dev enp81s0f1 proto bgp metric 20 onlink 
10.255.255.7 via 169.254.0.1 dev enp81s0f1 proto bgp metric 20 onlink 
10.255.255.8 via 169.254.0.1 dev enp81s0f1 proto bgp metric 20 onlink 

After the CloudStack Agent is started, 169.254.0.1/16 is added to cloud0, which prevents Frr from creating these routes:

zebra[5010]: 0:10.255.255.8/32: Route install failed
zebra[4562]: Extended Error: Nexthop has invalid gateway

A solution would be to make this CIDR configurable through agent.properties instead of hardcoding it.

rohityadavcloud commented 5 years ago

Sounds like a config issue. @wido, can you ensure that BGP is not automatically enforced for cloud0? I think the KVM agent is not aware of what kind of network topology/model exists; it simply works on the idea of a network device.

wido commented 5 years ago

@rhtyd It isn't. The 169.254.0.1 address is reserved for BGP unnumbered configurations.

As CloudStack uses this subnet as a hardcoded configuration it conflicts with it.

rohityadavcloud commented 5 years ago

Aah okay, it makes sense now @wido. What workaround do you propose? For example, one good solution could be to get rid of the link-local based addresses/NICs. VR/systemvm programming can be done via an IP on the mgmt/private network CIDR.

Our VMware implementation does not use link-local either; all communication goes directly to the private IP in the private address range (typically RFC 1918). This change is doable but may cause issues for users who don't have enough free IP addresses in the pod range of the private/mgmt network. What do you think?

rohityadavcloud commented 5 years ago

cc @PaulAngus @andrijapanic @borisstoyanov @anuragaw @shwstppr @rafaelweingartner @DaanHoogland @fmaximus @ustcweizhou @GabrielBrascher @nvazquez @svenvogel @NuxRo and others - what do you think of getting rid of link-local IP based nics/programming/communication for VRs?

ustcweizhou commented 5 years ago

@rhtyd It seems to be a big change. Can we look for a workaround for @wido first?

NuxRo commented 5 years ago

Personally I don't care for the link local; however, I imagine there are folks who do and have tooling relying on it etc. Like Wei, I'd say this is a big change, so let's look at a workaround for Wido. A configurable variable in the properties could work.

GabrielBrascher commented 5 years ago

@rhtyd I am +1 on implementing a workaround. I will be happy to discuss and help design other approaches, but I think that we should first invest some effort in the workaround, at least for now.

We have already extended the CloudStack KVM agent with a hotfix for a 4.12.0.0 environment, implemented by @wido. It looks good on the KVM agent side. A few parameters were added to agent.properties. If they are not configured, ACS keeps using the current default values.

network.linklocal.cidr=169.254.0.0/16
network.linklocal.address=169.254.0.1/16
network.linklocal.gateway=169.254.0.1
network.linklocal.netmask=255.255.0.0
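
A minimal Java sketch of the fallback behaviour described here, assuming the agent reads these keys through java.util.Properties. The class name LinkLocalSettings and the method bridgeSetupCommand are hypothetical and not taken from the CloudStack code base; the actual hotfix may be structured differently. The 169.254.240.x values in main are made-up example overrides.

```java
import java.util.Properties;

// Hypothetical helper: not part of CloudStack, for illustration only.
class LinkLocalSettings {
    // Defaults mirror the values the agent currently hardcodes.
    static final String DEFAULT_ADDRESS = "169.254.0.1/16";
    static final String DEFAULT_CIDR = "169.254.0.0/16";
    static final String DEFAULT_GATEWAY = "169.254.0.1";

    final String address;
    final String cidr;
    final String gateway;

    LinkLocalSettings(Properties props) {
        // getProperty(key, default) returns the default when the key is absent.
        address = props.getProperty("network.linklocal.address", DEFAULT_ADDRESS);
        cidr = props.getProperty("network.linklocal.cidr", DEFAULT_CIDR);
        gateway = props.getProperty("network.linklocal.gateway", DEFAULT_GATEWAY);
    }

    // Builds the same kind of shell command as the hardcoded snippet,
    // but from the configured values.
    String bridgeSetupCommand(String bridge) {
        return "ip address add " + address + " dev " + bridge + ";"
                + "ip route add " + cidr + " dev " + bridge + " src " + gateway;
    }

    public static void main(String[] args) {
        // No keys set: the current defaults are used.
        System.out.println(new LinkLocalSettings(new Properties()).bridgeSetupCommand("cloud0"));

        // Example override (made-up values) that avoids 169.254.0.1.
        Properties custom = new Properties();
        custom.setProperty("network.linklocal.address", "169.254.240.1/20");
        custom.setProperty("network.linklocal.cidr", "169.254.240.0/20");
        custom.setProperty("network.linklocal.gateway", "169.254.240.1");
        System.out.println(new LinkLocalSettings(custom).bridgeSetupCommand("cloud0"));
    }
}
```

With this shape an unmodified agent.properties keeps today's behaviour, which matches the "keeps using the current default values" requirement.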

Unfortunately, we still need some work on the CloudStack management side; otherwise, it keeps managing system VMs with the hardcoded 169.254.0.0/16 network.

It is worth mentioning that we configured the global settings control.gateway and control.cidr before configuring the new zone; however, the Control network kept the "hardcoded" setup.

rohityadavcloud commented 5 years ago

@wido @GabrielBrascher are you sending a PR soon? The 4.13 freeze is coming in a week's time. Given that you may have explored or applied a workaround in your env, and that this specific use case may not affect most users, can we revisit this in either 4.13.1.0 or 4.14? If there is no hurry, perhaps a longer-term proper fix that replaces the link-local NIC with the private/mgmt NIC may be explored. I'll remove the 4.13.0.0 milestone, but feel free to add it back if a PR can be sent.

wido commented 5 years ago

We are working on a PR and hope to submit it today.

ustcweizhou commented 5 years ago

@wido @GabrielBrascher If all KVM agents use the same setting, I would suggest using the same approach as in 714221234d41920ccb131367cca000cd4da7b261, so that when we change the global setting, the new value is propagated to all KVM agents when they connect. Just a suggestion.

network.linklocal.cidr=169.254.0.0/16
network.linklocal.address=169.254.0.1/16
network.linklocal.gateway=169.254.0.1
network.linklocal.netmask=255.255.0.0

You only need one of address and gateway, and one of cidr and netmask.
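
One way to read this suggestion: the netmask follows from the prefix length of the CIDR, and the gateway is the address with its prefix stripped. A small Java sketch under that assumption; the class and method names are hypothetical, and it only handles IPv4.

```java
// Hypothetical helper: derives the redundant link-local values, IPv4 only.
class LinkLocalDerive {
    // Converts a prefix length (0-32) into a dotted-quad netmask.
    static String netmaskFromPrefix(int prefix) {
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("prefix out of range: " + prefix);
        }
        // A Java int shift by 32 is a no-op, so prefix 0 is handled separately.
        int mask = prefix == 0 ? 0 : 0xFFFFFFFF << (32 - prefix);
        return ((mask >>> 24) & 0xFF) + "." + ((mask >>> 16) & 0xFF) + "."
                + ((mask >>> 8) & 0xFF) + "." + (mask & 0xFF);
    }

    // "169.254.0.0/16" -> "255.255.0.0"
    static String netmaskFromCidr(String cidr) {
        int slash = cidr.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("not in CIDR notation: " + cidr);
        }
        return netmaskFromPrefix(Integer.parseInt(cidr.substring(slash + 1)));
    }

    // "169.254.0.1/16" -> "169.254.0.1"; a bare address is returned unchanged.
    static String gatewayFromAddress(String address) {
        int slash = address.indexOf('/');
        return slash < 0 ? address : address.substring(0, slash);
    }

    public static void main(String[] args) {
        System.out.println(netmaskFromCidr("169.254.0.0/16"));
        System.out.println(gatewayFromAddress("169.254.0.1/16"));
    }
}
```

Under this reading, network.linklocal.cidr and network.linklocal.address alone would be enough, and the other two keys could be computed rather than configured.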

wido commented 5 years ago

@ustcweizhou Thanks for the suggestion! I was looking for a way to send the global setting control.cidr to the KVM Agent.

I'll look into this.

ustcweizhou commented 5 years ago

@wido I have created a pull request for another change of ours: #3491