Closed nuwang closed 6 years ago
Another solution may be to connect all external networks to each subnet. This means that a router has to be created for each external network and attached to the subnet.
To be clear, by "move under the internet gateway", you mean provider.networking.gateways.<gateway>.floating_ips.*?
Could we instead require a network when creating a floating IP and ignore it with the other providers (no other provider requires the network)?
I'm not sure we can support a network parameter consistently across providers. That's because the only valid networks that openstack will accept are external networks.
Sure, but we have the .external property on the networks, and it would need to be clear in the docs that the supplied param needs to be an external network. We can do a check and raise an exception if that's not the case.
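A minimal sketch of what such a check could look like, using toy stand-ins rather than CloudBridge's real classes. Network, InvalidNetworkError and create_floating_ip are hypothetical names for illustration; only the external flag comes from the discussion above.

```python
from dataclasses import dataclass


@dataclass
class Network:
    """Toy stand-in for a provider network (not the real CloudBridge class)."""
    id: str
    external: bool = False


class InvalidNetworkError(ValueError):
    """Hypothetical error raised when a non-external network is supplied."""


def create_floating_ip(network: Network) -> dict:
    # Fail early if the caller passed a private network.
    if not network.external:
        raise InvalidNetworkError(
            f"network {network.id} is not external; "
            "a floating IP needs an external network"
        )
    # A real provider call would go here; return a placeholder record instead.
    return {"floating_network_id": network.id}
```

The other providers would simply ignore the argument, which is the part that makes the parameter awkward to document.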
Right, we could probably do that. However, I'm not sure that's desirable. The main reason is that it's another concept to learn, filter by, etc., whereas we've already put in some effort to hide the "external" property because it's an OpenStack-only concept. I think it would be great if we could structurally enable the right behaviour, so that when you do create a floating IP, you'd be naturally inclined to attach it via the correct network. Since an OpenStack gateway == external network, that almost seems like the right place. I guess we should have a more in-depth design discussion to weigh the pros and cons.
Spent a good chunk of time on this, and the best way for me to reason about it was to run through a typical usage scenario (basically, the one documented under step 2 here: http://cloudbridge.cloudve.org/en/latest/topics/networking.html):
1. Create network netX
2. Create subnet snX
3. Create an instance within snX (netX)
4. Create a router for netX
5. Attach snX to the router
6. Get an inet gtw for netX
7. Attach router to gtw
8. Create a floating IP
To have this work consistently on OpenStack, we'd need to parameterize step 6 (inet gtw) with an external network and use that same network in step 8 (fip). However, this external network is different from the private netX initially created, which is the one used across the board on AWS, so it would require the user to be aware of this difference and take the additional step, which introduces cloud-specific code.
After much debating, the conclusion is to stick with the initial suggestion and move floating IPs under the internet gateway. It conceptually does make sense but requires the user to become aware of an additional concept, which is a drawback.
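To illustrate what the nesting buys us, here is a minimal sketch with toy classes. Gateway, FloatingIP and create_floating_ip are hypothetical names, not CloudBridge's actual API. The point is that a floating IP created through a gateway inherits the gateway's external network, so the two cannot disagree.

```python
import itertools
from dataclasses import dataclass, field
from typing import List

_fip_ids = itertools.count(1)  # simple id generator for the toy model


@dataclass
class FloatingIP:
    id: int
    floating_network_id: str  # external network the IP was allocated from


@dataclass
class Gateway:
    external_network_id: str
    floating_ips: List[FloatingIP] = field(default_factory=list)

    def create_floating_ip(self) -> FloatingIP:
        # The floating IP always takes the gateway's external network,
        # so the caller never has to pick (or know about) that network.
        fip = FloatingIP(next(_fip_ids), self.external_network_id)
        self.floating_ips.append(fip)
        return fip
```

With this shape the user still has to learn that floating IPs live under a gateway, but not which external network to pass.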
Given we're now parameterizing gateways with a network, would it make sense to move gateways (and hence FIPs) under network? This came up as I tried to implement this new structure in the CloudLaunch API, as getting a gateway (or a FIP) now needs a network. WIP branch available here: https://github.com/CloudVE/djcloudbridge/tree/networking
I can't quite remember why we need to parameterize the internet gateway with the network again. Is it to help with attaching to the router? However, if we do need to do this, I see what you mean in terms of it being necessary to implement the cloudlaunch endpoint. In which case, we probably should?
I believe the reason we decided to parameterize the gateway with a network was because of OpenStack so that when a floating IP is being created, it is created on the same network as the one the gateway is connected to, which is required with OpenStack. Without parameterizing it, the user would need to make sure the two networks are the same. The initial implementation assumed there is only one external network and used it but we discovered that doesn't work on NeCTAR in particular. The following is the usage pattern:
net = networking.networks.create(...)
subnet = net.create(...)
vm = instances.create(subnet, ...)
router = networking.routers.create(net, ...)
router.attach_subnet(subnet)
gtw = networking.gateways.get_or_create(net, ...)
fip = gtw.create()
vm.add_floating_ip(fip)
This now ensures the gtw is attached to the same network as the one the fip is being created under. If we don't supply a network to the gateway, the logic needs to infer automatically which network to use.
Having gone through this now, it feels like we can omit the network as long as the FIP is nested under a gateway so it can match it. This will require explicitly attaching a router to a network, which we initially used, so the logic would become:
...
gtw = networking.gateways.get_or_create()
router.attach_gateway(gtw) # This step is new from above as it was automatic there
fip = gtw.create()
vm.add_floating_ip(fip)
Would this create issues down the line though, for scenarios where a FIP/gateway wants to be reused (vs. being created and used right away)? Something like this:
subnet = get_subnet...
vm = create(subnet, ...)
gtw = networking.gateways.get_or_create()
fip = gtw.floating_ips.list()[0]
vm.add_floating_ip(fip)
Wouldn't this cause an issue if the returned gateway is not associated with the same network as the subnet used to launch the VM?
One problem that arises if we nest gateways under the network is how we retrieve gateways that are not attached to a network. Subnets, as an example of a resource that exists under the network, must (at the provider level) be parameterized/created under a network. Gateways do not, so we could have orphan gateways that we cannot access.
After a bit more discussion, we realized why the network parameter is necessary for the gateway, and it's because of AWS. Without it, the notion of get_or_create_inet_gateway had no way to 'discover' an existing internet gateway that's properly connected. So short of picking a random one, we need the network to be able to filter the internet gateways for it vs. always creating a new one (as we did initially). Creating a new one each time was not sustainable, given that a long-running scenario (e.g., launch an instance via CloudLaunch and keep it alive for days) would not have a way to get back the launch context and clean up (e.g., the instance is manually deleted vs. via CloudLaunch).
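A rough sketch of the get-or-create behaviour described here, using an in-memory store instead of real AWS calls. GatewayStore, InternetGateway and the igw-N ids are made up for illustration; the only thing taken from the discussion is that lookups are filtered by network before anything new is created.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InternetGateway:
    id: str
    network_id: Optional[str] = None  # None means "not attached to a network"


class GatewayStore:
    """Toy in-memory stand-in for a provider's gateway service."""

    def __init__(self) -> None:
        self._gateways: List[InternetGateway] = []
        self._counter = 0

    def get_or_create(self, network_id: str) -> InternetGateway:
        # First try to discover a gateway already attached to this network.
        for gtw in self._gateways:
            if gtw.network_id == network_id:
                return gtw
        # None found: create one and attach it to the network right away.
        self._counter += 1
        gtw = InternetGateway(f"igw-{self._counter}", network_id)
        self._gateways.append(gtw)
        return gtw
```

Calling get_or_create twice with the same network returns the same gateway, so a long-running launch does not leave a trail of new gateways behind.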
The conclusion is to nest the gateways under network. This will have the side effect of not showing gateways that are not connected to a network (for AWS), but at least CloudBridge's implementation will attach the gateway to a network as soon as a gateway is created, so only externally-created gateways that are not attached will be omitted. In the future, if deemed desirable, we can also add a gateways property to networking that would list all gateways irrespective of a network.
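For illustration only, the two views could look roughly like this. Gateway, Network and Networking are toy classes with hypothetical names, not the real CloudBridge objects; the sketch just shows that an unattached gateway is invisible from the per-network view but would show up in a top-level listing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gateway:
    id: str
    network_id: Optional[str] = None  # None: created externally, never attached


class Network:
    def __init__(self, network_id: str, all_gateways: List[Gateway]) -> None:
        self.id = network_id
        self._all_gateways = all_gateways

    @property
    def gateways(self) -> List[Gateway]:
        # Nested view: only gateways attached to this network.
        return [g for g in self._all_gateways if g.network_id == self.id]


class Networking:
    def __init__(self, all_gateways: List[Gateway]) -> None:
        self._all_gateways = all_gateways

    @property
    def gateways(self) -> List[Gateway]:
        # Possible future top-level view: every gateway, attached or not.
        return list(self._all_gateways)
```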
Related issue reported on launchpad in: https://bugs.launchpad.net/neutron/+bug/1743480
A workaround for the OpenStack bug above has been made in: https://github.com/gvlproject/cloudbridge/commit/7688d283fd401857fb7449c7dadc118a19d915aa and https://github.com/gvlproject/cloudbridge/commit/879117a2a123e79623e26e7da2833c806e7381fb
I think this issue can be closed now.
It looks like our assumption of taking the first available external network and connecting routers to that network does not work in NeCTAR. This is because, when floating ips are created, they are associated with a specific external network, as shown here:
The floating_network_id appears to be the id of the external network. Therefore, if the external network of the router, and the external network of the floating ip do not match, the launch fails with the following error:
It looks like maybe the solution is to move floating ips under the internet gateway?
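To make the failure condition concrete, here is a small sketch of the mismatch being described. The function name and the error text are invented for illustration (this is not the error OpenStack returns); only floating_network_id comes from the report above.

```python
def check_same_external_network(router_external_network_id: str,
                                floating_network_id: str) -> None:
    """Raise if a floating IP and a router sit on different external networks."""
    if router_external_network_id != floating_network_id:
        # Hypothetical message, not the actual OpenStack error text.
        raise ValueError(
            f"floating IP belongs to external network {floating_network_id}, "
            f"but the router's gateway is on {router_external_network_id}"
        )
```

Nesting floating IPs under the gateway would make this check unnecessary, since both ids would come from the same object.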