Vultaire opened this issue 2 years ago
I think this would make sense, but as well as the code change, it will also need a DB patch to expand any previously stored non-member-specific key into multiple values, one with the same value for each member in the cluster at the time of the upgrade.
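To illustrate the expansion, something conceptually like the following (this is only a sketch; the networks_config/nodes table and column names are assumptions about the internal schema, not the actual patch):

# Conceptual sketch only: duplicate the cluster-wide ipv4.nat.address row into
# one row per cluster member, then drop the original cluster-wide row.
lxd sql global "INSERT INTO networks_config (network_id, node_id, key, value)
  SELECT c.network_id, n.id, c.key, c.value
  FROM networks_config AS c, nodes AS n
  WHERE c.key = 'ipv4.nat.address' AND c.node_id IS NULL"
lxd sql global "DELETE FROM networks_config WHERE key = 'ipv4.nat.address' AND node_id IS NULL"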
@stgraber any objections?
I think that makes sense. We'd need to make sure that it doesn't apply to OVN networks though.
Thank you for considering this feature request.
I don't want to interfere too deeply with the setting of milestones, but in case it might influence prioritization, I'd like to explain about the use case a little more, as well as the workaround that we're presently considering since we lack this feature.
We have 2 bare-metal servers ("metals") that we are planning to deploy containers to. Effectively, the intent is to have the second metal for disaster recovery. We would ideally like to deploy 2 units of each application, one on each of the metals, as LXD containers, but with ingress/egress traffic targeting only one of the hosts. (That's the purpose of the SNAT: directing traffic to one of the hosts.) In the case of one metal going down, we could update routing to use the SNAT address of the other metal to "fail over" to using it instead.
Unfortunately, since we lack this feature, we can't use LXD clustering - instead, we're deploying to a single LXD server, and using "lxc copy --refresh" to clone the containers to the second LXD server. In the case of failover, we can start up the "standby" clones of the containers on the other host. (This is after updating Juju-side credentials to allow Juju to communicate with the other LXD server, since this is Juju-managed.)
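For reference, the clone step we run looks roughly like this (the remote and container names below are placeholders):

# One-time: register the second LXD server as a remote on the primary.
lxc remote add lxd2 <address-of-second-server>
# After every deploy or app update: refresh the standby copy on the second server.
lxc copy --refresh my-app-container lxd2:my-app-container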
The above doesn't sound too bad, but it's quite cumbersome. Any time we deploy a new app, or update an existing app, we need to remember to "lxc copy" that app's container and the Juju controller container to the other LXD host. If this is missed, in the case of failover we may end up with out-of-date or missing applications.
...Just wanted to share that context. Would love to have this sooner than "later", but admittedly I don't know how much later "later" is; hope this may help you understand our use case better as well. Thank you for your time!
Assuming LXD had the discussed SNAT feature...
So it sounds like even with LXD clustering, you'd still need to do "lxc copy --refresh" to a standby instance on the other cluster member in order to have it available if one of the LXD members went down.
BTW, in the meantime you could just set ipv4.nat=false on the network and then add a manual MASQUERADE rule to each cluster member's firewall.
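A rough sketch of that workaround (bridge name, subnet and per-member address are placeholders; an explicit SNAT --to-source rule is shown instead of MASQUERADE since the point is to pick a specific source address per member, and hosts using nftables would need an equivalent nft rule):

# Disable LXD's own NAT rules for the bridge (applies cluster-wide).
lxc network set lxdbr1 ipv4.nat false
# On each cluster member, add a manual source-NAT rule for the bridge subnet,
# using that member's own address (addresses here are examples only).
iptables -t nat -A POSTROUTING -s 10.29.123.0/24 ! -d 10.29.123.0/24 -j SNAT --to-source 10.29.120.10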
Noted on both counts. Good feedback.
I think we missed the 3-node soft requirement for LXD clusters. It makes sense, similar to other HA services that require quorum; we simply missed that it also applied to LXD. Since we only have the 2 boxes that were provided to us, I think we will likely use our "lxc copy"-based procedure; the feature request is still valid, but I retract my concerns re: the milestone.
Required information
Issue description
I'm unable to specify ipv4.nat.address values on a per-cluster-node basis. Our intent is to use this for routing purposes: we're trying to use SNAT to route traffic between the LXD hosts and the rest of the network, and not being able to set a per-member address interferes with our ability to route traffic to/from the correct LXD host.
Steps to reproduce
Attempt to create an LXD network targeting one of the nodes with a specified ipv4.nat.address, e.g.
lxc network create lxdbr1 ipv4.nat.address=10.29.120.10 --target lxd1
The above fails with the following message:
Error: Config key "ipv4.nat.address" may not be used as node-specific key
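For comparison, the same key is accepted when applied cluster-wide without --target (assuming the network already exists); it just can't differ per member:

# Accepted: the key is stored once and the same SNAT address is used on every member.
lxc network set lxdbr1 ipv4.nat.address=10.29.120.10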