canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Allow setting ipv4.nat.address as a node-specific key when using LXD clustering #10645

Open Vultaire opened 2 years ago

Vultaire commented 2 years ago

Required information

Issue description

I'm unable to specify ipv4.nat.address values on a per-cluster-node basis. Our intent is to use this for routing purposes: we're trying to use SNAT to route traffic in and out of each LXD host to the rest of the network, and without a per-node NAT address we can't route traffic to/from the correct LXD host.

Steps to reproduce

  1. Create an LXD cluster
  2. Attempt to create an LXD network targeting one of the nodes with a specified ipv4.nat.address, e.g. lxc network create lxdbr1 ipv4.nat.address=10.29.120.10 --target lxd1.

    The above fails with the following message: Error: Config key "ipv4.nat.address" may not be used as node-specific key
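
For context, network creation in a cluster is normally a two-step process: the network is first defined on each member with --target (only member-specific keys are allowed at that stage) and then instantiated with a final create carrying the cluster-wide config. The sketch below, using illustrative member names (lxd1, lxd2) and a placeholder bridge subnet (10.100.0.0/24), shows where a per-member ipv4.nat.address would fit if it were accepted as a node-specific key:

    # 1) Define the pending network on each cluster member (member-specific keys only)
    lxc network create lxdbr1 --target lxd1
    lxc network create lxdbr1 --target lxd2

    # 2) The final create applies the cluster-wide config; today ipv4.nat.address
    #    can only be given here, so every member shares the same value
    lxc network create lxdbr1 ipv4.address=10.100.0.1/24 ipv4.nat=true ipv4.nat.address=10.29.120.10

    # 3) What this issue asks for is accepting the key in step 1 instead, e.g.
    #    lxc network create lxdbr1 ipv4.nat.address=10.29.120.10 --target lxd1
    #    lxc network create lxdbr1 ipv4.nat.address=10.29.120.11 --target lxd2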

tomponline commented 2 years ago

I think this would make sense, but as well as the code change, it will also need a DB patch to expand any previously stored non-member-specific key into multiple values, one per member in the cluster at the time of the upgrade, each carrying the same value.

@stgraber any objections?

stgraber commented 2 years ago

I think that makes sense. We'd need to make sure that it doesn't apply to OVN networks though.

Vultaire commented 2 years ago

Thank you for considering this feature request.

I don't want to interfere too deeply with the setting of milestones, but in case it might influence prioritization, I'd like to explain the use case a little more, as well as the workaround we're presently considering since we lack this feature.

We have 2 metals that we are planning to deploy containers to. Effectively, the intent is to have the second metal for disaster recovery. We would ideally like to deploy 2 units of each application to each of the metals using LXD containers, but with ingress/egress traffic targeting only one of the hosts. (That's the purpose of the SNAT - for directing traffic to one of the hosts.) In the case of one metal going down, we could update routing to use the SNAT address of the other metal to "fail over" to using it instead.
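
To make the failover concrete (the values below are hypothetical: 10.100.0.0/24 standing in for the container subnet, 10.29.120.10 and 10.29.120.11 for the two metals' SNAT addresses), the switch-over on the upstream router would amount to something like:

    # Normal operation: send traffic for the container subnet via metal 1's SNAT address
    ip route replace 10.100.0.0/24 via 10.29.120.10

    # Metal 1 is down: repoint the route at metal 2's SNAT address instead
    ip route replace 10.100.0.0/24 via 10.29.120.11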

Unfortunately, since we lack this feature, we can't use LXD clustering - instead, we're deploying to a single LXD server, and using "lxc copy --refresh" to clone the containers to the second LXD server. In the case of failover, we can start up the "standby" clones of the containers on the other host. (This is after updating Juju-side credentials to allow Juju to communicate with the other LXD server, since this is Juju-managed.)

The above doesn't sound too bad, but it's quite cumbersome. Any time we deploy a new app, or update an existing app, we need to remember to "lxc copy" that app's container and the Juju controller container to the other LXD host. If this is missed, in the case of failover we may end up with out-of-date or missing applications.
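
As a rough sketch of that mirroring step (the instance names and the "standby" remote below are made up for illustration), every deploy or update has to be followed by something like:

    # On the primary LXD server: refresh the clones held on the second server,
    # where "standby" is a remote pointing at that server
    lxc copy myapp-0 standby:myapp-0 --refresh
    lxc copy juju-controller-0 standby:juju-controller-0 --refresh

    # On failover: start the standby clones on the second server
    lxc start standby:myapp-0 standby:juju-controller-0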

...Just wanted to share that context. Would love to have this sooner than "later", but admittedly I don't know how much later "later" is; hope this may help you understand our use case better as well. Thank you for your time!

tomponline commented 2 years ago

Assuming LXD had the discussed SNAT feature...

  1. LXD clusters require at least 3 members for HA (you can run a 2-member cluster, but if one member goes down the other becomes inoperable).
  2. LXD clusters only replicate the instance configuration, not the actual instance rootfs itself, so unless you're using distributed storage like Ceph, if you lose one of the LXD cluster members, the instances on that member will not be operational.

So it sounds like even with LXD clustering, you'd still need to do lxc copy --refresh to a standby instance on the other cluster member in order to have it available if one of the LXD members went down.

tomponline commented 2 years ago

BTW, in the meantime you could just set ipv4.nat=false on the network and then add a manual MASQUERADE rule to each cluster member's firewall.
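
A minimal sketch of that workaround, assuming an iptables-based firewall and reusing the placeholder subnet and addresses from earlier in the thread:

    # Turn off LXD's own NAT on the bridge
    lxc network set lxdbr1 ipv4.nat false

    # On each cluster member, masquerade traffic leaving the bridge subnet...
    iptables -t nat -A POSTROUTING -s 10.100.0.0/24 ! -d 10.100.0.0/24 -j MASQUERADE

    # ...or pin the source address per member, which is what this issue is after:
    iptables -t nat -A POSTROUTING -s 10.100.0.0/24 ! -d 10.100.0.0/24 -j SNAT --to-source 10.29.120.10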

Vultaire commented 2 years ago

Noted on both counts. Good feedback.

I think we missed the 3-member soft requirement for LXD clusters. It makes sense, similar to other HA services that require quorum; we simply missed that it also applies to LXD. Since we only have the 2 boxes that have been provided to us, I think we will likely use our "lxc copy"-based procedure; the bug is still valid, but I retract my concerns re: the milestone.