bpg / terraform-provider-proxmox

Terraform Provider for Proxmox
https://registry.terraform.io/providers/bpg/proxmox
Mozilla Public License 2.0

[Feature Request]: VMs in a HA-Cluster #1534

Open Sombeyyy opened 2 months ago

Sombeyyy commented 2 months ago

Is your feature request related to a problem? Please describe. Currently, a fixed node must be selected for each resource. In an HA cluster with load balancing, a VM can be moved back and forth between all nodes. If you execute "terraform apply" and one or more VMs were automatically moved to different nodes, the provider will either move the VMs back to their defined node or recreate them on the defined node.

Describe the solution you'd like It would be a good idea to implement a node_names variable in addition to the node_name variable, with the two being mutually exclusive. Any node on which the VM may be located can be listed via this variable. So if HA is activated for a VM and it is located on a different node than specified, it is treated as already existing and will not be migrated back or recreated. If a VM does not yet exist, it is created by a random mechanism on one of the specified nodes.

Additional context An example configuration could look like this:

resource "proxmox_virtual_environment_vm" "ubuntu_vm" {
  name = "terraform-provider-proxmomx-ubuntu-vm"
  node_names = ["first-node", "second-node", "third-node"]
  vm_id = 4321

  disk {
    datastore_id = "local-lvm"
    interface = "scsi0"
    disk_size = 32
  }
  network_device {
    bridge = "vmbr0"
  }
  operating_system {
    type = "l26"
  }
}
bitchecker commented 1 month ago

This would also be very useful if you enable some "cluster wide" load balancer!

elias314 commented 1 month ago

I think using ignore_changes handles this pretty well:

lifecycle {
  ignore_changes = [ node_name ]
}

Would this be sufficient?
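
In context, a minimal sketch of how that might look against the example VM from the issue (values are just placeholders):

resource "proxmox_virtual_environment_vm" "ubuntu_vm" {
  name      = "terraform-provider-proxmox-ubuntu-vm"
  node_name = "first-node" # node used only for the initial deployment
  vm_id     = 4321

  # ... disk, network_device, operating_system as in the example above ...

  lifecycle {
    # a later HA migration to another node is no longer reported as drift
    ignore_changes = [ node_name ]
  }
}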

bitchecker commented 1 month ago

Good point... @bpg can this be helpful?

Sombeyyy commented 1 month ago

@elias314

In my opinion, this would be a possible solution for existing servers. However, for new servers it would also be good to be able to specify that it does not matter on which node they are deployed. This way, it is not necessary to explicitly state which node should be used. An example: the specified node for the new server is currently offline... This means that the machine cannot be deployed. With a list of nodes, another node could simply be used instead.

bpg commented 1 month ago

That should work to prevent VM movement or recreation by the provider if no VM attributes have changed. However, if there is an update to a VM, and the VM is not currently on the node where terraform thinks it is, the update would probably fail.

bitchecker commented 1 month ago

I'll try to update my module and do some tests, even though I don't have a reachable cluster at the moment.

I'll update this thread as soon as I can. :)

bitchecker commented 1 month ago

On a single node, the workaround works well, including for modifications (scale-up/scale-down of resources).

As I already said, at the moment I don't have a reachable cluster to test on.

elias314 commented 1 month ago

That should work to prevent VM movement or recreation by the provider if no VM attributes have changed. However, if there is an update to a VM, and the VM is not currently on the node where terraform thinks it is, the update would probably fail.

I did some simple tests with this, and it seems to work fine. I deployed a VM to one node, then moved it via the Proxmox console to another node, and then changed the amount of memory assigned to that VM and the provider updated it just fine. So the testing wasn't exactly thorough, but seemed to at least pass the smell test.

elias314 commented 1 month ago

@Sombeyyy

However, it would also be good to define for new servers that it does not matter on which node they are deployed. This way, it is not necessary to explicitly state which node should be used. An example: The specified node for the new server is currently offline...

I totally get this and agree. But I'm not sure the provider is the right place for this functionality to live. Philosophically, the way Terraform and all the providers I've used seem to function is to assume that whatever resources are specified in the TF config are always available. This feels like the right approach to me, since the expectations of what the provider should do when encountering this situation will vary wildly depending on one's particular use-case. For example, what should the AWS provider do when TF tries to enforce some state on an S3 bucket but that bucket was deleted through some other means? And if you add any additional nodes to your HA group, you'd still have to modify the list of nodes in your TF variable so they get included in rotation for future deployments.

As a suggestion, maybe you can use a combination of the data sources in the provider to dynamically create the list of nodes you want to deploy to? There's a data source for HA groups (proxmox_virtual_environment_hagroups) which returns the list of groups. HA group (proxmox_virtual_environment_hagroup) can return the list of nodes in a group. And then nodes (proxmox_virtual_environment_nodes) can return the list of nodes and their online status. Perhaps with a little TF list manipulation you can break that down into a dynamic list of deployable nodes at runtime?
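
For example, listing the groups might look something like this (a rough, untested sketch; I believe the hagroups data source exposes the IDs as group_ids, but double-check the docs):

data "proxmox_virtual_environment_hagroups" "all" {}

output "ha_group_ids" {
  # identifiers of all HA groups defined in the cluster
  value = data.proxmox_virtual_environment_hagroups.all.group_ids
}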

For me personally, I think this is really a Proxmox feature request. What I'd like to do is to deploy a VM to an HA group instead of a specific node, and have the node scheduling and node balancing rules live inside Proxmox itself. I think it would be really powerful to have Proxmox build in some of the scheduling properties from K8s but for VMs.

bpg commented 1 month ago

Thanks @elias314 for testing and further analysis.

It actually makes sense that the lifecycle workaround works for most cases, even for the VM update. The PVE UI allows you to manage any resource from any node in a cluster, so it must be supported by the API as well. The few things that could potentially break are the provider's "hacks" that use SSH under the hood, but those only apply to VM creation, which is out of scope here.

For me personally, I think this is really a Proxmox feature request. What I'd like to do is to deploy a VM to an HA group instead of a specific node, and have the node scheduling and node balancing rules live inside Proxmox itself. I think it would be really powerful to have Proxmox build in some of the scheduling properties from K8s but for VMs.

That could be one solution, but from the note above, it's probably not even necessary. I think a way out of this mess would be simply treating node_name as a computed attribute.

PatrikKoptik commented 1 month ago

Hi Everyone,

I stumbled across this issue some time ago when it was first created. I like your ideas that it’s a feature as it is, but I have to disagree. In Terraform, we are talking about a stateful state. If we have a single node where a VM can be and this node is offline, the VM cannot be issued to another host; otherwise, the state would not be fulfilled. However, if we have a list of nodes where the VM can go and it’s on a node in this list, the state would be fulfilled.

Your S3 analogy is not quite right here, in my opinion. In S3, if the bucket was deleted (similar to taking down a node), either another node from the list will be used (another S3 bucket) or Terraform would fail. Both of these outcomes are acceptable to me. However, I am not fine with specifying only one node where the VM should be deployed, finding it on another, and the state being considered fine with it.

Best wishes, Patrik

bitchecker commented 1 month ago

That could be one solution, but from the note above, it's probably not even necessary. I think a way out of this mess would be simply treating node_name as a computed attribute.

I already do this with data "proxmox_virtual_environment_nodes" "nodes" {} in order to have all the cluster nodes or, if you have only one, to get the node name dynamically.

Of course, if you want only certain nodes, @elias314's suggestion is more precise.
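
Roughly like this (a trimmed-down sketch; the resource name and the rest of the VM arguments are omitted here):

data "proxmox_virtual_environment_nodes" "nodes" {}

resource "proxmox_virtual_environment_vm" "example" {
  # use the first node returned by the cluster (the only one on a single-node setup)
  node_name = element(data.proxmox_virtual_environment_nodes.nodes.names, 0)
  # ...
}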

bpg commented 1 month ago

That could be one solution, but from the note above, it's probably not even necessary. I think a way out of this mess would be simply treating node_name as a computed attribute.

I already do this with data "proxmox_virtual_environment_nodes" "nodes" {} in order to have all the cluster nodes or, if you have only one, to get the node name dynamically.

Of course, if you want only certain nodes, @elias314's suggestion is more precise.

I'm talking more about the drift detection use case, when a VM is moved after creation. If one wants to select a target node from a dynamic list of available nodes to deploy a new VM to, then using data sources to get this information from PVE is the way to go. The VM resource shouldn't be responsible for this decision.

elias314 commented 1 month ago

In Terraform, we are talking about a stateful state. If we have a single node where a VM can be and this node is offline, the VM cannot be issued to another host; otherwise, the state would not be fulfilled. However, if we have a list of nodes where the VM can go and it’s on a node in this list, the state would be fulfilled.

You're right, Terraform is trying to enforce a stateful state. And this is why this feels uncomfortable to me; implementing a feature like this means there are now multiple possible acceptable states. But those possible states are dictated by the provider and not explicitly codified in the TF files. While this particular feature request seems pretty straight-forward, you can imagine other similar feature requests where the outcome of multiple possible acceptable states may not be desirable for someone's particular use-case. This seems like a slippery slope to me.

bpg commented 1 month ago

While I understand the intention to use the provider as a frontend and load balancer for the Proxmox cluster, which is essentially what this ticket suggests, this is not functionality that the provider should have. As mentioned above, Terraform is not well-suited for managing a dynamic state that changes outside of its control.

My main guideline for making design decisions for the provider is to follow the Proxmox API as closely as possible. If a requested feature is not available via the API or UI, it is unlikely to be implemented.

From the ticket:

... if HA is activated for a VM and it is located on a different node than specified, it is treated as it exists already and will not be migrated back or recreated.

This could be resolved by handling node_name as a dynamic attribute that can be updated remotely.

If a VM does not yet exist, it is created by a random mechanism on one of the specified nodes.

This describes load-balancing functionality, but Proxmox does not allow specifying an existing HA group for VM creation, only individual nodes. So, unfortunately, the answer is no.

Sombeyyy commented 4 weeks ago

@bpg I completely understand your decisions. And if this feature is not added, I accept that - after all, it's not my decision. However, there are some points that I would like to explain from my point of view, as I believe they have been misunderstood.

This describes load-balancing functionality

In this context, I would understand load balancing to mean that the provider checks which of the nodes has the lowest load and creates the VM on that node. In my request, a list of nodes that are in an HA group is passed as possible nodes. The provider can then decide at random which node to use if the VM does not yet exist. The provider's Terraform HA group resource could also be specified directly instead of a list of nodes.

Terraform is not well-suited for managing a dynamic state that changes outside of its control

I absolutely agree with that. A change of node can of course be completely ignored in this case via the ignore_changes block if it was migrated via HA. In my opinion, however, changing the node is something that shouldn't be a problem. When checking whether the VM is on the correct node, you can simply check whether the current node is in the specified list of possible nodes.

@elias314 I would also like to say something about this point from my point of view.

And this is why this feels uncomfortable to me; implementing a feature like this means there are now multiple possible acceptable states

For me, including a list of possible nodes does not represent a state that has several acceptable statuses, but only one. The only acceptable state is when the VM is running on one of the nodes entered in the HA group. Otherwise the state is not correct.

[...] you can imagine other similar feature requests where the outcome of multiple possible acceptable states may not be desirable for someone's particular use-case

To prevent exactly this problem, the suggestion would be to set the parameter node, which expects only one node and thus only considers this one node as acceptable. For HA, i.e. a list of nodes, nodes can then be used. It would also be possible to specify the proxmox_virtual_environment_hagroup directly as ha-group and thus define an HA group's members as the possible nodes in the provider.
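
To illustrate, a purely hypothetical configuration for this proposal (neither nodes nor ha_group exist in the provider today) could look like this:

resource "proxmox_virtual_environment_vm" "ubuntu_vm" {
  name = "terraform-provider-proxmox-ubuntu-vm"

  # hypothetical: either a list of acceptable nodes ...
  nodes = ["first-node", "second-node", "third-node"]

  # ... or, alternatively, an HA group whose members are the acceptable nodes
  # ha_group = proxmox_virtual_environment_hagroup.my_group.group

  # ...
}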


I am aware that something like this should ideally be implemented directly in Proxmox, but it is rather unlikely that this will happen, based on this statement by an employee:

[...] Also we do not want to clutter the interface to much, here it seems that it could be useful but just for a smallish percentage of user. [...] https://bugzilla.proxmox.com/show_bug.cgi?id=1083

So I would ask you @bpg to think about this again, as I believe this would be an exclusion criterion for many people who want to use HA properly. Thank you, and I look forward to your answer. Bastian

elias314 commented 3 weeks ago

Again, I too wish this feature existed -- in Proxmox. It's too bad that they don't seem interested in this feature, as it seems obvious to me that the next step in larger infrastructures would be to deploy to an HA group instead of a specific node.

As a workaround, give this a try:

variable "hagroup" {
  default = "SilverTeam"
}

data "proxmox_virtual_environment_hagroup" "hagroup" {
  group = var.hagroup
}

data "proxmox_virtual_environment_nodes" "available_nodes" {}

locals {
  available_nodes = data.proxmox_virtual_environment_nodes.available_nodes.names
  ha_nodes        = keys(data.proxmox_virtual_environment_hagroup.hagroup.nodes)

  # map of node name => online status, used to filter the HA group members
  node_status = zipmap(
    data.proxmox_virtual_environment_nodes.available_nodes.names,
    data.proxmox_virtual_environment_nodes.available_nodes.online
  )

  # HA group members that are currently online
  online_nodes = [
    for name in local.ha_nodes : name
    if local.node_status[name]
  ]
}

output "available_nodes" {
  value = local.available_nodes
}

output "ha_nodes" {
  value = local.ha_nodes
}

output "online" {
  value = local.online_nodes
}

I've only done some limited testing, but this should give you a dynamic list of online nodes at runtime that you can deploy to. Couple this with ignore_changes = [ node_name ] and this should get you pretty close to what you're looking for.
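
For completeness, a sketch of how the filtered list could then be consumed (spreading new VMs across the online nodes with element() is just one approach, not something the provider does for you):

resource "proxmox_virtual_environment_vm" "ubuntu_vm" {
  name  = "terraform-provider-proxmox-ubuntu-vm"
  vm_id = 4321

  # element() wraps around the list, so the VM ID gives a stable pseudo-random pick
  node_name = element(local.online_nodes, 4321)

  lifecycle {
    # once created, let Proxmox HA move the VM without Terraform fighting it
    ignore_changes = [ node_name ]
  }

  # ... disk, network_device, operating_system ...
}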