Closed · kaufers closed this issue 6 years ago
Per Issue #838 and PR #839, the leader node will be terminated as the very last node in the rolling update (`export INFRAKIT_GROUP_POLICY_SELF_UPDATE=last`, which is also the default behavior -- see https://github.com/docker/infrakit/blob/master/pkg/run/v0/group/group.go#L70). Please verify this behavior.
This will address 1. above. If 1. is guaranteed, the next step is to ensure we can properly terminate the VM and all of its resources in a predictable way -- since the "self" node can shut down at any time due to the VM termination, a `terraform apply` could be mid-flight, potentially leaving the Terraform files on disk in a corrupted state.
How can we delete the VM and its associated resources in a way that is tolerant to `terraform apply` being interrupted mid-flight when the self node is shut down?
Thinking through how Terraform works... I wonder if this can be done at all. If the self node is terminated as part of a `terraform apply`, that process will just die mid-flight. Will this leave the Terraform state files on disk in a corrupted state? If we know that Terraform at least guarantees file/state consistency at per-resource granularity, then we could do something with creating tombstones of the resources we need to delete:
1. Create a new directory, `delete-<timestamp>`, holding the resource files we need to delete.
2. Create a symlink (`.delete-current`) to point to this new directory.
3. Remove the files linked from the `.delete-current` directory and run `terraform apply`. Terraform will start deleting resources and update its state file as it proceeds (or maybe wait for everything to be deleted and then 'commit').
4. Suppose `terraform apply` is terminated. Everything goes out.
5. On restart, the symlinks in the `.delete-current` directory should point to no files... If any symlink resolves (`os.Readlink()`), it should remove the linked file and run `terraform apply` again.

The big assumption here is that any files that Terraform writes (its own state files -- not the ones we create/delete) do not get corrupted mid-flight. This is a pretty big assumption. Is there a way you can verify @kaufers ?
If we don't want to make this assumption, or don't trust what it says on the tin, then we would have to do something more coordinated. See my comments on #838
@chungers I think that what you have for #838 and #839 might actually solve this issue. Today, with the "resource" counting, we remove the "globally" scoped resource files when the last VM that references them is destroyed. In this case, that means that the `terraform apply` will include the `destroy` call for all of the resources (including the `self` VM).
In my testing on IBM Cloud, the resource destroy API call returns pretty quickly and there is a delay (up to a few minutes) before the actual VM is powered down. This provides plenty of time for all of the resources to be destroyed.
We hit issues when the manager group destroy deletes the current leader first. Once the updates are merged to ensure destroy ordering, I'll provide an update to this issue (there may no longer be problems).
The terraform plugin supports defining related resources (for example, an NFS volume for a group of instances and a block storage volume for a single instance). When the group is removed, we want to ensure that all of these related resources are also cleaned up.
We hit problems when the current leader is destroyed first since the VM running terraform is stopped before terraform can finish removing everything.
Seems like we need to do 2 things:
1. On a group `Destroy`, ensure that the current leader is destroyed last.
2. Ensure that the related resources are removed in the same `terraform apply` cycle as the VM.

Note that 2 is tricky: we cannot simply delete everything except the VM, since the VM will not function correctly if the backing storage is removed.