[Before rollout]
mcm-immutable-node-az-b-integrated-test-6b8bc-kprnn
[updated mcd's reference to AWSMachineClass with some changes again]
[MCM updated new mcs' reference to AWSMachineClass]
immutable-machine-controller-manager-8677546c59-b74qz machine-controller-manager I0627 13:26:23.907365 1 deployment_util.go:522] Observed a change in classKind of Machine Deployment mcm-immutable-node-az-b-integrated-test. Changing classKind from MachineClass to AWSMachineClass.
[MC migration logic migrates from AWSMachineClass to MachineClass]
immutable-machine-controller-manager-8677546c59-b74qz machine-controller I0627 13:26:26.422698 1 migrate_machineclass.go:155] Create/Apply successful for MachineClass mcm-immutable-node-az-b-integrated-test
immutable-machine-controller-manager-8677546c59-b74qz machine-controller I0627 13:26:26.428448 1 migrate_machineclass.go:177] Updated class reference for machine machine-controller-manager-int/mcm-immutable-node-az-b-integrated-test-5bc67-vj4br
immutable-machine-controller-manager-8677546c59-b74qz machine-controller I0627 13:26:26.461724 1 migrate_machineclass.go:196] Updated class reference for machineset machine-controller-manager-int/mcm-immutable-node-az-b-integrated-test-5bc67
immutable-machine-controller-manager-8677546c59-b74qz machine-controller I0627 13:26:26.493563 1 migrate_machineclass.go:384] Set migrated annotation for ProviderSpecificMachineClass AWSMachineClass/mcm-immutable-node-az-b-integrated-test
[MC creates VM for new mc vj4br]
immutable-machine-controller-manager-8677546c59-b74qz machine-controller I0627 13:26:27.905844 1 machine.go:367] Created new VM for machine: "mcm-immutable-node-az-b-integrated-test-5bc67-vj4br" with ProviderID: "aws:///eu-west-1/i-0c50d92b176e99c0d" and backing node: ""
immutable-machine-controller-manager-8677546c59-b74qz machine-controller I0627 13:26:27.912682 1 machine.go:492] Machine labels/annotations UPDATE for "mcm-immutable-node-az-b-integrated-test-5bc67-vj4br"
immutable-machine-controller-manager-8677546c59-b74qz machine-controller I0627 13:26:27.976595 1 machine.go:518] Machine/status UPDATE for "mcm-immutable-node-az-b-integrated-test-5bc67-vj4br" during creation
[MCM creates VM for new mc vj4br]
immutable-machine-controller-manager-8677546c59-b74qz machine-controller-manager I0627 13:26:28.327003 1 machine.go:487] Created/Adopted machine: "mcm-immutable-node-az-b-integrated-test-5bc67-vj4br", MachineID: aws:///eu-west-1/i-0ee191ba68be62801
[MC safety controller deletes VM created by MCM]
immutable-machine-controller-manager-8677546c59-b74qz machine-controller I0627 13:26:28.768134 1 machine_safety.go:300] SafetyController: Orphan VM found and terminated VM: mcm-immutable-node-az-b-integrated-test-5bc67-vj4br, aws:///eu-west-1/i-0ee191ba68be62801
[MCM safety controller deletes VM created by MC]
immutable-machine-controller-manager-8677546c59-b74qz machine-controller I0627 13:26:29.229228 1 core.go:316] VM "aws:///eu-west-1/i-0c50d92b176e99c0d" for Machine "mcm-immutable-node-az-b-integrated-test-5bc67-vj4br" was terminated succesfully
Ideally MCM should not reconcile the machine obj at all: in the ideal case the machine is already updated with the generic MachineClass, so due to this code MCM performs no operation on it. However, from the code of MCM v0.48.3 it seems that orphan collection could still be done by MCM. To deal with that you could use MCM v0.49.0, where the orphan collection code has been removed.
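For illustration, here is a minimal, hypothetical sketch (not MCM's actual source) of the kind-based guard referred to above: once a machine references the generic MachineClass, the in-tree controller should treat it as a no-op and leave it to the out-of-tree provider controller. In v0.48.3 the orphan-VM collection path appears to sit outside such a guard, which is why moving to v0.49.0 is suggested.

```go
package main

import "fmt"

// Hypothetical, simplified types standing in for MCM's Machine API object.
// This illustrates the kind-based guard described above; it is not MCM code.
type ClassSpec struct {
	Kind string // e.g. "AWSMachineClass" or the generic "MachineClass"
	Name string
}

type Machine struct {
	Name  string
	Class ClassSpec
}

// shouldReconcileInTree returns true only for machines that still reference a
// provider-specific class. A machine already migrated to the generic
// MachineClass is skipped, leaving it to the out-of-tree machine-controller.
func shouldReconcileInTree(m Machine) bool {
	return m.Class.Kind != "MachineClass"
}

func main() {
	migrated := Machine{
		Name:  "mcm-immutable-node-az-b-integrated-test-5bc67-vj4br",
		Class: ClassSpec{Kind: "MachineClass", Name: "mcm-immutable-node-az-b-integrated-test"},
	}
	fmt.Println(shouldReconcileInTree(migrated)) // false: the in-tree MCM should do nothing here
}
```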
Here it seems the following happened (some assumptions due to missing logs):
At 13:26:27.905844 the MC created the VM and updated the providerID on the machine obj with aws:///eu-west-1/i-0c50d92b176e99c0d.
At 13:26:28.327003 the MCM created/adopted a second VM, because the providerID was not yet updated on the machine obj it saw; the machine ended up with providerID aws:///eu-west-1/i-0ee191ba68be62801, as stated when you listed the machine (but the logs aren't available):
mcm-immutable-node-az-b-integrated-test-5bc67-vj4br Pending 3m42s ip-10-50-196-240.eu-west-1.compute.internal aws:///eu-west-1/i-0ee191ba68be62801
MCM has started acting here because the migration logic was not completed by the time MCM had picked up the machine for reconcile (unfortunately the logs aren't available, most likely because they are higher V-level logs).
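To make the race concrete, here is a small, purely illustrative sketch (not MCM's actual safety-controller code) of the orphan check implied by the logs: a VM at the cloud provider that is not referenced by the machine object's providerID is treated as an orphan and terminated. With two VMs created for one machine, each VM looks orphaned from the point of view of whichever providerID value it does not match, so both end up terminated.

```go
package main

import "fmt"

// orphanVMs returns every VM that the machine object does not reference.
// Illustrative only; assumes a plain string comparison against the providerID.
func orphanVMs(machineProviderID string, cloudVMs []string) []string {
	var orphans []string
	for _, vm := range cloudVMs {
		if vm != machineProviderID {
			orphans = append(orphans, vm)
		}
	}
	return orphans
}

func main() {
	vms := []string{
		"aws:///eu-west-1/i-0c50d92b176e99c0d", // created by the MC
		"aws:///eu-west-1/i-0ee191ba68be62801", // created by the MCM
	}
	// Whichever VM the machine object is not pointing at gets flagged:
	fmt.Println(orphanVMs("aws:///eu-west-1/i-0c50d92b176e99c0d", vms)) // [aws:///eu-west-1/i-0ee191ba68be62801]
	fmt.Println(orphanVMs("aws:///eu-west-1/i-0ee191ba68be62801", vms)) // [aws:///eu-west-1/i-0c50d92b176e99c0d]
}
```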
I don't understand the reasoning for not migrating, given your statements above, @mattburgess:
We're trying to carry out a migration from AWSMachineClasses to MachineClasses but need to be in a position where we have a single deployment of MCM that can handle deployments of AWSMachineClasses until we can update the deployments to roll out MachineClasses instead.
Could you explain more? We no longer support fixes to the migration flow.
Closing due to inactivity.
What happened:
Having upgraded to mcm-0.48.3 and mcm-provider-aws-0.17.0, we're seeing odd behaviour during a rolling update of a machinedeployment. Specifically:
What you expected to happen:
All VMs created during a rolling update reach Running. No VMs created during a rolling update are classed as Orphans.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know:
We're trying to carry out a migration from AWSMachineClasses to MachineClasses but need to be in a position where we have a single deployment of MCM that can handle deployments of AWSMachineClasses until we can update the deployments to roll out MachineClasses instead.
Our pod config looks like this (taken from deployment.yaml, given the complete lack of documentation on either the MCM or mcm-provider-aws side about how to carry out this migration):
And finally, here's a breakdown of the logs that we saw:
After the initial MachineDeployment scale up we see 2 healthy running nodes:
After the rollout of the updated AWSMachineClass/MachineDeployment we see the following:
Note that AZ A rolled out fine, but AZ B's node is still pending:
The start of the logs for the AZ B rollout all look good; MCM's seen the incoming AWSMachineClass and spun up a new machine:
Next we see MCM needing to migrate from AWSMachineClass to MachineClass:
The following also looks good; it's now determined that we have too many nodes in AZ B because we have the new vj4br and the old kprnn node; it chooses to delete the latter:
Next we see MCM make the API call to get a new AWS instance created for the vj4br machine, at the same time as the Orphan VM reconciler kicks in:
The API call resulted in aws:///eu-west-1/i-0c50d92b176e99c0d being provisioned, but straight afterwards we see MCM adopting aws:///eu-west-1/i-0ee191ba68be62801, then immediately seeing that it's orphaned, so it terminates it!
Next we see MCM (manager, not the provider) decide that the original instance that was provisioned, aws:///eu-west-1/i-0c50d92b176e99c0d, is also orphaned, so it deletes it as well:
Environment:
k8s: 1.23 mcm: 0.48.3 mcm-provider-aws: 0.17.0