cnrancher / autok3s

Run K3s Everywhere
https://www.suse.com
Apache License 2.0

fix(cluster): fix stuck in Upgrading status when join nodes failed #649

Closed: JacieChao closed this 8 months ago

JacieChao commented 8 months ago

Issue

https://github.com/cnrancher/autok3s/issues/648

Problem

The deferred function can't catch the error because `err` is redeclared as a new, shadowing variable when the join-node function is called. When joining fails, the deferred error handler never sees the error, so it doesn't refresh the cluster status and the cluster stays stuck in Upgrading.

Solution

Use one consistent `err` variable throughout, so the error the join returns is the same one the deferred function checks and handles.

Test

orangedeng commented 8 months ago

Instead of changing the `err` definitions, maybe we can declare a named return parameter, as in `func (...) (er error)`. This would minimize the changes and protect against the same mistake in future modifications.

Jason-ZW commented 8 months ago

This is an improvement. At the very least, it better handles the scenario of adding nodes to the native provider in batches.

It becomes possible to collect the errors from all failed nodes and then perform a partial rollback. The log shows which error belongs to which failed node.