Open wl-chen opened 1 year ago
@wl-chen Would you share some detailed information on the issue? Thanks.
@wl-chen Would you share some detailed information on the issue? Thanks.
Sorry, I slipped and didn't finish writing and sent it.
This event is sporadic, but when viewing the source code, the guess may be related to this part of the code https://github.com/clusternet/clusternet/blob/main/pkg/utils/deployer.go#L193. When one of helmreleases that rendered by description install or upgrade failed, both description-controller and helmrelrease-controller will re-execute the handle helmrelease logic. The code just mentioned will mistakenly delete the chart being executed.
Yeah, the helm releases will be re-enqueued and handled by handleHelmRelease
in the next round.
https://github.com/clusternet/clusternet/blob/main/pkg/utils/deployer.go#L193 will only uninstall failed releases when atomic
is explicitly set to true
. I don't know whether setting atomic
to false (which is false by default) will help mitigate this problem.
When installing helmrelease by clusternet-hub, the above errors will cause helmrelease to generate the following events: an error occurred while uninstalling the release. original install error: services "service-xxx" not found: uninstall: Release not loaded: chart-xxx: release: not found
I wonder whether ignoring such not found helm release
in function UninstallRelease
will help solve the problem.
@wl-chen Please feel free to submit a PR if you've got a solution. Thanks.
cc @DanielXLee for this part of uninstalling
logics.
This event is sporadic, but when viewing the source code, the guess may be related to this part of the code https://github.com/clusternet/clusternet/blob/main/pkg/utils/deployer.go#L193. When one of helmreleases that rendered by description install or upgrade failed, both description-controller and helmrelrease-controller will re-execute the handle helmrelease logic. The code just mentioned will mistakenly delete the chart being executed.
Yeah, the helm releases will be re-enqueued and handled by
handleHelmRelease
in the next round.https://github.com/clusternet/clusternet/blob/main/pkg/utils/deployer.go#L193 will only uninstall failed releases when
atomic
is explicitly set totrue
. I don't know whether settingatomic
to false (which is false by default) will help mitigate this problem.When installing helmrelease by clusternet-hub, the above errors will cause helmrelease to generate the following events: an error occurred while uninstalling the release. original install error: services "service-xxx" not found: uninstall: Release not loaded: chart-xxx: release: not found
I wonder whether ignoring such
not found helm release
in functionUninstallRelease
will help solve the problem.@wl-chen Please feel free to submit a PR if you've got a solution. Thanks.
cc @DanielXLee for this part of
uninstalling
logics.
You mean handleHelmRelease
will only be executed by helmrelease-controller in the next round ? Will the code https://github.com/clusternet/clusternet/blob/main/pkg/hub/deployer/helm/helm.go#L355 be optimized ?
You mean handleHelmRelease will only be executed by helmrelease-controller in the next round ?
handleHelmRelease
will be executed by deployer in clusternet-agent
/ clusternet-hub
. This depends on the cluster sync mode.
Will the code https://github.com/clusternet/clusternet/blob/main/pkg/hub/deployer/helm/helm.go#L355 be optimized ?
@wl-chen Feel free to make your changes if necessary. You'd better not introduce breaking changes.
@wl-chen you can try to fix it and then verify in our test env. If works well, push your fix to the community.
@wl-chen Would you please share some updates on this issue? Thanks.
What happened:
Charts may be uninstalled probabilistically when installing or upgrading.
What you expected to happen:
Charts can not be uninstalled probabilistically when installing or upgrading.
How to reproduce it (as minimally and precisely as possible):
This event is sporadic, but when viewing the source code, the guess may be related to this part of the code https://github.com/clusternet/clusternet/blob/main/pkg/utils/deployer.go#L193. When one of helmreleases that rendered by description install or upgrade failed, both description-controller and helmrelrease-controller will re-execute the handle helmrelease logic. The code just mentioned will mistakenly delete the chart being executed.
When installing helmrelease by clusternet-hub, the above errors will cause helmrelease to generate the following events:
an error occurred while uninstalling the release. original install error: services "service-xxx" not found: uninstall: Release not loaded: chart-xxx: release: not found
Anything else we need to know?:
The above is my analysis, I hope you can see if it is correct.
Environment:
clusternet-agent --version=json
): image version v0.13.1clusternet-hub --version=json
): iamge version v0.13.1kubectl version
): client: v1.18.4-tke.1; server: v1.20.6-tke.20cat /etc/os-release
):uname -a
):