Closed bernardhalas closed 7 months ago
As agreed, during today's grooming, passing directly to Jakub.
Ensure all Claudie images forward SIGTERM do processes
All Claudie images forward SIGTERM to main the process of the container, but not to the child processes of this process.
Services respect SIGTERM (if not, a new Issue can be opened to services which don't respect SIGTERM)
I would say that services respect SIGTERM partially. Since the workflow heavy lifting is performed by the child processes of the Claudie services, the Claudie services have to wait until the child processes finish (No SIGTERM is forwarded to the child processes), and then the Claudie processes terminate.
terraformer, upon receiving SIGTERM, should issue exactly one SIGTERM to the terraform process (because multiple SIGTERM or SIGINT signals to terraform process represent a request for a non-graceful shutdown;
SIGTERM isn't forwarded to the terraform
process, because it is a child process of the terraformer
main container process.
Also, if you SIGTERM terraform
process it is shutdown gracefully, but the terraformer
re-run the terraform
process as part of the retry loop. In my case it broke the TF state (error regarding already existing resource) and the workflow failed.
I'm closing this issue in favor of https://github.com/berops/claudie/issues/1298
We had a discussion about the current way of handling the SIGTERM calls. To summarize, Claudie's services listen to SIGTERM calls. The SIGTERM is forwarded to the main process of the container, then the call is handled programmatically by us. Besides that, we don't want to forward this call to the child processes of the main container process.
EDIT: @bernardhalas please complement me if there is something I missed or worth mentioning.
Description
By default, SIGTERM is an important signal for the processes which represents a request for graceful shutdown. It's used by Kubernetes whenever there's a pod termination requested (e.g. pod eviction, pod restarts upon liveness probe failure, etc.).
We need to ensure that Claudie services react properly to SIGTERM signal on all levels. This means:
terraformer
, upon receiving SIGTERM, should issue exactly one SIGTERM to theterraform
process (because multiple SIGTERM or SIGINT signals toterraform
process represent a request for a non-graceful shutdown; ref <<-- I couldn't find the src code, but at least referring to the issues that confirm this behavior)Exit criteria
terraform
process receives just one SIGTERM upon one SIGTERM sent toterraformer
pod