Open kltm opened 1 year ago
I don't believe there is anything we can easily do about this for the moment, except lean into the restarts and try to keep the machines under lower load when we need to get things through.
We've lost two recent snapshot builds to this again.
It feels like it's happening more often these days. I haven't crunched the numbers, but I am still recording occurrences in my notes.
Recently, we've been running into a lot of failures where, when shutting down a Docker container, the pipeline terminates with an error like:
I suspect that in high-memory or high-usage scenarios, the gap between the SIGTERM and SIGKILL signals is not long enough for the container to shut down cleanly (https://docs.docker.com/engine/reference/commandline/stop/). This seems to be hardwired to one second pretty deep in the plugin: https://github.com/jenkinsci/docker-workflow-plugin/blob/d5d2e5c4007f7ea006152542b2bcbe0f1b2b08aa/src/main/java/org/jenkinsci/plugins/docker/workflow/client/DockerClient.java#L185
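
To illustrate the suspected mechanism, here is a minimal Python sketch of the SIGTERM-then-SIGKILL pattern. It is not the plugin's code and does not involve Docker. The child script, the 0.5 second simulated cleanup, and the helper name `stop_with_grace` are all assumptions made up for the demonstration (POSIX only). The point is just that when cleanup takes longer than the grace period, the process is killed rather than exiting cleanly.

```python
import signal
import subprocess
import sys

# Hypothetical child: traps SIGTERM and needs ~0.5 s of cleanup before exiting 0.
# The 0.5 s delay stands in for a slow shutdown on a heavily loaded machine.
CHILD = r"""
import signal, sys, time

def on_term(signum, frame):
    time.sleep(0.5)  # simulated slow cleanup
    sys.exit(0)

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while True:
    time.sleep(0.05)
"""


def stop_with_grace(grace_seconds):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns the child's return code: 0 for a clean exit, or a negative
    signal number if the child was killed by a signal.
    """
    proc = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
    proc.stdout.readline()  # block until the child has installed its handler
    proc.terminate()        # SIGTERM
    try:
        code = proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()         # SIGKILL: grace period ran out
        code = proc.wait()
    proc.stdout.close()
    return code


if __name__ == "__main__":
    print("short grace:", stop_with_grace(0.1))
    print("long grace: ", stop_with_grace(5.0))
```

With the short grace period the child should be killed mid-cleanup (return code equal to `-signal.SIGKILL`), while the longer one lets it exit with 0. If the same thing is happening in our builds, a longer stop timeout would help, but that would need a change in the plugin given the hardwired value.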