coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

locksmithd will reboot outside of reboot windows if no semaphore was acquired before #1886

Open pizzarabe opened 7 years ago

pizzarabe commented 7 years ago

Issue Report

Bug

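Container Linux (CL) reboots outside the configured reboot window if update-engine has downloaded a newer version and locksmithd was unable to acquire the semaphore before the window closed. As soon as the lock becomes available, locksmithd reboots immediately instead of waiting for the next window.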

Reproduction Steps

Mar 24 14:08:00 alien1 locksmithd[2964]: Reboot window start is "22:00" and length is "6h"
Mar 24 14:08:00 alien1 locksmithd[2964]: Next window begins at 2017-03-24 22:00:00 +0100 CET and ends at 2017-03-25 04:00:00 +0100 CET
Mar 24 14:08:00 alien1 locksmithd[2964]: locksmithd starting currentOperation="UPDATE_STATUS_IDLE" strategy="etcd-lock"
...
Mar 25 23:56:47 alien1 locksmithd[2964]: LastCheckedTime=1490482573 Progress=0 CurrentOperation="UPDATE_STATUS_UPDATED_NEED_REBOOT" NewVersion=1298.6.0 NewSize=269047025
Mar 25 23:56:47 alien1 locksmithd[2964]: Failed to acquire lock: semaphore is at 0. Retrying in 10s.
...
Mar 27 09:36:58 alien1 locksmithd[2964]: Failed to acquire lock: semaphore is at 0. Retrying in 5m0s.

After unlocking the semaphore with locksmithctl unlock:

Mar 27 09:41:58 alien1 locksmithd[2964]: Reboot sent. Going to sleep.
Mar 27 09:41:58 alien1 systemd[1]: Stopping Cluster reboot manager...
Mar 27 09:41:58 alien1 locksmithd[2964]: Received interrupt/termination signal - locksmithd is exiting.
Mar 27 09:42:00 alien1 systemd[1]: Stopped Cluster reboot manager.
-- Reboot --

Container Linux Version

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1298.6.0 (and 1298.5.0)
VERSION_ID=1298.6.0
BUILD_ID=2017-03-14-2119
PRETTY_NAME="Container Linux by CoreOS 1298.6.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
saj commented 6 years ago

@bgilbert May I ask why this issue was tagged with closed/rejected?

I can confirm this problem is still reproducible in locksmith 0.6.1.

bgilbert commented 6 years ago

@saj We're triaging issues that are unlikely to be fixed in Container Linux, now that the focus of new development is shifting to Red Hat CoreOS and its community counterpart. No final determination has yet been made for this particular bug, which is why it's still open.