coreos / fleet

fleet ties together systemd and etcd into a distributed init system
Apache License 2.0

After reboots, timers sometimes broken due to missing service files #1697

Open scole-scea opened 7 years ago

scole-scea commented 7 years ago

I'm running CoreOS Stable, 1122.3.0 on Google Compute Engine. (Thus: fleet 0.11.7.)

Sometimes, after a reboot, fleet-controlled timers try to start before their associated fleet-controlled services have been loaded, resulting in timer failures. I'd expect the fleet launcher to wait until all the parts of a timer are loaded before starting it. (Or maybe just load everything on a rebooted node before starting anything.)

A stripped log shows the sequence:

-- Reboot --
systemd[1]: Started fleet daemon.
fleetd[1221]: INFO fleetd.go:64: Starting fleetd version 0.11.7
fleetd[1221]: INFO manager.go:246: Writing systemd unit cd-pipeline-run.timer (118b)
fleetd[1221]: INFO manager.go:182: Instructing systemd to reload units
systemd[1]: cd-pipeline-run.timer: Refusing to start, unit to trigger not loaded.
systemd[1]: Failed to start Run the Classifier Data Pipeline.
fleetd[1221]: INFO manager.go:127: Triggered systemd unit cd-pipeline-run.timer start: job=1432
fleetd[1221]: INFO reconcile.go:330: AgentReconciler completed task: type=LoadUnit job=cd-pipeline-run.timer reason="unit scheduled here but not loaded"
fleetd[1221]: INFO reconcile.go:330: AgentReconciler completed task: type=ReloadUnitFiles job=N/A reason="always reload unit files"
fleetd[1221]: INFO reconcile.go:330: AgentReconciler completed task: type=StartUnit job=cd-pipeline-run.timer reason="unit currently loaded but desired state is launched"
fleetd[1221]: INFO manager.go:246: Writing systemd unit cd-pipeline-run.service (2267b)
fleetd[1221]: INFO manager.go:182: Instructing systemd to reload units
fleetd[1221]: INFO reconcile.go:330: AgentReconciler completed task: type=LoadUnit job=cd-pipeline-run.service reason="unit scheduled here but not loaded"

(The more complete log is in a Gist, here.)

I've yet to find a fleet option that tells it that a pair (or more) of unit files needs to be handled together...
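
For illustration only, here is a minimal sketch of the "load everything on a rebooted node before starting anything" ordering described above. It is not fleet code: the `Task` type and the `load_before_start` helper are hypothetical, and only the task type names (`LoadUnit`, `ReloadUnitFiles`, `StartUnit`) and unit names are taken from the log. It assumes the timer triggers the service of the same name, which is systemd's default when the timer sets no `Unit=` option.

```python
from typing import List, NamedTuple


class Task(NamedTuple):
    """Hypothetical stand-in for a reconciler task; not a fleet type."""
    kind: str  # "LoadUnit", "ReloadUnitFiles", or "StartUnit"
    unit: str  # unit name, or "N/A" for ReloadUnitFiles


# Lower rank runs first: write all unit files, reload systemd, then start.
RANK = {"LoadUnit": 0, "ReloadUnitFiles": 1, "StartUnit": 2}


def load_before_start(tasks: List[Task]) -> List[Task]:
    """Reorder tasks so every load precedes every start.

    sorted() is stable, so tasks of the same kind keep their relative order.
    """
    return sorted(tasks, key=lambda t: RANK[t.kind])


# Task order as it appears in the stripped log above.
observed = [
    Task("LoadUnit", "cd-pipeline-run.timer"),
    Task("ReloadUnitFiles", "N/A"),
    Task("StartUnit", "cd-pipeline-run.timer"),
    Task("LoadUnit", "cd-pipeline-run.service"),
]

ordered = load_before_start(observed)
for task in ordered:
    print(task.kind, task.unit)
```

With this ordering, cd-pipeline-run.service is written before cd-pipeline-run.timer is started, so systemd would have the unit to trigger loaded by the time the timer starts.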

scole-scea commented 7 years ago

This might be #1621.