Open nh2 opened 5 years ago
In other words, nixops deploy is not congruent in the terminology of https://blog.flyingcircus.io/2016/05/06/thoughts-on-systems-management-methods/.
Here's some more info on how things work right now:
getActiveUnits uses systemctl list-units --full --no-legend. After a successful systemctl activate multi-user.target, that looks like this:
...
systemd-udevd-kernel.socket loaded active running udev Kernel Socket
basic.target loaded active active Basic System
encrypted-links.target loaded active active All Encrypted Links
getty.target loaded active active Login Prompts
local-fs-pre.target loaded active active Local File Systems (Pre)
local-fs.target loaded active active Local File Systems
multi-user.target loaded active active Multi-User System
network-interfaces.target loaded active active All Network Interfaces (deprecated)
network-online.target loaded active active Network is Online
network-pre.target loaded active active Network (Pre)
network.target loaded active active Network
nss-lookup.target loaded active active Host and Network Name Lookups
...
and we can see:
# systemctl status multi-user.target
● multi-user.target - Multi-User System
Loaded: loaded (/nix/store/3hmpbbcv1db42m9g34c9g4q6qinw50x4-systemd-237/example/systemd/system/multi-user.target; linked; vendor preset: enabled)
Active: active since Thu 2019-01-10 21:00:22 UTC; 1min 15s ago
Docs: man:systemd.special(7)
But as soon as one service having RequiredBy = [ "multi-user.target" ] stops, the multi-user.target is no longer active. For example, if I systemctl stop myservice then it looks like this:
# systemctl status multi-user.target
● multi-user.target - Multi-User System
Loaded: loaded (/nix/store/3hmpbbcv1db42m9g34c9g4q6qinw50x4-systemd-237/example/systemd/system/multi-user.target; linked; vendor preset: enabled)
Active: inactive (dead) since Thu 2019-01-10 21:02:27 UTC; 2s ago
Docs: man:systemd.special(7)
That also makes it disappear from the systemctl list-units --full --no-legend list. Consequently, the bit in https://github.com/NixOS/nixpkgs/blob/542ef2b182dff9756abf782a650f80599c515e4a/nixos/modules/system/activation/switch-to-configuration.pl#L174-L177 is not executed and multi-user.target is not started by switch-to-configuration (which nixops calls).
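To check by hand whether a target has dropped out of that list, something like the following could work. This is only a sketch: active_units and unit_is_listed_active are made-up helper names, not anything from switch-to-configuration, and it assumes the column layout shown in the output above (unit name, load state, active state, sub state).

```shell
#!/bin/sh
# Hypothetical helper: print the names of units whose ACTIVE column is "active".
# Reads `systemctl list-units --full --no-legend` output on stdin.
active_units() {
  awk '
    # Assumption: some systemd versions prefix failed units with a bullet;
    # if so, the unit name is the second field rather than the first.
    { i = ($1 == "●") ? 2 : 1 }
    # Columns are: UNIT LOAD ACTIVE SUB DESCRIPTION...
    $(i + 2) == "active" { print $i }
  '
}

# Hypothetical helper: succeed only if the named unit is listed as active.
unit_is_listed_active() {
  active_units | grep -qxF -- "$1"
}
```

On a real machine this would be used as systemctl list-units --full --no-legend | unit_is_listed_active multi-user.target, which should fail once a required service has been stopped.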
So, if any service stopped (thus stopping multi-user.target), then nixops deploy will currently not start any declared units, and thus it's not congruent.
A remaining question is whether it should be nixops or switch-to-configuration that is to be made congruent.
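For illustration, the missing step could look roughly like this on the target machine, wherever it ends up living. This is a sketch under assumptions, not what nixops or switch-to-configuration actually do: ensure_target_started is a hypothetical name, and the SYSTEMCTL variable is only there so the function can be tried without a running systemd. It uses start rather than isolate, so it does not stop units outside the target.

```shell
#!/bin/sh
# Hypothetical helper, not part of nixops or switch-to-configuration.
# SYSTEMCTL is an assumed override hook; by default the real systemctl is used.
ensure_target_started() {
  target="${1:-multi-user.target}"
  systemctl_cmd="${SYSTEMCTL:-systemctl}"
  # `systemctl is-active --quiet` exits non-zero when the unit is not active.
  if "$systemctl_cmd" is-active --quiet "$target"; then
    echo "$target is already active"
  else
    echo "$target is not active, starting it"
    # Starting the target also queues start jobs for the units it requires.
    "$systemctl_cmd" start "$target"
  fi
}
```

Calling ensure_target_started with no argument after a deploy would then bring multi-user.target back if a stopped service had taken it down.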
PR at #1078
@nh2 until your work gets merged - do you think there's a better workaround than running systemctl isolate ... "by hand" after every deployment?
With regard to your question: would it hurt to have it in both places at some point in time, with nixops leading the way?
I have the same issue with nixops -- units not reaching nominal state post-deploy, despite being enabled in the Nixops network definition:
Loaded: loaded (/nix/store/00g5g2ws4032brlm4fwb7lakmdcgyi0z-unit-foo.service/foo.service; enabled; vendor preset: enabled)
Active: inactive (dead)
I stumbled on this issue too. I have to deploy like this to re-trigger the multi-user.target:
nixops deploy -d <deployment> && sleep 5 && nixops ssh-for-each -d <deployment> "systemctl isolate multi-user.target"
If my nginx shuts down because of some failure in another systemd unit that it depends on, and I fix the issue and deploy with nixops, my nginx isn't actually started.
In general, when I run nixops deploy, the desired state isn't reached if a unit stopped for some reason.
It's only reached with --force-reboot or nixops reboot.
I think this is because nixops deploy doesn't actually systemctl isolate any target like multi-user.target (which is the default target that we put requiredBy on in NixOS service modules if we want it to be started).
I propose that we should probably systemctl isolate multi-user.target during nixops deploy.
Any opposing views?