NixOS / nixops

NixOps is a tool for deploying to NixOS machines in a network or cloud.
https://nixos.org/nixops
GNU Lesser General Public License v3.0

nixops deploy doesn't bring server in desired state because it doesn't start stopped systemd units #1063

Open nh2 opened 5 years ago

nh2 commented 5 years ago

If my nginx shuts down because of some failure in another systemd unit that it depends on, and I fix the issue and deploy with nixops, my nginx isn't actually started.

In general, when I run nixops deploy, the desired state isn't reached if a unit stopped for some reason.

It's only reached with --force-reboot or nixops reboot.

I think this is because nixops deploy never runs systemctl isolate on any target such as multi-user.target (the default target that NixOS service modules put requiredBy on when a service should be started).

I propose that we should probably systemctl isolate multi-user.target during nixops deploy.

Any opposing views?

nh2 commented 5 years ago

In other words, nixops deploy is not congruent in the terminology of https://blog.flyingcircus.io/2016/05/06/thoughts-on-systems-management-methods/.

nh2 commented 5 years ago

Here's some more info on how things work right now:

https://github.com/NixOS/nixpkgs/blob/542ef2b182dff9756abf782a650f80599c515e4a/nixos/modules/system/activation/switch-to-configuration.pl#L69

getActiveUnits uses systemctl list-units --full --no-legend.

After multi-user.target has been activated successfully (for example with systemctl isolate multi-user.target), the output looks like this:

...
systemd-udevd-kernel.socket                                                              loaded active running   udev Kernel Socket                                                
basic.target                                                                             loaded active active    Basic System                                                      
encrypted-links.target                                                                   loaded active active    All Encrypted Links                                               
getty.target                                                                             loaded active active    Login Prompts                                                     
local-fs-pre.target                                                                      loaded active active    Local File Systems (Pre)                                          
local-fs.target                                                                          loaded active active    Local File Systems                                                
multi-user.target                                                                        loaded active active    Multi-User System                                                 
network-interfaces.target                                                                loaded active active    All Network Interfaces (deprecated)                               
network-online.target                                                                    loaded active active    Network is Online                                                 
network-pre.target                                                                       loaded active active    Network (Pre)                                                     
network.target                                                                           loaded active active    Network                                                           
nss-lookup.target                                                                        loaded active active    Host and Network Name Lookups                                     
...

and we can see:

# systemctl status multi-user.target
● multi-user.target - Multi-User System
   Loaded: loaded (/nix/store/3hmpbbcv1db42m9g34c9g4q6qinw50x4-systemd-237/example/systemd/system/multi-user.target; linked; vendor preset: enabled)
   Active: active since Thu 2019-01-10 21:00:22 UTC; 1min 15s ago
     Docs: man:systemd.special(7)

But as soon as one service with requiredBy = [ "multi-user.target" ] stops, multi-user.target is no longer active. For example, after I run systemctl stop myservice, it looks like this:

# systemctl status multi-user.target
● multi-user.target - Multi-User System
   Loaded: loaded (/nix/store/3hmpbbcv1db42m9g34c9g4q6qinw50x4-systemd-237/example/systemd/system/multi-user.target; linked; vendor preset: enabled)
   Active: inactive (dead) since Thu 2019-01-10 21:02:27 UTC; 2s ago
     Docs: man:systemd.special(7)

That also makes it disappear from the systemctl list-units --full --no-legend list. Consequently, the code at https://github.com/NixOS/nixpkgs/blob/542ef2b182dff9756abf782a650f80599c515e4a/nixos/modules/system/activation/switch-to-configuration.pl#L174-L177 is not executed, and multi-user.target is not started by switch-to-configuration (which nixops calls).

So if any service required by multi-user.target has stopped (thus stopping multi-user.target), nixops deploy currently does not start the declared units again, and thus it is not congruent.
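To make the check concrete, here is a minimal shell sketch of the "is this unit in the active list?" test. It is not the Perl code from switch-to-configuration; unit_is_listed is a hypothetical helper name, and it assumes the unit name is the first column of the systemctl list-units --full --no-legend output (or the second, if systemd prefixes the line with a status marker).

```shell
# Sketch only: report whether a unit appears in list-units output read from stdin.
# unit_is_listed is a hypothetical name, not part of nixops or NixOS.
unit_is_listed() {
  # $1: unit name to look for, e.g. multi-user.target
  awk -v u="$1" '
    # Match the unit name in column 1, or column 2 if a marker comes first.
    $1 == u || $2 == u { found = 1 }
    # Exit status 0 when the unit was seen, 1 otherwise.
    END { exit !found }
  '
}
```

On a machine it could be used like this: systemctl list-units --full --no-legend | unit_is_listed multi-user.target || echo "multi-user.target is not active". If the target was stopped as described above, the helper returns non-zero, which corresponds to the case where the start logic is skipped.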

nh2 commented 5 years ago

A remaining question is whether it should be nixops or switch-to-configuration that is to be made congruent.

nh2 commented 5 years ago

PR at #1078

allgreed commented 4 years ago

@nh2 until your work gets merged - do you think there's a better workaround than running systemctl isolate ... "by hand" after every deployment?

With regard to your question: would it hurt to have it in both places at some point, with nixops leading the way?

deepfire commented 4 years ago

I have the same issue with nixops: units do not reach their nominal state after a deploy, despite being enabled in the NixOps network definition:

   Loaded: loaded (/nix/store/00g5g2ws4032brlm4fwb7lakmdcgyi0z-unit-foo.service/foo.service; enabled; vendor preset: enabled)
   Active: inactive (dead)

JosephLucas commented 4 years ago

I stumbled on this issue too. I have to deploy like this to re-trigger multi-user.target:

nixops deploy -d <deployment> && sleep 5 && nixops ssh-for-each -d <deployment> "systemctl isolate multi-user.target"
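The same workaround could be wrapped in a small function so that the isolate only runs where the target is actually down. This is a sketch under assumptions: deploy_and_fix and REMOTE_FIX are hypothetical names, and it assumes nixops ssh-for-each hands the quoted string to a shell on the remote machine so that the || is evaluated there.

```shell
# Command to run on each machine: isolate multi-user.target only if it is not active.
# REMOTE_FIX is a hypothetical variable name used for this sketch.
REMOTE_FIX='systemctl is-active --quiet multi-user.target || systemctl isolate multi-user.target'

# deploy_and_fix is a hypothetical wrapper around the one-liner from this thread.
deploy_and_fix() {
  # $1: deployment name, as passed to nixops -d
  nixops deploy -d "$1" &&
    sleep 5 &&
    nixops ssh-for-each -d "$1" "$REMOTE_FIX"
}
```

It would be called as deploy_and_fix <deployment>. The is-active guard is an addition over the one-liner above; the intent is to avoid isolating on machines that are already in the desired state.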