NixOS / nixops

NixOps is a tool for deploying to NixOS machines in a network or cloud.
https://nixos.org/nixops
GNU Lesser General Public License v3.0

containers: Deploying after destroying makes nixops hang #809

Open nh2 opened 6 years ago

nh2 commented 6 years ago

With targetEnv = "container";, if I run nixops destroy and then immediately run nixops deploy again, nixops hangs after this output:

...
machine1.> creating container...
machine3..> creating container...
machine1.> host IP is 10.233.130.1, container IP is 10.233.130.2
machine3..> host IP is 10.233.131.1, container IP is 10.233.131.2
machine2> IP address is 10.233.122.2
machine2> setting state version to 17.09
^C
Traceback (most recent call last):
  File "./ops", line 130, in <module>
    env=env,
  File "/nix/store/k0c5spdm7g4lb9gkm3l20v81dbl93s0h-python3-3.6.3/lib/python3.6/subprocess.py", line 269, in call
    return p.wait(timeout=timeout)
  File "/nix/store/k0c5spdm7g4lb9gkm3l20v81dbl93s0h-python3-3.6.3/lib/python3.6/subprocess.py", line 1457, in wait
    (pid, sts) = self._try_wait(0)
  File "/nix/store/k0c5spdm7g4lb9gkm3l20v81dbl93s0h-python3-3.6.3/lib/python3.6/subprocess.py", line 1404, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt

The ^C is where I killed it with Ctrl+C because it hung forever.

The cause is that nixos-container destroy, which nixops calls, is asynchronous and exits immediately. If the containers have not finished shutting down by the time of the next deploy, nixops and nixos-container assume they can reuse the existing container names, even though those names are still in use (they still show up in machinectl).
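As a rough illustration of how a caller could avoid this race, here is a minimal Python sketch that polls machinectl until a container name is no longer listed before it is reused. This is not the change proposed in the PR. The helper names (registered_machines, machine_is_registered, wait_until_gone) are hypothetical, the 120-second default timeout is an arbitrary choice, and the sketch assumes that `machinectl list --no-legend` prints one machine per line with the name in the first column.

```python
import subprocess
import time


def registered_machines(list_output):
    """Extract machine names from `machinectl list --no-legend` output.

    Assumption: one machine per line, name in the first column.
    """
    names = set()
    for line in list_output.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def machine_is_registered(name):
    """Return True if machinectl still lists a machine called `name`."""
    result = subprocess.run(
        ["machinectl", "list", "--no-legend"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return name in registered_machines(result.stdout)


def wait_until_gone(name, is_registered=machine_is_registered,
                    timeout=120.0, interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Block until `name` is no longer registered.

    Raises TimeoutError if the machine is still registered once
    `timeout` seconds have passed. The timeout, clock, and sleep
    parameters are illustrative and exist mainly to make testing easy.
    """
    deadline = clock() + timeout
    while is_registered(name):
        if clock() >= deadline:
            raise TimeoutError(
                "container %r still registered after %s seconds" % (name, timeout)
            )
        sleep(interval)
```

A caller would run something like wait_until_gone("machine1") after nixos-container destroy returns and before creating a container with the same name.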

See https://github.com/NixOS/nixpkgs/issues/32545

nh2 commented 6 years ago

PR in #810

nh2 commented 6 years ago

Also related: https://github.com/NixOS/nixpkgs/issues/32551, which makes container shutdowns take the full 90 seconds and so makes this issue very visible.