NixOS / nixpkgs

Can't start 200 nixos-containers #67970

Open davidak opened 5 years ago

davidak commented 5 years ago

Describe the bug

This is a follow-up to https://github.com/NixOS/nixpkgs/issues/65001. With my fix merged, I'm able to start 170 nixos-containers.

My end goal is to have 10000 containers, but for now, the next milestone is 200. I'm also exploring LXC as an alternative, since nixos-generators supports it now!

When starting 200 containers, 44 fail to start.

| started | failed       |
|---------|--------------|
| 500     | Kernel panic |
| 400     | 165          |
| 300     | 55           |
| 200     | 44           |
| 190     | 20           |
| 180     | 7            |
| 170     | 0            |

Here is the log of one failed unit:

-- Logs begin at Mon 2019-09-02 21:13:37 UTC, end at Mon 2019-09-02 21:28:47 UTC. --
Sep 02 21:13:52 targets-host systemd[1]: Starting Container 'target101'...
Sep 02 21:13:58 targets-host container target101[2702]: Spawning container target101 on /var/lib/containers/target101.
Sep 02 21:13:58 targets-host container target101[2702]: Press ^] three times within 1s to kill container.
Sep 02 21:14:04 targets-host container target101[2702]: <<< NixOS Stage 2 >>>
Sep 02 21:14:19 targets-host container target101[2702]: tee: /proc/self/fd/10: No such device or address
Sep 02 21:15:41 targets-host container target101[2702]: starting systemd...
Sep 02 21:15:44 targets-host container target101[2702]: systemd 239 running in system mode. (+PAM +AUDIT -SELINUX +IMA +APPARMOR +SMACK -SYSVINIT +UTMP -LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
Sep 02 21:15:44 targets-host container target101[2702]: Detected virtualization systemd-nspawn.
Sep 02 21:15:44 targets-host container target101[2702]: Detected architecture x86-64.
Sep 02 21:15:44 targets-host container target101[2702]: [1B blob data]
Sep 02 21:15:44 targets-host container target101[2702]: Welcome to NixOS 19.03.173391.0715f2f1a9b (Koi)!
Sep 02 21:15:44 targets-host container target101[2702]: [1B blob data]
Sep 02 21:15:44 targets-host container target101[2702]: Set hostname to <target101>.
Sep 02 21:15:44 targets-host container target101[2702]: Initializing machine ID from container UUID.
Sep 02 21:15:44 targets-host container target101[2702]: Failed to install release agent, ignoring: No such file or directory
Sep 02 21:15:49 targets-host container target101[2702]: File /nix/store/679k7dlwk5iifgdynxmi3r48ii7fgifd-systemd-239.20190219/example/systemd/system/systemd-journald.service:36 configures an IP firewall (IPAddressDeny=any), but the local system does not support BPF/cgroup based firewalling.
Sep 02 21:15:49 targets-host container target101[2702]: Proceeding WITHOUT firewalling in effect! (This warning is only shown for the first loaded unit using IP firewalling.)
Sep 02 21:15:52 targets-host container target101[2702]: [  OK  ] Reached target Swap.
Sep 02 21:15:52 targets-host container target101[2702]: [  OK  ] Started Dispatch Password Requests to Console Directory Watch.
Sep 02 21:15:52 targets-host container target101[2702]: [  OK  ] Listening on Journal Socket.
Sep 02 21:15:52 targets-host container target101[2702]:          Mounting Huge Pages File System...
Sep 02 21:15:52 targets-host container target101[2702]: [  OK  ] Created slice User and Session Slice.
Sep 02 21:15:52 targets-host container target101[2702]:          Starting Apply Kernel Variables...
Sep 02 21:15:52 targets-host container target101[2702]: [  OK  ] Listening on Journal Socket (/dev/log).
Sep 02 21:15:52 targets-host container target101[2702]: [  OK  ] Listening on initctl Compatibility Named Pipe.
Sep 02 21:15:52 targets-host container target101[2702]:          Mounting POSIX Message Queue File System...
Sep 02 21:15:52 targets-host container target101[2702]: [  OK  ] Created slice system-getty.slice.
Sep 02 21:15:52 targets-host container target101[2702]: [  OK  ] Reached target All Network Interfaces (deprecated).
Sep 02 21:15:52 targets-host container target101[2702]: [  OK  ] Started Forward Password Requests to Wall Directory Watch.
Sep 02 21:15:53 targets-host container target101[2702]:          Starting Update UTMP about System Boot/Shutdown...
Sep 02 21:15:53 targets-host container target101[2702]: [  OK  ] Reached target Remote File Systems.
Sep 02 21:15:53 targets-host container target101[2702]: [  OK  ] Reached target Paths.
Sep 02 21:15:53 targets-host container target101[2702]:          Starting Journal Service...
Sep 02 21:15:53 targets-host container target101[2702]: [  OK  ] Reached target Local File Systems (Pre).
Sep 02 21:15:53 targets-host container target101[2702]: [  OK  ] Reached target Local File Systems.
Sep 02 21:15:53 targets-host container target101[2702]:          Starting Rebuild Journal Catalog...
Sep 02 21:15:53 targets-host container target101[2702]: [  OK  ] Reached target Slices.
Sep 02 21:15:54 targets-host container target101[2702]: [  OK  ] Mounted Huge Pages File System.
Sep 02 21:15:55 targets-host container target101[2702]: [  OK  ] Mounted POSIX Message Queue File System.
Sep 02 21:15:58 targets-host container target101[2702]: [  OK  ] Started Update UTMP about System Boot/Shutdown.
Sep 02 21:15:59 targets-host container target101[2702]: [  OK  ] Started Apply Kernel Variables.
Sep 02 21:15:59 targets-host container target101[2702]:          Starting Networking Setup...
Sep 02 21:15:59 targets-host container target101[2702]: [  OK  ] Started Journal Service.
Sep 02 21:15:59 targets-host container target101[2702]:          Starting Flush Journal to Persistent Storage...
Sep 02 21:15:59 targets-host container target101[2702]: [  OK  ] Started Rebuild Journal Catalog.
Sep 02 21:16:00 targets-host container target101[2702]:          Starting Update is Completed...
Sep 02 21:16:02 targets-host container target101[2702]: [  OK  ] Started Flush Journal to Persistent Storage.
Sep 02 21:16:02 targets-host container target101[2702]: [  OK  ] Started Update is Completed.
Sep 02 21:16:02 targets-host container target101[2702]: [  OK  ] Reached target System Initialization.
Sep 02 21:16:02 targets-host container target101[2702]: [  OK  ] Listening on SSH Socket.
Sep 02 21:16:02 targets-host container target101[2702]: [  OK  ] Listening on D-Bus System Message Bus Socket.
Sep 02 21:16:02 targets-host container target101[2702]: [  OK  ] Reached target Sockets.
Sep 02 21:16:02 targets-host container target101[2702]: [  OK  ] Reached target Basic System.
Sep 02 21:16:02 targets-host container target101[2702]:          Starting Name Service Cache Daemon...
Sep 02 21:16:02 targets-host container target101[2702]:          Starting DHCP Client...
Sep 02 21:16:02 targets-host container target101[2702]: [  OK  ] Started Daily Cleanup of Temporary Directories.
Sep 02 21:16:02 targets-host container target101[2702]: [  OK  ] Reached target Timers.
Sep 02 21:16:03 targets-host container target101[2702]:          Starting Create Volatile Files and Directories...
Sep 02 21:16:06 targets-host container target101[2702]: [  OK  ] Started Create Volatile Files and Directories.
Sep 02 21:16:17 targets-host container target101[2702]: [856B blob data]
Sep 02 21:16:24 targets-host container target101[2702]: [523B blob data]
Sep 02 21:16:24 targets-host container target101[2702]: [  OK  ] Reached target User and Group Name Lookups.
Sep 02 21:16:24 targets-host container target101[2702]:          Starting Login Service...
Sep 02 21:16:24 targets-host container target101[2702]: [  OK  ] Reached target Host and Network Name Lookups.
Sep 02 21:16:27 targets-host container target101[2702]: [  OK  ] Started Login Service.
Sep 02 21:16:31 targets-host container target101[2702]: [  OK  ] Started Networking Setup.
Sep 02 21:16:32 targets-host container target101[2702]:          Starting Extra networking commands....
Sep 02 21:16:33 targets-host container target101[2702]: [  OK  ] Started Extra networking commands..
Sep 02 21:17:01 targets-host container target101[2702]: [3.0K blob data]
Sep 02 21:17:01 targets-host container target101[2702]: [  OK  ] Reached target Network.
Sep 02 21:17:02 targets-host container target101[2702]:          Starting Nginx Web Server...
Sep 02 21:17:02 targets-host container target101[2702]:          Starting Dnsmasq Daemon...
Sep 02 21:17:02 targets-host container target101[2702]:          Starting Permit User Sessions...
Sep 02 21:17:02 targets-host container target101[2702]: [  OK  ] Reached target Network is Online.
Sep 02 21:17:03 targets-host container target101[2702]: [FAILED] Failed to start Permit User Sessions.
Sep 02 21:17:03 targets-host container target101[2702]: See 'systemctl status systemd-user-sessions.service' for details.
Sep 02 21:17:03 targets-host container target101[2702]: [  OK  ] Started Console Getty.
Sep 02 21:17:03 targets-host container target101[2702]: [  OK  ] Reached target Login Prompts.
Sep 02 21:17:03 targets-host container target101[2702]: [FAILED] Failed to start Nginx Web Server.
Sep 02 21:17:03 targets-host container target101[2702]: See 'systemctl status nginx.service' for details.
Sep 02 21:17:10 targets-host container target101[89675]: /nix/store/cinw572b38aln37glr0zb8lxwrgaffl4-bash-4.4-p23/bin/bash: /nix/store/nh6qsmg2vyzpyf3sykgr9m2dnblcp42m-unit-script-container_target101-post-start: Too many open files
Sep 02 21:17:10 targets-host systemd[1]: container@target101.service: Control process exited, code=exited status=1
Sep 02 21:17:10 targets-host container target101[2702]: [2B blob data]
Sep 02 21:17:10 targets-host container target101[2702]: [1B blob data]
Sep 02 21:17:10 targets-host container target101[2702]: <<< Welcome to NixOS 19.03.173391.0715f2f1a9b (x86_64) - console >>>
Sep 02 21:17:10 targets-host container target101[2702]: [1B blob data]
Sep 02 21:18:40 targets-host systemd[1]: container@target101.service: State 'stop-sigterm' timed out. Killing.
Sep 02 21:18:40 targets-host systemd[1]: container@target101.service: Killing process 2702 (systemd-nspawn) with signal SIGKILL.
Sep 02 21:18:40 targets-host systemd[1]: container@target101.service: Killing process 4304 (systemd) with signal SIGKILL.
Sep 02 21:18:40 targets-host systemd[1]: container@target101.service: Killing process 61039 (systemd-journal) with signal SIGKILL.
Sep 02 21:18:40 targets-host systemd[1]: container@target101.service: Killing process 69063 (nscd) with signal SIGKILL.
Sep 02 21:18:40 targets-host systemd[1]: container@target101.service: Killing process 87303 (dhcpcd) with signal SIGKILL.
Sep 02 21:18:40 targets-host systemd[1]: container@target101.service: Killing process 70447 (dbus-daemon) with signal SIGKILL.
Sep 02 21:18:40 targets-host systemd[1]: container@target101.service: Killing process 73971 (systemd-logind) with signal SIGKILL.
Sep 02 21:18:40 targets-host systemd[1]: container@target101.service: Killing process 87899 (agetty) with signal SIGKILL.
Sep 02 21:18:40 targets-host container target101[2702]: target101 login:
Sep 02 21:18:40 targets-host systemd[1]: container@target101.service: Main process exited, code=killed, status=9/KILL
Sep 02 21:18:40 targets-host systemd[1]: container@target101.service: Failed with result 'exit-code'.
Sep 02 21:18:40 targets-host systemd[1]: Failed to start Container 'target101'.
Sep 02 21:18:40 targets-host systemd[1]: container@target101.service: Consumed 10.140s CPU time
Sep 02 21:18:40 targets-host systemd[1]: container@target101.service: Service RestartSec=100ms expired, scheduling restart.
Sep 02 21:18:40 targets-host systemd[1]: container@target101.service: Scheduled restart job, restart counter is at 1.
Sep 02 21:18:40 targets-host systemd[1]: Stopped Container 'target101'.
Sep 02 21:18:40 targets-host systemd[1]: container@target101.service: Consumed 10.143s CPU time
Sep 02 21:18:40 targets-host systemd[1]: Starting Container 'target101'...
Sep 02 21:18:41 targets-host container target101[100753]: Spawning container target101 on /var/lib/containers/target101.
Sep 02 21:18:41 targets-host container target101[100753]: Press ^] three times within 1s to kill container.
Sep 02 21:18:41 targets-host container target101[100753]: Failed to register machine: Machine 'target101' already exists
Sep 02 21:18:41 targets-host container target101[100753]: Parent died too early
Sep 02 21:18:42 targets-host systemd[1]: container@target101.service: Main process exited, code=exited, status=1/FAILURE
Sep 02 21:18:42 targets-host systemd[1]: container@target101.service: Failed with result 'exit-code'.
Sep 02 21:18:42 targets-host systemd[1]: Failed to start Container 'target101'.
Sep 02 21:18:42 targets-host systemd[1]: container@target101.service: Consumed 532ms CPU time
Sep 02 21:18:42 targets-host systemd[1]: container@target101.service: Service RestartSec=100ms expired, scheduling restart.
Sep 02 21:18:42 targets-host systemd[1]: container@target101.service: Scheduled restart job, restart counter is at 2.
Sep 02 21:18:42 targets-host systemd[1]: Stopped Container 'target101'.
Sep 02 21:18:42 targets-host systemd[1]: container@target101.service: Consumed 532ms CPU time
Sep 02 21:18:42 targets-host systemd[1]: Starting Container 'target101'...
Sep 02 21:18:43 targets-host container target101[101043]: Spawning container target101 on /var/lib/containers/target101.
Sep 02 21:18:43 targets-host container target101[101043]: Press ^] three times within 1s to kill container.
Sep 02 21:18:43 targets-host container target101[101043]: Failed to register machine: Machine 'target101' already exists
Sep 02 21:18:43 targets-host container target101[101043]: Parent died too early
Sep 02 21:18:44 targets-host systemd[1]: container@target101.service: Main process exited, code=exited, status=1/FAILURE
Sep 02 21:18:44 targets-host systemd[1]: container@target101.service: Failed with result 'exit-code'.
Sep 02 21:18:44 targets-host systemd[1]: Failed to start Container 'target101'.
Sep 02 21:18:44 targets-host systemd[1]: container@target101.service: Consumed 431ms CPU time
Sep 02 21:18:44 targets-host systemd[1]: container@target101.service: Service RestartSec=100ms expired, scheduling restart.
Sep 02 21:18:44 targets-host systemd[1]: container@target101.service: Scheduled restart job, restart counter is at 3.
Sep 02 21:18:44 targets-host systemd[1]: Stopped Container 'target101'.
Sep 02 21:18:44 targets-host systemd[1]: container@target101.service: Consumed 431ms CPU time

I will debug this further. For now, this issue serves as a note on the current state and a call for ideas on what to check. Any hints welcome!

The hardware should not be a problem. It's a workstation with an Intel i9-9900K (16x 4 GHz) and 32 GB RAM.

To Reproduce

Steps to reproduce the behavior:

  1. Create an ISO and boot it as described in https://gist.github.com/davidak/7d099b7ad4b23f144e4e8fed07e0d4f6
  2. Check journalctl -f for errors.
  3. Check systemctl --failed to see whether container units are in a failed state.

Expected behavior

NixOS should not limit the number of containers, only hardware should.

Metadata

Please run nix run nixpkgs.nix-info -c nix-info -m and paste the result.

[root@targets-host:~]# nix run nixpkgs.nix-info -c nix-info -m
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels/nixos' does not exist, ignoring
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels' does not exist, ignoring
error: file 'nixpkgs' was not found in the Nix search path (add it using $NIX_PATH or -I)

https://github.com/nix-community/nixos-generators/issues/37 ;)

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module: nixos/modules/virtualisation/containers.nix

boxofrox commented 5 years ago

> ...and a call for ideas on what to check. Any hints welcome!

From the logs, it looks like there's a race condition with the machine name assignments and duplicates are occurring. I've no idea which code to poke at, unfortunately.

Sep 02 21:18:43 targets-host container target101[101043]: Spawning container target101 on /var/lib/containers/target101.
Sep 02 21:18:43 targets-host container target101[101043]: Press ^] three times within 1s to kill container.
Sep 02 21:18:43 targets-host container target101[101043]: Failed to register machine: Machine 'target101' already exists

boxofrox commented 5 years ago

On second glance, it appears that you may be hitting an open file limit on Linux? The container is then killed, but not unregistered, hence the above error. There might be two bugs here.

Sep 02 21:17:03 targets-host container target101[2702]: [FAILED] Failed to start Nginx Web Server.
Sep 02 21:17:03 targets-host container target101[2702]: See 'systemctl status nginx.service' for details.
Sep 02 21:17:10 targets-host container target101[89675]: /nix/store/cinw572b38aln37glr0zb8lxwrgaffl4-bash-4.4-p23/bin/bash: /nix/store/nh6qsmg2vyzpyf3sykgr9m2dnblcp42m-unit-script-container_target101-post-start: Too many open files

Some reference material on file limits.

You might be able to adjust the max file limit with the boot.kernel.sysctl nixos option.

arianvp commented 5 years ago

Thanks for setting these milestones by the way. We hit new problems every time you are stretching your goals which is a good thing :)

davidak commented 4 years ago

@boxofrox thanks for the analysis!

> You might be able to adjust the max file limit with the boot.kernel.sysctl nixos option.

Yes, that's easily possible with:

  boot.kernel.sysctl = {
    "fs.file-max" = 2097152;
  };

but the default is already pretty high:

[davidak@ethmoid:~]$ cat /proc/sys/fs/file-max
9223372036854775807

I tried to raise the open file limits per user in the past, but it didn't solve the problem then.

  security.pam.loginLimits = [
    { domain = "*"; item = "nofile"; type = "soft"; value = "8192"; }
    { domain = "*"; item = "nofile"; type = "hard"; value = "8192"; }
  ];

I will debug this further.

davidak commented 4 years ago

I tested with extremely high values, but still got "Too many open files" errors.

  # same as /proc/sys/fs/nr_open
  # maybe try also unlimited
  security.pam.loginLimits = [
    { domain = "*"; item = "nofile"; type = "soft"; value = "1073741816"; }
    { domain = "*"; item = "nofile"; type = "hard"; value = "1073741816"; }
  ];

fs.file-max is not set, since it's already 9223372036854775807 by default.

I have set that on the container host. I don't think I have to set anything special in the container, since I have only 3 services running.

I still have to check what the limits actually are for the users that hit this issue. Debugging takes a lot of time, which I don't have right now.

arianvp commented 4 years ago

Note that systemd services get their limits from systemd's own configuration (https://jlk.fjfi.cvut.cz/arch/manpages/man/systemd-system.conf.5) and explicitly ignore the values set by PAM (https://wiki.archlinux.org/index.php/Limits.conf).
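
One way to see which limits a process really runs with is to read them from /proc. The following is only a minimal sketch: the helper names nofile_of_pid and nofile_of_unit are made up for illustration, and it assumes a Linux host with systemd.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of NixOS or systemd.

# Print the soft and hard "Max open files" limits of a process,
# as the kernel reports them in /proc/<pid>/limits.
nofile_of_pid() {
  awk '/^Max open files/ { print $4, $5 }' "/proc/$1/limits"
}

# Print the same limits for the main process of a systemd unit.
nofile_of_unit() {
  local pid
  pid=$(systemctl show -p MainPID --value "$1")
  # MainPID is 0 (or empty here) when the unit has no running main process.
  if [ "${pid:-0}" -gt 0 ]; then
    nofile_of_pid "$pid"
  else
    echo "no main process for $1" >&2
    return 1
  fi
}

# Limits of the current shell, which come from the login session.
nofile_of_pid "$$"
```

Calling nofile_of_unit container@target101.service on the host (the unit name is taken from the log above) should then show what that unit's main process was started with, which can be compared against the values configured through security.pam.loginLimits.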

boxofrox commented 4 years ago

I took @arianvp's information and created 1) a small program, print-file-limits, that prints RLIMIT_NOFILE, and 2) a VM to test print-file-limits as a process and as a service for comparison.

I documented the test in a gist [1] in case @davidak wants to review the numbers in his environment. I didn't find a way in my tests to affect the RLIMIT_NOFILE value in a systemd service, despite setting systemd.extraConfig = "DefaultLimitNOFILE=128K:512K".

I don't find many opportunities to tinker with NixOS, but this little program and test VM was surprisingly easy to set up :smiley:.

[1]: https://gist.github.com/boxofrox/eeb9cba5b25d4caad7b47a26039b1b61

arianvp commented 4 years ago

Interesting... that sounds like a bug to me? I wonder what's up here. Setting NOFILE through systemd should definitely work.

How about setting LimitNOFILE directly on the systemd service (instead of through system.conf)?

e.g.:

serviceConfig.LimitNOFILE=128k:512k

boxofrox commented 4 years ago

@arianvp thanks! I was looking for a per-service option for DefaultLimitNOFILE.

With the patch below, I found no change in the RLIMIT_NOFILE reported inside a service. I wonder if 125K:512K is too much.

diff --git a/default.nix b/default.nix
index 0d33d63..34b21fe 100644
--- a/default.nix
+++ b/default.nix
@@ -47,6 +47,7 @@ in {
       serviceConfig = {
         Type = "oneshot";
         ExecStart = "${package}/bin/print-file-limits";
+        LimitNOFILE = "125K:1M";
       };
     };

[vm@test-nixos:~]$ journalctl -u print-file-limits.service --no-pager -e
-- Logs begin at Tue 2019-11-12 17:00:52 UTC, end at Tue 2019-11-12 18:32:40 UTC. --
Nov 12 18:32:19 test-nixos print-file-limits[554]: RLIMIT_NOFILE soft(1024) hard(524288)

[vm@test-nixos:~]$ grep NOFILE /etc/systemd/system/print-file-limits.service
LimitNOFILE=125K:1M

Edit: Doh. Might help to see a difference if my upper limit (512K) didn't match the existing value (524288). Using LimitNOFILE="125K:1M"; instead still doesn't affect the 524288 hard limit.

boxofrox commented 4 years ago

I found that systemctl show will print details about a service unit. Despite LimitNOFILE=125K:1M, systemctl show reports the same values 1024:524288 I observed.

[vm@test-nixos:~]$ systemctl show print-file-limits.service | grep NO
LimitNOFILE=524288
LimitNOFILESoft=1024

So I switched from LimitNOFILE=125K:1M to LimitNOFILE=1M. systemctl show still reports 1024:524288.

Okay, drop the unit suffixes and use LimitNOFILE=1000000. systemctl show changed and reports 1000000:1000000, and print-file-limits.service reports the same.

~It appears the bit about file units in https://jlk.fjfi.cvut.cz/arch/manpages/man/systemd-system.conf.5 is broken?~ Never mind. Per the man page, the suffixes "...may be used for resource limits measured in bytes", and LimitNOFILE is a count of file descriptors, not a size in bytes.

LimitNOFILE=125000:1000000 also works.

[vm@test-nixos:~]$ journalctl -u print-file-limits.service --no-pager -e
-- Logs begin at Tue 2019-11-12 17:00:52 UTC, end at Tue 2019-11-12 20:36:04 UTC. --
Nov 12 20:35:43 test-nixos print-file-limits[518]: RLIMIT_NOFILE soft(125000) hard(1000000)

Mystery solved with LimitNOFILE. :tada:
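
The finding above can be turned into a quick check before deploying a unit. This is only a sketch based on what was observed in this thread: values with K or M suffixes were not applied to LimitNOFILE, while plain integers were. The function name is_plain_nofile_limit is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a LimitNOFILE value is "N" or
# "SOFT:HARD" made of plain integers (or "infinity"), without unit suffixes.
is_plain_nofile_limit() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]+|infinity)(:([0-9]+|infinity))?$'
}

# Classify the values that were tried in this thread.
for value in 125K:1M 1M 1000000 125000:1000000; do
  if is_plain_nofile_limit "$value"; then
    echo "$value: plain integers"
  else
    echo "$value: has a unit suffix"
  fi
done
```

Whatever the check says, systemctl show on the unit remains the authoritative way to confirm which LimitNOFILE and LimitNOFILESoft values systemd actually picked up.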

stale[bot] commented 4 years ago

Thank you for your contributions. This has been automatically marked as stale because it has had no activity for 180 days. If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity. Here are suggestions that might help resolve this more quickly:

  1. Search for maintainers and people that previously touched the related code and @ mention them in a comment.
  2. Ask on the NixOS Discourse.
  3. Ask on the #nixos channel on irc.freenode.net.

davidak commented 4 years ago

still work in progress

davidak commented 4 years ago

Hello again.

I'm now able to run 250 nixos-containers! System rebuild uses 28 GB RAM. The important part of the configuration is:

  # raise limits to support many containers
  boot.kernel.sysctl = {
    # Fix "Failed to allocate directory watch: Too many open files"
    # or "Insufficient watch descriptors available."
    "fs.inotify.max_user_instances" = 524288; # max (uses up to 512 MB kernel memory)
    # Fix "Failed to add ... to directory watch: inotify watch limit reached"
    "fs.inotify.max_user_watches" = 524288; # max (uses up to 512 MB kernel memory)
    # Fix full PIDs, check with `lsof -n -l | wc -l` (default 32768)
    "kernel.pid_max" = 4194303; # 64-bit max
  };
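
To confirm that these settings took effect after a rebuild, the live values can be read back from /proc/sys. This is a minimal sketch; the helper name sysctl_path is made up for illustration, and the keys are the three from the configuration above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a sysctl key such as kernel.pid_max to its
# file under /proc/sys by replacing dots with slashes.
sysctl_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Print the current value of each limit raised in the configuration above.
for key in fs.inotify.max_user_instances fs.inotify.max_user_watches kernel.pid_max; do
  value=$(cat "$(sysctl_path "$key")" 2>/dev/null || echo "unavailable")
  printf '%s = %s\n' "$key" "$value"
done
```

The sysctl command prints the same information when given the keys directly; reading the files only avoids depending on that tool being in PATH.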

When I try to run 300, I get these errors:

Jun 24 00:01:21 targets-host systemd[1]: Starting Container 'target296'...
Jun 24 00:01:21 targets-host dbus-daemon[770]: [system] The maximum number of active connections for UID 0 has been reached (max_connections_per_user=256)
Jun 24 00:01:21 targets-host dbus-daemon[770]: [system] The maximum number of active connections for UID 0 has been reached (max_connections_per_user=256)
...

That is a known problem in D-Bus: https://gitlab.freedesktop.org/dbus/dbus/-/issues/97

Even a nixos-rebuild fails :smile:

[root@nixos:~]# nixos-rebuild switch
building Nix...
building the system configuration...
these derivations will be built:
  /nix/store/bl14dd8symiqbqnvsy7rbg3kbnqzqsny-system-units.drv
  /nix/store/wjsz4yil7bp5fnpb9l85rnh9gy7n9138-etc.drv
  /nix/store/w7x8k22y30ism8w375nbqcdaw1aj2mhn-nixos-system-targets-host-19.09.2370.e10c65cdb35.drv
building '/nix/store/bl14dd8symiqbqnvsy7rbg3kbnqzqsny-system-units.drv'...
building '/nix/store/wjsz4yil7bp5fnpb9l85rnh9gy7n9138-etc.drv'...
building '/nix/store/w7x8k22y30ism8w375nbqcdaw1aj2mhn-nixos-system-targets-host-19.09.2370.e10c65cdb35.drv'...
org.freedesktop.DBus.Error.LimitsExceeded: The maximum number of active connections for UID 0 has been reached
warning: error(s) occurred while switching to the new configuration

Workaround: stop the containers first.

  for i in {1..250}; do systemctl stop container@target$i.service ; done

So we might want to limit nixos-containers to 250 for now, until this is fixed.

arianvp commented 4 years ago

Thanks for pushing the limits!

How about trying out dbus-broker instead of dbus? It should be a drop-in replacement with better performance.

nixos-discourse commented 4 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/does-nixops-support-lxc/7823/1

davidak commented 4 years ago

> How about trying out dbus-broker instead of dbus?

Looks good. Sadly, there is no NixOS option for it, only a package.

So I tried this:

  systemd.services.dbus-broker.enable = true;
  systemd.services.dbus.enable = false;
  systemd.sockets.dbus.enable = false;

  environment.systemPackages = with pkgs; [ dbus-broker ];

but ended up with a broken system.

[root@nixos:~]# nixos-rebuild switch
building Nix...
building the system configuration...
these derivations will be built:
  /nix/store/a8l8r0kq3nqjlh0r4w88p3l7yr7ay0mx-system-path.drv
  /nix/store/3larzh848mi8drlsnkd7l84fxw7r05zy-dbus-1.drv
  /nix/store/1x2n6501n034pbwg5qg0i85b8irkwbca-unit-dbus.service.drv
  /nix/store/k0v8hvki318vhxb8ysfymgw99wzpvajw-user-units.drv
  /nix/store/5bmqd9fz9x2s8hfk91ynkd22l63m5dss-unit-systemd-fsck-.service.drv
  /nix/store/7vb75blaci6dnmqijx4s9drjp5f40gic-unit-dbus-broker.service.drv
  /nix/store/8i6ivh9pn6nvw1dg5bwyvbcgbjfk0ryr-unit-polkit.service.drv
  /nix/store/sivzvhy61rjv9ch81w7l27blx65l3ffr-unit-dbus.service-disabled.drv
  /nix/store/mdc6wrib8syjvd7zkk8gj0wlhji9ahd7-system-units.drv
  /nix/store/pn0j92rqbinsvrx69mbxjslzs2kmavan-etc.drv
  /nix/store/3pxc6anjd0gpag6ry0kjj09xd5g8mand-nixos-system-targets-host-19.09.2370.e10c65cdb35.drv
these paths will be fetched (0.12 MiB download, 0.40 MiB unpacked):
  /nix/store/gvnng20vlmyr47vbhy8nf6g4dyjxc31r-dbus-broker-21
copying path '/nix/store/gvnng20vlmyr47vbhy8nf6g4dyjxc31r-dbus-broker-21' from 'https://cache.nixos.org'...
building '/nix/store/7vb75blaci6dnmqijx4s9drjp5f40gic-unit-dbus-broker.service.drv'...
building '/nix/store/sivzvhy61rjv9ch81w7l27blx65l3ffr-unit-dbus.service-disabled.drv'...
building '/nix/store/a8l8r0kq3nqjlh0r4w88p3l7yr7ay0mx-system-path.drv'...
created 1654 symlinks in user environment
building '/nix/store/3larzh848mi8drlsnkd7l84fxw7r05zy-dbus-1.drv'...
building '/nix/store/8i6ivh9pn6nvw1dg5bwyvbcgbjfk0ryr-unit-polkit.service.drv'...
building '/nix/store/5bmqd9fz9x2s8hfk91ynkd22l63m5dss-unit-systemd-fsck-.service.drv'...
building '/nix/store/1x2n6501n034pbwg5qg0i85b8irkwbca-unit-dbus.service.drv'...
building '/nix/store/mdc6wrib8syjvd7zkk8gj0wlhji9ahd7-system-units.drv'...
building '/nix/store/k0v8hvki318vhxb8ysfymgw99wzpvajw-user-units.drv'...
building '/nix/store/pn0j92rqbinsvrx69mbxjslzs2kmavan-etc.drv'...
building '/nix/store/3pxc6anjd0gpag6ry0kjj09xd5g8mand-nixos-system-targets-host-19.09.2370.e10c65cdb35.drv'...
stopping the following units: dbus.service
Warning: Stopping dbus.service, but it can still be activated by:
  dbus.socket
NOT restarting the following changed units: systemd-fsck@dev-disk-by\x2duuid-A617\x2dA4CC.service
activating the configuration...
setting up /etc...
setting up tmpfiles
org.freedesktop.DBus.Error.Disconnected: Connection was disconnected before a reply was received
warning: error(s) occurred while switching to the new configuration

[root@targets-host:~]# nixos-rebuild switch
building Nix...
building the system configuration...
org.freedesktop.DBus.Error.NoServer: Failed to connect to socket /run/dbus/system_bus_socket: Connection refused
warning: error(s) occurred while switching to the new configuration

The dbus-broker.service looks like this:

[Unit]

[Service]
Environment="LOCALE_ARCHIVE=/nix/store/nl67flma20ixa0x5jms4wk0yfbx4c9wb-glibc-locales-2.27/lib/locale/locale-archive"
Environment="PATH=/nix/store/9v78r3afqy9xn9zwdj9wfys6sk3vc01d-coreutils-8.31/bin:/nix/store/0zdsw4qdrwi41mfdwqpxknsvk9fz3gkb-findutils-4.7.0/bin:/nix/store/71y5ddyz8vmsw9wgi3gzifcls53r60i9-gnugrep-3.3/bin:/nix/store/g2h4491kab7l06v9rf1lnyjvzdwy5ak0-gnused-4.7/bin:/nix/store/ib5p1wc9969rr09xpv09x2iavpaj0j0b-systemd-243.7/bin:/nix/store/9v78r3afqy9xn9zwdj9wfys6sk3vc01d-coreutils-8.31/sbin:/nix/store/0zdsw4qdrwi41mfdwqpxknsvk9fz3gkb-findutils-4.7.0/sbin:/nix/store/71y5ddyz8vmsw9wgi3gzifcls53r60i9-gnugrep-3.3/sbin:/nix/store/g2h4491kab7l06v9rf1lnyjvzdwy5ak0-gnused-4.7/sbin:/nix/store/ib5p1wc9969rr09xpv09x2iavpaj0j0b-systemd-243.7/sbin"
Environment="TZDIR=/nix/store/yfd0qkf8m908j523xyvwmwrll95ywkdi-tzdata-2019b/share/zoneinfo"

In contrast, dbus.service actually has commands to start:

[Unit]
Description=D-Bus System Message Bus
Documentation=man:dbus-daemon(1)
Requires=dbus.socket

[Service]
ExecStart=/nix/store/f70c0ln8hj7jr7lps2ydcx4izbffh64x-dbus-1.12.16/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
ExecReload=/nix/store/f70c0ln8hj7jr7lps2ydcx4izbffh64x-dbus-1.12.16/bin/dbus-send --print-reply --system --type=method_call --dest=org.freedesktop.DBus / org.freedesktop.DBus.ReloadConfig
OOMScoreAdjust=-900

arianvp commented 4 years ago

Try adding:

systemd.packages = [ pkgs.dbus-broker ];

Also do not add

systemd.sockets.dbus.enable = false;

as you need the socket for things to work.

so:

systemd.packages = [ pkgs.dbus-broker ];
systemd.services.dbus-broker.enable = true;
systemd.user.services.dbus-broker.enable = true;
systemd.services.dbus.enable = false;
systemd.user.services.dbus.enable = false;

should do the job

arianvp commented 3 years ago

Hey @davidak, I'm cleaning up some systemd issues in nixpkgs. At this point I don't see anything actionable here per se, and we reached our goal of 200 containers =) The exception is experimenting with dbus-broker, which is a noble goal on its own but probably deserves its own standalone ticket.

Do you want to document your tweaks in the documentation for other people who want to run many systemd-nspawn containers?

davidak commented 3 years ago

Yes, I want to at least document it or, better, add it to nixpkgs somehow so it just works.

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

gdamjan commented 2 years ago

To configure dbus-broker in systemd, you need to alias dbus.service:

systemd.packages = [ pkgs.dbus-broker ];
systemd.services.dbus-broker.aliases = [ "dbus.service" ];
systemd.user.services.dbus-broker.aliases = [ "dbus.service" ];

The lines above would properly alias dbus.service to dbus-broker. But don't add the …enable = false; items; those would mask dbus.service.

Unfortunately, dbus-broker fails to start with:

Apr 24 22:15:10 nixos dbus-broker-launch[183]: Missing configuration file in /usr/share/dbus-1/system.conf +1: /usr/share/dbus-1/system.conf

I guess the .service unit file needs to be patched to use --config-file=/etc/dbus-1/system.conf.