Open bt90 opened 8 months ago
The limit got raised to a generous value of 512 in https://github.com/caddyserver/caddy/pull/1825 in order to solve https://github.com/caddyserver/caddy/issues/1723. But it's still possible to hit the limit, because processes from other containers that run under the same UID are misattributed to the caddy user.
The TasksMax option solves this by only limiting the number of processes started as part of the service, which is what we actually want to achieve.
I'm confused. Why would you be running Docker using the caddy user?
Anyway this seems to make sense on paper, but I'd like @carlwgeorge to review as well.
The problem is that the numeric UID of the caddy user happens to overlap with one or more users inside Docker containers.
On my host, the caddy user has the UID 999. If the user in a container happens to have the same UID, systemd would attribute those processes to the caddy user and conclude that the limit has been reached.
The limit seems to include threads as explained in setrlimit(2):
The maximum number of processes (or, more precisely on Linux, threads) that can be created for the real user ID of the calling process. Upon encountering this limit, fork(2) fails with the error EAGAIN.
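To see where the threads counted against one UID actually live, the per-UID total can be broken down by cgroup. This is a minimal sketch, assuming procps-ng `ps` (for the `nlwp` and `cgroup` output columns); the function name `threads_by_cgroup` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "<nlwp> <cgroup>" pairs on stdin and print the
# thread total per cgroup, largest first.
threads_by_cgroup() {
  awk '{ sum[$2] += $1 } END { for (cg in sum) print sum[cg], cg }' | sort -rn
}

# Usage on a live host (assumes procps-ng ps; 999 is the UID from this thread):
#   ps -U 999 --no-headers -o nlwp,cgroup | threads_by_cgroup
```

Threads that show up under a Docker scope rather than under caddy.service still count toward the per-UID limit, which is the misattribution described above.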
I used the following bash script to determine the current value:
sudo ps -U caddy -h -o nlwp | awk '{total += $1} END {print total}'
This yields 749 on my system with all services and containers running. LimitNPROC=800 works, but the unit fails to start once I switch to LimitNPROC=700.
The task limit is much more reliable and better reflects reality:
● caddy.service - Caddy
     Loaded: loaded (/etc/systemd/system/caddy.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-12-28 20:45:05 CET; 8min ago
       Docs: https://caddyserver.com/docs/
   Main PID: 1885185 (caddy)
      Tasks: 8 (limit: 512)
     Memory: 12.7M
        CPU: 413ms
     CGroup: /system.slice/caddy.service
             └─1885185 /usr/bin/caddy run --environ --config /etc/caddy/Caddyfile
1 parent process + 7 children -> 8 tasks
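The same numbers can be read straight from the service's cgroup, since systemd enforces TasksMax through the cgroup pids controller. A small sketch, assuming a cgroup v2 host; the helper name `cgroup_tasks` and the path in the usage comment are assumptions based on the CGroup line in the status output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report task usage for one cgroup directory.
# pids.current and pids.max are the cgroup v2 pids-controller files.
cgroup_tasks() {
  local dir="$1"
  printf '%s of %s tasks used\n' "$(cat "$dir/pids.current")" "$(cat "$dir/pids.max")"
}

# Usage (path assumed from the CGroup line above, cgroup v2 only):
#   cgroup_tasks /sys/fs/cgroup/system.slice/caddy.service
```

Because the counter is scoped to the cgroup, container processes with the same UID do not show up in it.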
The main offender in my case is the browserless/chrome container.
docker top playwright-chrome o user,uid,pid
USER UID PID
caddy 999 2654461
caddy 999 2654495
caddy 999 2654496
caddy 999 2654497
caddy 999 2654671
caddy 999 2654672
caddy 999 2654673
caddy 999 2654674
caddy 999 2654675
caddy 999 2654676
caddy 999 2654677
caddy 999 2654678
caddy 999 2654683
caddy 999 2654684
caddy 999 2654685
caddy 999 2654686
caddy 999 2654691
caddy 999 2654692
caddy 999 2654693
caddy 999 2654694
caddy 999 2654703
caddy 999 2654704
caddy 999 2654705
caddy 999 2654706
caddy 999 2654707
caddy 999 2654708
caddy 999 2654709
caddy 999 2654710
caddy 999 2654711
caddy 999 2654712
caddy 999 2654715
caddy 999 2654717
caddy 999 2654718
The container alone is enough to trip the limitation:
docker top playwright-chrome -o nlwp,pid | tail -n +2 | awk '{total += $1} END {print total}'
yields 717.
@francislavoie the following systemd documentation PR describes the situation very well: https://github.com/systemd/systemd/pull/23242
@francislavoie Sorry for my delay in getting back to you on this. Adoption of this option should wait until the project is ready to abandon building for RHEL 7. TasksMax was added in systemd 227, but RHEL 7 only has systemd 219. RHEL 8 bumps up to systemd 239. RHEL 7 goes EOL on 2024-06-30, so that may be the ideal time to switch to TasksMax.
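A quick way to check whether a given host is new enough is to compare its systemd version against 227, the version named above. This is only a sketch; the helper name `supports_tasksmax` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given systemd version knows TasksMax.
# 227 is the version named in this thread.
supports_tasksmax() {
  [ "$1" -ge 227 ]
}

# Usage: the first line of `systemctl --version` looks like "systemd 239 (...)".
#   ver="$(systemctl --version | awk 'NR == 1 { print $2 }')"
#   supports_tasksmax "$ver" && echo "TasksMax available"
```

With the versions quoted here, RHEL 7 (219) fails the check and RHEL 8 (239) passes it.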
Thanks @carlwgeorge, glad I waited.
Would it be okay if we dropped RHEL 7 support early then? It just would mean it wouldn't receive this one last release before official EOL I guess.
Does COPR have recent download stats that would give us an idea how much it's still being used?
Note that users would still be able to work around it using systemctl edit caddy.
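For reference, the workaround amounts to a drop-in file that overrides the unit. The sketch below writes one non-interactively; the helper name `write_tasksmax_dropin` is hypothetical, and lifting the per-UID limit with `LimitNPROC=infinity` is an assumption about what the override should contain, not something decided in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a drop-in that swaps the per-UID process limit
# for a per-service task limit. Arguments: drop-in directory, task limit.
write_tasksmax_dropin() {
  local dir="$1" limit="${2:-512}"
  mkdir -p "$dir"
  cat > "$dir/override.conf" <<EOF
[Service]
LimitNPROC=infinity
TasksMax=$limit
EOF
}

# Usage (as root; directory assumed from the unit path shown in this thread):
#   write_tasksmax_dropin /etc/systemd/system/caddy.service.d 512
#   systemctl daemon-reload && systemctl restart caddy
```

This only works on systemd versions that understand TasksMax; older versions would not apply the task limit.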
Yeah, understood. I'd just rather not block merging this for everyone else who would benefit, while waiting for one old distribution to cycle out.
Would it be okay if we dropped RHEL 7 support early then? It just would mean it wouldn't receive this one last release before official EOL I guess.
I don't have a problem with the project dropping support for RHEL 7 early, I just would like it to be an explicit decision, not a "whoops". Doing it intentionally would also be less disruptive for RHEL 7 caddy users, as we wouldn't ship an update to them with an incompatible option. Ideally we would inform them with some kind of announcement that there will be no more RHEL 7 caddy updates.
Does COPR have recent download stats that would give us an idea how much it's still being used?
RHEL 7 and its derivatives are still pretty popular, more so than they should be this late into their lifecycles. Here are the download stats from COPR.
Note that users would still be able to work around it using systemctl edit caddy.
Just like people that want to start using TasksMax now can, before the project makes it the default. With the broad range of systemd versions in the wild, it makes more sense to keep the default unit using directives that are available on all distributions that the project targets with the apt and rpm repos.
We could also simply drop LimitNPROC to be honest.
EPEL 7: 16,491
Oof, yeah that's not as low as I'd hoped.
We could also simply drop LimitNPROC to be honest.
Yeah, I'd be okay with that too in the short term.
I don't really think we need to worry about Go running wild, it's a well-behaved runtime.
Small update, RHEL 7 is now EOL, and we already skipped building caddy 2.8 for it a month before the EOL date. We could add TasksMax to the unit file now if people still think it's needed. I don't have a strong opinion on whether we need it or not, I just wanted to mention that it's a viable option now as it's available on all the distro versions we're targeting in copr.
I'm hitting the same problem as outlined in https://github.com/caddyserver/caddy/issues/1802. The culprit seems to be how systemd handles the LimitNPROC option: https://github.com/caddyserver/dist/blob/49a805b0196e8c9e394cfe3546f2cd568d6e37d1/init/caddy.service#L30
While caddy doesn't occupy that many processes, some other docker containers seem to use the same UID for their processes:
The systemd documentation notes that TasksMax should be preferred over LimitNPROC: https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#Process%20Properties