bedrocklinux / bedrocklinux-userland

This tracks development for the things such as scripts and (defaults for) config files for Bedrock Linux
https://bedrocklinux.org
GNU General Public License v2.0
597 stars 62 forks source link

systemctl --system list-unit-files Connection timed out #263

Open HeavenVolkoff opened 2 years ago

HeavenVolkoff commented 2 years ago

Hello,

Currently, I am running a hijacked Fedora 36 with an arch strata, and I am experiencing the following issue:

$> systemctl --system list-unit-files
Failed to list unit files: Connection timed out

I first observed this when attempting to use the following Gnome extension: https://github.com/hardpixel/systemd-manager

Which internally makes a call to systemctl: https://github.com/hardpixel/systemd-manager/blob/38fa5702c5bb248f663416ee28e8bba8d7632c63/systemd-manager%40hardpixel.eu/utils.js#L34-L44

The extension was misbehaving due to this command failing to execute

As far as I could investigate, this seems to be a problem with the way bedrock mounts its cross directories, but I may be wrong, as this is my first time using this distro and this is beyond my knowledge on how it internally sets up its environment. However, running the command in a normal Fedora 35 and another Arch deploys, it executes correctly.

brl report output: brl-report-log.txt

strace -vvv systemctl --system list-unit-files output: strace.txt

paradigm commented 2 years ago

It's not clear to me what you have in mind with:

As far as I could investigate, this seems to be a problem with the way bedrock mounts its cross directories

I don't see any obvious leads from the provided logs. From strace, it looks like the timeout is from trying to communicate with dbus over /run/dbus/system_bus_socket. From both the brl fetch and strace logs, it looks like both your init (and thus almost certainly also dbus) and the systemctl commands are coming from the fedora stratum such that there's no special cross-stratum handling of how they communicate in play. I don't see anything Bedrock-specific around the point of failure.

That having been said, the fact this issue occurs in a hijacked Fedora but not native Fedora does seem to indicate Bedrock is likely a factor, just in a way that's not less obvious. I'll try to reproduce the issue when I have a moment.

HeavenVolkoff commented 2 years ago

Thanks for the quick reply

It's not clear to me what you have in mind with:

As far as I could investigate, this seems to be a problem with the way bedrock mounts its cross directories

I don't see any obvious leads from the provided logs.

Sorry, what I meant to say was that I had a suspicion that this may be related to bedrock, due to the way it mounts its cross directories between stratas and considering that a non hijacked Fedora doesn't exhibit this issue.

However, some evidence that may indicate this is indeed related to bedrock, is that during the execution of systemctl bedrock's etcfs process would consume something north of 70 to 90% of a CPU core, and after the command fails with a connection timeout it would keep running like that for a minute or two, before returning to normal.

Before running the command: photo_2022-04-22_21-41-01

While the command was running: photo_2022-04-22_21-41-25

After the command failed: photo_2022-04-22_21-41-31

I wonder if there is some way to increase the timeout for the systemd dbus connection to see if the command would complete after some time?

This seems to be the code that fails in systemctl: https://github.com/systemd/systemd/blob/23484e120531c1013009e2b3c29e8a82e6a10f4c/src/systemctl/systemctl-list-unit-files.c#L213-L228

paradigm commented 2 years ago

Thanks for the quick reply

You're welcome :)

Sorry, what I meant to say was that I had a suspicion that this may be related to bedrock, due to the way it mounts its cross directories between stratas and considering that a non hijacked Fedora doesn't exhibit this issue.

Gotcha, makes sense.

However, some evidence that may indicate this is indeed related to bedrock, is that during the execution of systemctl bedrock's etcfs process would consume something north of 70 to 90% of a CPU core, and after the command fails with a connection timeout it would keep running like that for a minute or two, before returning to normal.

Very interesting finding. etcfs is idle when nothing is happening; all of its CPU time is spent doing things on behalf of other programs making system calls to /etc. If etcfs is spiking, that means something else is spamming requests to /etc. It's possible that what's happening here is the systemctl call results in something (maybe PID 1?) making tons of requests to /etc - so many that Bedrock's normally negligible per-system call /etc overhead adds up to the point where it exceeds a short timeout.

Some options we could pursue here:

If the issue is just that etcfs's normally negligible overhead is problematic in this pessimistic case, there may not be much we can do to improve the situation by improving 0.7's business logic. 0.8 plans include a complete rewrite of the corresponding subsystem which should not only be faster, but include options for further special-case optimization like pre-read/caching whatever the set of files in question is. However, it'll be a while before that's ready.

I wonder if there is some way to increase the timeout for the systemd dbus connection to see if the command would complete after some time?

I don't know systemd very deeply, but if that's an option we can leverage that'd be great!

[^1]: If you set debug = etcfs at the very bottom of /bedrock/etc/bedrock.conf, the next time any given stratum is (re)started you'll find a corresponding live etcfs log in /bedrock/var/cache/etcfs-<stratum>/. These are verbose and kind of a pain to wade through, but the information about what is spamming /etc should be there. This debug setting also slows down etcfs further. Once you've gathered whatever is needed, consider disabling the debug = value and restarting your strata.

[^2]: I have plans to make it much easier to debug the equivalent in Bedrock 0.8, but it'll be a bit before that's ready.

HeavenVolkoff commented 2 years ago

Thanks for the debug instructions.

Some options we could pursue here:

* `strace -p 1` when making the `systemctl` call to see if that triggers a _ton_ of `/etc` calls.

* `strace -p $(pidof dbus-daemon)` when making the `systemctl` call to see if that highlights whatever component `systemctl` is talking to.

* Enabling `etcfs` debug and looking at what `etcfs` thinks is making tons of `/etc` calls. [1](#user-content-fn-1-54ae73cd42d306cc4489914dc5e28694) [2](#user-content-fn-2-54ae73cd42d306cc4489914dc5e28694)

From what I could understand from strace -p 1 and etcfs debug output, it seems the problem really boils down to the systemctl call resulting in a very large amount of requests to etcfs and timing out before it has time to finish.

etcfs-fedora.log.tar.gz strace-init.txt.tar.gz

paradigm commented 2 years ago

Thanks for gathering those logs! This does look like we found it:

The cause for the discrepancy there isn't obvious to me, but either way - tens of thousands or hundreds of thousands - that's a lot more than Bedrock's etcfs was designed to handle. The per-system call overhead is reasonably small; if you do something like read a file from /etc and you do the exact same thing to an otherwise identical from elsewhere (e.g. /tmp) the overhead is difficult to detect:

$ cp /etc/os-release /tmp
$ time cat /etc/os-release >/dev/null 2>&1
cat /etc/os-release > /dev/null 2>&1  0.00s user 0.00s system 46% cpu 0.002 total
$ time cat /tmp/os-release >/dev/null 2>&1
cat /tmp/os-release > /dev/null 2>&1  0.00s user 0.00s system 44% cpu 0.002 total

However, it will certainly add up over tens or hundreds of thousands of calls to something noticeable.

As a hack, try running:

strat fedora mount --bind /bedrock/strata/fedora/etc/systemd /etc/systemd

This will bypass etcfs for specifically Fedora's instance of the /etc/systemd directory. At the time of writing, that's probably fine; Bedrock doesn't need to do anything fancy in /etc/systemd at the moment. My hope is this will unblock your systemd-manager issues. Do let me know if this is sufficient for your immediate needs here.

Even if this works for you here and now, I'm not sure this will be viable for Bedrock as a general long term solution. I eventually want Bedrock to support cross-stratum services, i.e. one stratum's service manager (e.g. systemd) should be able to manage another's services. Making it so installing and enabling a runit service in a Void stratum automatically makes it accessible to and enables it with a running systemd may require FUSE magic to make it appear in /etc/systemd. Even if Naga is significantly faster here, fundamentally I'm not sure a theoretically ideal FUSE mount won't collapse under this kind of access pattern. Sadly I don't have an alternative strategy to get Bedrock to do its thing which doesn't involve a FUSE mount on /etc. In the long term, this might not be something Bedrock can handle from its end.

I'm inclined to bring this to the systemd folks. While I haven't had the chance to try it yet, based on your findings I suspect systemctl --system list-unit-files is noticeably slow on most distros, at least on older hardware. Rewriting systemd so that one systemctl --system list-unit-files doesn't open and close the same exact /etc/systemd/system directory 775 times will probably improve things everywhere, not just on Bedrock, even if we're hit hardest by it.