Closed danmcd closed 4 months ago
@smokris ==> If you can, please try the updated fix in your environment? I think I reproduced your failure-then-success case with a Debian 12 LX zone, but I would really appreciate if you could re-reproduce it to make sure my additional corner-case covering didn't break anything?
Some additional testing notes are in the ticket.
Updated test results on the OS-8543 ticket.
@smokris can you please again try this branch in your environment?
Thanks for your work on this, @danmcd.
If I understand correctly, your changes (compared to my initial PR) are:
I read through your changes (compared to my initial PR) and I think they make sense (to the limited extent that I understand the 32/64-bit compatibility layer).
I made a local PI build from your OS-8543
branch and re-ran my test from https://github.com/TritonDataCenter/illumos-joyent/pull/467, and the test passes (that is, systemd-sysusers
and apt upgrade
both complete successfully). LGTM.
I made a local PI build from your
OS-8543
branch and re-ran my test from #467, and the test passes (that is,systemd-sysusers
andapt upgrade
both complete successfully). LGTM.
Two favors:
1.) Please check a reboot of your zone to see whether or not systemd is running after your "apt upgrade" or "systemd-sysusers"?
2.) If your LGTM stands, please +1 this review so it's on the record. If, for permissions reasons, you cannot, I will note you as a reviewer in the commit message along with either you as author or you as portions-contributed-by.
More testing in bug ticket.
@TritonDataCenter/triton-engineering ==> Any objections for this landing in 20240530 ?
Due to the details of how this was reviewed, I'm going to push from the outside:
commit 3c71165421e1b0fc36533828e5f2369cfff91f53 (HEAD -> master)
Author: Steve Mokris <steve@kosada.com>
Date: Mon Apr 22 15:15:09 2024 -0400
OS-8543 Want support for OFD locks in lx-brand
Portions-contributed-by: Dan McDonald <danmcd@mnx.io>
Reviewed by: Andy Fiddaman <illumos@fiddaman.net>
Approved by: Brian Bennett <brbennett@mnx.io>
Pushed.
Nice! I'm a few minutes late, but here's my testing report:
While testing the below, I ran a dtrace probe for sdt:lx_brand::brand-lx-unsupported
, and it caught two instances where LX processes attempted to do an unsupported partial-file OFD lock (details below).
Please check a reboot of your zone to see whether or not systemd is running after your "apt upgrade" or "systemd-sysusers"?
After rebooting the zone, systemd
is running as the init process as expected.
# ps -q 1
PID TTY TIME CMD
1 ? 00:00:36 systemd
# ls -l /proc/1/exe
lrwxrwxrwx 1 root root 0 May 28 19:21 /proc/1/exe -> /usr/lib/systemd/systemd
# systemctl status
● 3c03fcdf-9941-4bd9-8610-fd56e805b6db
State: degraded
Units: 181 loaded (incl. loaded aliases)
Jobs: 0 queued
Failed: 5 units
Since: Tue 2024-05-28 19:21:46 UTC; 7min ago
systemd: 254.5-1~bpo12+3
…
(The 5 failed units are an existing/separate problem unrelated to this PR.)
I searched the systemd source code and I think I found all the places it uses OFD locks:
systemd-sysusers
CLI — I tested it by running systemd-sysusers ; echo $?
; it completed successfullysystemd-firstboot
CLI — I tested it by running systemd-firstboot --prompt --reset ; echo $?
; it completed successfullyDynamicUser=true
— can't test since it implies PrivateTmp=true
, which requires Linux kernel namespaces, which LX doesn't currently supportsystemd-nspawn
— can't test since it requires Linux kernel namespaces, which LX doesn't currently supportsystemd-sysext
— can't test since it requires overlayfs, which LX doesn't currently supportSo, I don't think this PR makes systemd support worse than it was before.
I searched for other software packages that use OFD locks:
bazel
— successfully launched and ran a build. During that test run, the dtrace probe caught the following:
1 5971 lx_unsupported:brand-lx-unsupported 28294 bazel-4 F_OFD_* fcntl being sent with one or more nonzero whence (0), start (0), or len (4096).\0
Bazel successfully fell back to using F_SETLK
.
ceph
— can't test since it requires overlayfs
xapian
— successfully launched and created a database. During that test run, the dtrace probe caught the following:
1 5971 lx_unsupported:brand-lx-unsupported 21115 python3.11 F_OFD_* fcntl being sent with one or more nonzero whence (0), start (0), or len (1).\0
Xapian successfully fell back to using O_CLOEXEC
.
I also tested some other software packages which my team commonly uses on LX (I don't know offhand whether these packages use OFD locks), and performed some basic testing of each:
apache2
— successfully launched and performed a simple queryelasticsearch
— successfully launched and performed a simple queryfacter
— successfully generated default factsmariadb-server
— successfully launched and performed a simple querynodejs
— successfully executed some statementsopenjdk-17-jdk
— successfully executed some statementsopenssh-server
— successfully launched and connected remotelyperl
— successfully executed some statementsphp-fpm
— successfully launched and performed a simple querypostgresql
— successfully launched and performed a simple querypython
— successfully executed some statementsruby
— successfully executed some statementsIf your LGTM stands, please +1 this review so it's on the record.
I haven't found any regressions introduced by this change. My LGTM stands.
THANK YOU SO MUCH for that extra test data, which is now copied into the OS-8543 ticket as well. This will be in this week's 20240530 ("Spider-Man") release.
This was originally part of #467 , but because I'm taking it over, I need to reset this PR.