TritonDataCenter / illumos-joyent

Community developed and maintained version of the OS/Net consolidation
http://www.illumos.org/projects/illumos-gate
266 stars 109 forks source link

OS-8543 Want support for OFD locks in lx-brand #471

Closed danmcd closed 4 months ago

danmcd commented 4 months ago

This was originally part of #467 , but because I'm taking it over, I need to reset this PR.

danmcd commented 4 months ago

@smokris ==> If you can, please try the updated fix in your environment? I think I reproduced your failure-then-success case with a Debian 12 LX zone, but I would really appreciate if you could re-reproduce it to make sure my additional corner-case covering didn't break anything?

danmcd commented 4 months ago

Some additional testing notes are in the ticket.

danmcd commented 4 months ago

Updated test results on the OS-8543 ticket.

danmcd commented 4 months ago

@smokris can you please again try this branch in your environment?

smokris commented 4 months ago

Thanks for your work on this, @danmcd.

If I understand correctly, your changes (compared to my initial PR) are:

I read through your changes (compared to my initial PR) and I think they make sense (to the limited extent that I understand the 32/64-bit compatibility layer).

I made a local PI build from your OS-8543 branch and re-ran my test from https://github.com/TritonDataCenter/illumos-joyent/pull/467, and the test passes (that is, systemd-sysusers and apt upgrade both complete successfully). LGTM.

danmcd commented 4 months ago

I made a local PI build from your OS-8543 branch and re-ran my test from #467, and the test passes (that is, systemd-sysusers and apt upgrade both complete successfully). LGTM.

Two favors:

1.) Please check a reboot of your zone to see whether or not systemd is running after your "apt upgrade" or "systemd-sysusers"?

2.) If your LGTM stands, please +1 this review so it's on the record. If, for permissions reasons, you cannot, I will note you as a reviewer in the commit message along with either you as author or you as portions-contributed-by.

danmcd commented 4 months ago

More testing in bug ticket.

danmcd commented 4 months ago

@TritonDataCenter/triton-engineering ==> Any objections for this landing in 20240530 ?

danmcd commented 4 months ago

Due to the details of how this was reviewed, I'm going to push from the outside:

commit 3c71165421e1b0fc36533828e5f2369cfff91f53 (HEAD -> master)
Author: Steve Mokris <steve@kosada.com>
Date:   Mon Apr 22 15:15:09 2024 -0400

    OS-8543 Want support for OFD locks in lx-brand
    Portions-contributed-by: Dan McDonald <danmcd@mnx.io>
    Reviewed by: Andy Fiddaman <illumos@fiddaman.net>
    Approved by: Brian Bennett <brbennett@mnx.io>
danmcd commented 4 months ago

Pushed.

smokris commented 4 months ago

Nice! I'm a few minutes late, but here's my testing report:

While testing the below, I ran a dtrace probe for sdt:lx_brand::brand-lx-unsupported, and it caught two instances where LX processes attempted to do an unsupported partial-file OFD lock (details below).

Testing systemd

Please check a reboot of your zone to see whether or not systemd is running after your "apt upgrade" or "systemd-sysusers"?

After rebooting the zone, systemd is running as the init process as expected.

# ps -q 1
  PID TTY          TIME CMD
    1 ?        00:00:36 systemd
# ls -l /proc/1/exe
lrwxrwxrwx 1 root root 0 May 28 19:21 /proc/1/exe -> /usr/lib/systemd/systemd
# systemctl status
● 3c03fcdf-9941-4bd9-8610-fd56e805b6db
    State: degraded
    Units: 181 loaded (incl. loaded aliases)
     Jobs: 0 queued
   Failed: 5 units
    Since: Tue 2024-05-28 19:21:46 UTC; 7min ago
  systemd: 254.5-1~bpo12+3
…

(The 5 failed units are an existing/separate problem unrelated to this PR.)

I searched the systemd source code and I think I found all the places it uses OFD locks:

So, I don't think this PR makes systemd support worse than it was before.

Testing other software packages

I searched for other software packages that use OFD locks:

I also tested some other software packages which my team commonly uses on LX (I don't know offhand whether these packages use OFD locks), and performed some basic testing of each:

Still LGTM

If your LGTM stands, please +1 this review so it's on the record.

I haven't found any regressions introduced by this change. My LGTM stands.

danmcd commented 4 months ago

THANK YOU SO MUCH for that extra test data, which is now copied into the OS-8543 ticket as well. This will be in this week's 20240530 ("Spider-Man") release.