Open aplanas opened 1 year ago
This approach is using macros to share the code, but maybe is cleaner if the function is split in two: one for utmp and other for logind. Is there any preference?
Mmmm... what a tricky problem. In order to reproduce the problem I've tried setting the system time to year 2050, then calling psutil.users()
, but it didn't work. Do you have a way to reproduce it?
~/svn/psutil {master}$ sudo date -s "2050-11-22"
[sudo] password for giampaolo:
mar 22 nov 2050, 00:00:00, CST
~/svn/psutil {master}$ date
mar 22 nov 2050, 00:00:01, CST
~/svn/psutil {master}$ python3 -c "import psutil, datetime; print(datetime.datetime.fromtimestamp(psutil.users()[0].started))"
2023-08-22 22:12:56
How about the other platforms? Are they affected as well?
setutxent() / getutxent() / endutxent()
: https://github.com/giampaolo/psutil/blob/77e5b7445748d30d22c0e3b2e651414da96a88b4/psutil/_psutil_aix.c#L484-L485setutent(), getutent(), endutent()
(so different APIs): https://github.com/giampaolo/psutil/blob/77e5b7445748d30d22c0e3b2e651414da96a88b4/psutil/_psutil_linux.c#L377Mmmm... what a tricky problem. In order to reproduce the problem I've tried setting the system time to year 2050, then calling
psutil.users()
, but it didn't work. Do you have a way to reproduce it?
Uhm ... try to set the date and reboot. IIUC the issue is when the ut_tv
struct
gets created. There is a bit more info here: https://www.thkukuk.de/blog/Y2038_glibc_utmp_64bit/
As you can see the ut_tv
field tv_sec
is then declared as int32_t
, so it is signed. This can store 2**31
seconds, so around 68 years (hence 1970 + 68 = 2038). The issue should be present when it is created, not when it is compared then.
How about the other platforms? Are they affected as well?
It is a glibc issue, so that depends if they are using the same struct with int32_t
or using the struct timeval ut_tv;
one. I do not know : (
* On Linux we use `setutent(), getutent(), endutent()` (so different APIs): https://github.com/giampaolo/psutil/blob/77e5b7445748d30d22c0e3b2e651414da96a88b4/psutil/_psutil_linux.c#L377
That is the one that I fixed here! : )
For the rest of the platforms, if they are affected (not sure about that), this solution is not applicable as it depends on systemd.
Some more documentation why this is necessary: https://www.thkukuk.de/blog/Y2038_glibc_utmp_64bit/
So for linux using systemd, coreutils. procps-ng, shadow, util-linux and more already implemented support to use logind instead of utmp or accepted PRs for this. glibc developers don't plan to fix the problem but instead they want to remove support for utmp/wtmp. The situation for other OS and libraries may be different, musl libc even did never support utmp, others are 64bit clean and thus Y2038 safe.
If binaries / wheels are produced with systemd support, but the Linux system installing the wheel doesn't have systemd, are we sure that this patch won't cause a "sd_get_sessions symbol not found"
error on psutil import?
I remember we had a similar problem with prlimit()
syscall on Linux. We used #ifdef
s in the C code to check whether prlimit()
was available at compile time, but on some old systems (CentOS) it wasn't, so the resulting .whl could be installed but crashed with "prlimit symbol not found" on psutil import (see code). For this reason I rewrote it by using ctypes (the only part of psutil using ctypes), so that the check for prlimit()
availability is done at runtime instead of compile time: https://github.com/giampaolo/psutil/pull/1879/.
I'm not sure if the same problem may arise here as well. Are there Linux distros without systemd support and / or where sd_get_sessions()
is not available? Is this even a problem?
If binaries / wheels are produced with systemd support, but the Linux system installing the wheel doesn't have systemd, are we sure that this patch won't cause a
"sd_get_sessions symbol not found"
error on psutil import?
It will produce this error indeed. One solution can be what you describes:
Would this a preferred approach?
Yes, ctypes sounds like the safest bet at the time of writing, but it has a cost: slower speed + extra and convoluted code to maintain (I am also thinking about the dual unit tests).
I am skeptical how much of a problem this is in reality. Who sets the system date to >= 2035? If done, it looks like a user error. After setting the system date to 2050 all my browser tabs automatically logged out and I could not establish new HTTPS connections, so basically a system with a date that far in the future is not usable in general.
I am skeptical how much of a problem this is in reality. Who sets the system date to >= 2035?
It will happen automatically in <12 15 years. Systems with long time support created now will suffer of this issue eventually.
I mean, the issue to fix is not that an user will set the date to 2050 by accident or because a specific use case, it is that a system that will be build in the next 2 5 years and needs to be supported for around 10 year more will stop working because of this bug.
Edit: math is hard
On a second thought I think that dlopen()
can be a better solution here. I will try this path.
@giampaolo I updated the PR using dlopen(). Now there is no link to libsystemd, so a Linux system without the library will no produce the error of the missing symbol. Those are now dynamically loaded.
@giampaolo ping?
@giampaolo did you have a chance to evaluate the new approach here?
Meanwhile we added in openSUSE the patch in the package
Hi there.
@giampaolo I updated the PR using dlopen(). Now there is no link to libsystemd, so a Linux system without the library will no produce the error of the missing symbol. Those are now dynamically loaded.
I'm afraid this is the case only if you install psutil from sources. In the C code you do:
#ifdef SYSTEMD_LINUX
#include <dlfcn.h>
#endif
...but SYSTEMD_LINUX
is set only when setup.py is invoked (aka installation from sources). The problem we have is with the installation of wheels / binaries. If the wheel was compiled with systemd support, but the system which installs the wheel does not have systemd support, then the user will get "symbol not found". This is the annoying problem.
You should somehow avoid the #ifdef SYSTEMD_LINUX
check.
You should somehow avoid the
#ifdef SYSTEMD_LINUX
check.
Yes. Dropped.
...but SYSTEMD_LINUX is set only when setup.py is invoked (aka installation from sources). The problem we have is with the installation of wheels / binaries. If the wheel was compiled with systemd support, but the system which installs the wheel does not have systemd support, then the user will get "symbol not found". This is the annoying problem.
But now that I recall ... this part is not true.
If the library is not present in the system, the load_library
function will return NULL and the code will fallback to the current code that uses utmp.
The new macro dlsym_check
is checking if the symbol is present in the library, and if one of the required symbols is not available, again it will return NULL and use the same fallback.
The SYSTEMD_LINUX condition is used only to remove this part in case that we want a source code that never checks for systemd.
So, the confusion matrix is like this (C = Compile time; R = At runtime; S = Systemd is present; N = No systemd present)
You should somehow avoid the
#ifdef SYSTEMD_LINUX
check.Yes. Dropped.
I re-added it for now to have the same code-base for discussion. We can drop all the systemd checks and make the systemd code path always present (with the fallback to utmp) but the reason will not be for avoiding the "symbol not found" error, as it is already not present.
Me: You should somehow avoid the #ifdef SYSTEMD_LINUX check. You: Yes. Dropped. You: I re-added it for now to have the same code-base for discussion.
I'm confused now. :) Can you commit the "final" code?
I'm confused now. :) Can you commit the "final" code?
Also can you please avoid to squash commits and force push? It's more clear to see the individual commits and how they evolve.
Me: You should somehow avoid the #ifdef SYSTEMD_LINUX check. You: Yes. Dropped. You: I re-added it for now to have the same code-base for discussion.
I'm confused now. :) Can you commit the "final" code?
: ) lets use the current one that contain the SYSTEMD_LINUX macro condition. We can keep it, so the systemd code path can disappear, or remove it, and both code path will remain.
Also can you please avoid to squash commits and force push? It's more clear to see the individual commits and how they evolve.
Sure! I will keep the commits.
Also, one side question from my side: how the wheels with the binary content are created? Are build in some shared Pypi build system, or comes directly from the developer infrastructure?
lets use the current one that contain the SYSTEMD_LINUX macro condition. We can keep it, so the systemd code path can disappear, or remove it, and both code path will remain.
The goal here should be avoid the "symbol not found error". Can we do that without using the #ifdef SYSTEMD_LINUX
check? If yes, please remove it from the PR.
Also, one side question from my side: how the wheels with the binary content are created? Are build in some shared Pypi build system, or comes directly from the developer infrastructure?
They are created on each commit by Github Actions:
https://github.com/giampaolo/psutil/blob/8d0a0433493e14535a86bd86a54e177afa4ecacc/.github/workflows/build.yml#L43
...and saved as artifacts. You can then download them with make download-wheels-github
.
The goal here should be avoid the "symbol not found error". Can we do that without using the
#ifdef SYSTEMD_LINUX
check? If yes, please remove it from the PR.
I am a bit confused now : ). In the comment https://github.com/giampaolo/psutil/pull/2300#issuecomment-1772159232 I explained how this error is not there. Am I missing something?
If yes, please remove it from the PR.
The last commit is removing the SYSTEMD_LINUX macro, making the systemd and utmp path available in any binary.
In preparation for your PR I made some refactoring, and created a specific psutil/arch/linux/users.c
to host the new changes you are introducing here: https://github.com/giampaolo/psutil/pull/2320. Unfortunately you'll have to rebase / solve conflicts. I did it now instead of laster mostly to preserve (your) history once we merge.
Unfortunately you'll have to rebase / solve conflicts.
Thanks for the refactoring, this makes the code much more clear.
I did the rebase, but I merged my two commits in one, to simplify the rebase.
I added some initial comments but overall it looks like a pretty good start. In addition:
1. I prefer to have 2 separate functions, both of which can be called from Python (you'll have to update users.h and remove `static` from the function definitions). This way we can test that they return the same thing (in test_linux.py) and that they have no memory leaks (in test_memleaks.py).
Should be three then? The one used for Python users (psutil_users), the one for systemd (psutil_users_systemd) and the one for utmp (psutil_users_utmp)? The issues that I see are:
2. I think you should close the handle (`dlclose(handle)`) on exit.
Done.
3. perhaps it makes more sense to check for `sd_booted()` in the `load_systemd()` function (and close the handle in there if systemd didn't boot).
OK. Done.
4. please use 4 spaces indentation
Yes. Done.
Should be three then? The one used for Python users (psutil_users), the one for systemd (psutil_users_systemd) and the one for utmp (psutil_users_utmp)?
There should be 2: psutil_users_systemd
returning None
if unavailable, and psutil_users_utmp
. In Python you will call psutil_users_systemd
first, and fallback to psutil_users_utmp
if it returns None
.
the utmp and the systemd version are not returning the same value: there are different decisions done, for example in the tty field
What are the differences between the 2 versions? I assumed they were exactly the same.
There should be 2:
psutil_users_systemd
returningNone
if unavailable, andpsutil_users_utmp
. In Python you will callpsutil_users_systemd
first, and fallback topsutil_users_utmp
if it returnsNone
.
Done.
I do not like tho. IMHO this breaks the API (there is no users()
Python function anymore for Linux) and the alternative (using user() to point psutil_user_utmp) will silently keep the Y2038 problem to the users.
What are the differences between the 2 versions? I assumed they were exactly the same.
systemd unifies all this small differences in a more coherent way (application independent)
What I meant is adding this logic to
_pslinux.py
:
Outch. Of course. Done.
I now realize though that there is some logic in the python layer that is specific to the utmp implementation:
We can fix that later, in another PR
Mmm this may be a problem. Are these differences documented somewhere?
None that I am aware. In utmp some differences are per-application
I tried running your code but it fails because
'sd_session_get_leader'
is missing.
Added the requested logs. This function is new, from systemd v254: https://github.com/systemd/systemd/commit/403082602d4230c224529c46e2d8a392f3a50e49
Code did not compile so I fixed it via the web UI in https://github.com/giampaolo/psutil/pull/2300/commits/75570aba580786b43275374fddc8c3f43a2b0579 (you should update your branch).
In addition, a test is failing but I don't have write permissions to your branch. This is the way to fix it (in addition I also added memleak tests). Could you please apply this patch and push it?
+++ b/psutil/tests/test_linux.py
@@ -1513,25 +1513,27 @@ class TestMisc(PsutilTestCase):
psutil._pslinux.boot_time)
assert m.called
- def test_users_mocked(self):
+ def test_users_utmp_mocked(self):
# Make sure ':0' and ':0.0' (returned by C ext) are converted
# to 'localhost'.
- with mock.patch('psutil._pslinux.cext.users',
- return_value=[('giampaolo', 'pts/2', ':0',
- 1436573184.0, True, 2)]) as m:
- self.assertEqual(psutil.users()[0].host, 'localhost')
- assert m.called
- with mock.patch('psutil._pslinux.cext.users',
- return_value=[('giampaolo', 'pts/2', ':0.0',
- 1436573184.0, True, 2)]) as m:
- self.assertEqual(psutil.users()[0].host, 'localhost')
- assert m.called
- # ...otherwise it should be returned as-is
- with mock.patch('psutil._pslinux.cext.users',
- return_value=[('giampaolo', 'pts/2', 'foo',
- 1436573184.0, True, 2)]) as m:
- self.assertEqual(psutil.users()[0].host, 'foo')
- assert m.called
+ with mock.patch('psutil._pslinux.cext.users_systemd',
+ return_value=None):
+ with mock.patch('psutil._pslinux.cext.users_utmp',
+ return_value=[('giampaolo', 'pts/2', ':0',
+ 1436573184.0, True, 2)]) as m:
+ self.assertEqual(psutil.users()[0].host, 'localhost')
+ assert m.called
+ with mock.patch('psutil._pslinux.cext.users_utmp',
+ return_value=[('giampaolo', 'pts/2', ':0.0',
+ 1436573184.0, True, 2)]) as m:
+ self.assertEqual(psutil.users()[0].host, 'localhost')
+ assert m.called
+ # ...otherwise it should be returned as-is
+ with mock.patch('psutil._pslinux.cext.users_utmp',
+ return_value=[('giampaolo', 'pts/2', 'foo',
+ 1436573184.0, True, 2)]) as m:
+ self.assertEqual(psutil.users()[0].host, 'foo')
+ assert m.called
def test_procfs_path(self):
tdir = self.get_testfn()
+++ b/psutil/tests/test_memleaks.py
@@ -485,6 +485,14 @@ class TestModuleFunctionsLeaks(TestMemoryLeak):
name = next(psutil.win_service_iter()).name()
self.execute(lambda: cext.winservice_query_descr(name))
+ if LINUX:
+
+ def test_users_systemd(self):
+ self.execute(cext.users_systemd)
+
+ def test_users_utmp(self):
+ self.execute(cext.users_utmp)
+
if __name__ == '__main__':
from psutil.tests.runner import run_from_name
(you should update your branch)
Done, I also rebased on top of the aplanas-utmp branch, that contains your code
From Github CI https://github.com/giampaolo/psutil/actions/runs/6614641177/job/17968190988?pr=2300:
2023-10-23T15:42:36.3752124Z psutil-debug [psutil/arch/linux/users.c:44]> missing 'sd_session_get_leader' fun
2023-10-23T15:42:36.5224487Z psutil-debug [psutil/arch/linux/users.c:44]> missing 'sd_session_get_leader' fun
This means that the new code is currently not tested. I also cannot test it locally on Ubuntu 22.04 for the same reason (missing 'sd_session_get_leader'). What distro are you on? Can you run make test
and make test-memleaks
?
What distro are you on?
openSUSE Tumbleweed, but I guess Arch will also have the systemd v254 there.
Can you run make test
That fails a lot. Some of the fails are assumptions about partitions, mount points (I am using btrfs with subvolumes, etc). But from the one that is related with this PR I see one:
======================================================================
FAIL: psutil.tests.test_posix.TestSystemAPIs.test_users (psutil=[suser(name='root', terminal=None, host='192.168.122.1', started=1698081882.810051, pid=1209)], who='root pts/0 Oct 23 19:24 . 1227 (192.168.122.1)')
----------------------------------------------------------------------
Traceback (most recent call last):
File "/root/psutil/psutil/tests/test_posix.py", line 345, in test_users
self.assertEqual(u.terminal, terminals[idx])
AssertionError: None != 'pts/0'
and make test-memleaks?
....
Ran 101 tests in 1.062s
OK (skipped=9)
SUCCESS
I tried playing with it a bit more and other than sd_session_get_leader
I'm also missing sd_session_get_start_time
and sd_session_get_username
. I'm afraid it's too premature to add this right now. Most of these systemd APIs seem they were added very recently.
Most of these systemd APIs seem they were added very recently.
All those are already present in v254
Let's not rush this. Considering these new APIs are literally 5 months old, it seems unlikely to me that we may cause issues to long-term support distros if we don't do this right now.
We don't have to wait years, but the minimum requirement here should be that the default Github Linux image has systemd >= v254, so that both utmp and systemd code paths are covered by unit tests. We can leave this PR open 'till then.
In the meantime, another couple of suggestions I can give for this PR:
libsystemd.so.0
handle in a global var so that it's loaded only on first callI now realize though that there is some logic in the python layer that is specific to the utmp implementation: Perhaps it makes sense to translate this logic in C, and remove it from the Python layer, so that the 2 C implementations return the same things.
FYI I did this in 0c3a1c5a, so you need to update your branch again.
* cache the `libsystemd.so.0` handle in a global var so that it's loaded only on first call
This will leak the handle, as the dlclose() cannot be called. Would this be OK?
Mmm no I don't think so. It will slightly increase memory usage the first time it's called, then it will just stay loaded in memory until the python process is terminated. The important thing is not to open a new handle on each call.
The important thing is not to open a new handle on each call.
Done.
@giampaolo I found what I think is a bug: if there is an error in the _utmp or _systemd version it will return NULL without setting the exception. The last commit is addressing this
The test is also fixed: the issue was not code related.
For now I will partially update the patch living in the openSUSE package, and we will revisit this PR in some time in the future then.
Thanks a lot for the detailed review and all your patience working with me.
There's one last thing. From the man page https://manpages.debian.org/testing/libsystemd-dev/sd_session_get_username.3.en.html:
On success, sd_session_get_state(), sd_session_get_uid(), sd_session_get_seat(),
sd_session_get_service(), sd_session_get_type(), sd_session_get_class(),
sd_session_get_display(),sd_session_get_remote_user(), sd_session_get_remote_host()
and sd_session_get_tty() return 0 or a positive integer. On failure, these calls return a
negative errno-style error code.
Errors
Returned errors may indicate the following problems:
-ENXIO: The specified session does not exist.
-ENODATA: The given field is not specified for the described session.
-EINVAL: An input parameter was invalid (out of range, or NULL, where that is not accepted).
-ENOMEM: Memory allocation failed.
Right now on failure we get RuntimeError("cannot get user information via systemd");
without any indication of what syscall failed and why. Something like this should do it (not tested):
#include <stdlib.h> // abs()
void set_systemd_errno(const char *syscall, int neg_errno) {
PyObject *exc;
int pos_errno;
pos_errno = abs(neg_errno);
sprintf(
fullmsg, "%s (originated from %s)", strerror(pos_errno), syscall
);
exc = PyObject_CallFunction(PyExc_OSError, "(is)", pos_errno, fullmsg);
PyErr_SetObject(PyExc_OSError, exc);
Py_XDECREF(exc);
}
PyObject *
psutil_users_systemd(PyObject *self, PyObject *args) {
int ret;
...
ret = sd_session_get_username(session_id, &username);
if (ret < 0) {
set_systemd_errno("sd_session_get_username", ret);
goto error;
}
error:
...
return NULL; // NOT RuntimeError
There's one last thing.
Adapted the code and tested it a bit. Done.
Summary
Description
Bi-arch systems line x86-64 present the Y2038 problem, where an overflow can be produced because some glibc compatibility decissions (see https://github.com/thkukuk/utmpx/blob/main/Y2038.md for more information)
This patch uses logind from systemd instead of utmp on Linux systems, if the systemd version is support the new API (>= 254).