giampaolo / psutil

Cross-platform lib for process and system monitoring in Python
BSD 3-Clause "New" or "Revised" License

[Linux] psutil.tests.test_posix.TestProcess.test_nice fails under non-realtime scheduling policy #2378

Open matoro opened 8 months ago

matoro commented 8 months ago

Summary

Description

The following test fails on Linux when the test process runs under the non-realtime SCHED_IDLE scheduling policy:

======================================================================
FAIL: psutil.tests.test_posix.TestProcess.test_nice
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/tmp/portage/dev-python/psutil-5.9.8/work/psutil-5.9.8/psutil/tests/test_posix.py", line 301, in test_nice
    self.assertEqual(ps_nice, psutil_nice)
AssertionError: '-' != 0

----------------------------------------------------------------------
Ran 569 tests in 3.163s

FAILED (failures=1, skipped=205)
FAILED

According to sched(7):

When scheduling non-real-time processes (i.e., those scheduled under the SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE policies), the CFS scheduler employs a technique known as "group scheduling", if the kernel was configured with the CONFIG_FAIR_GROUP_SCHED option (which is typical).

...

Under group scheduling, a thread’s nice value has an effect for scheduling decisions only relative to other threads in the same task group. This has some surprising consequences in terms of the traditional semantics of the nice value on UNIX systems. In particular, if autogrouping is enabled (which is the default in various distributions), then employing setpriority(2) or nice(1) on a process has an effect only for scheduling relative to other processes executed in the same session (typically: the same terminal window). Conversely, for two processes that are (for example) the sole CPU-bound processes in different sessions (e.g., different terminal windows, each of whose jobs are tied to different autogroups), modifying the nice value of the process in one of the sessions has no effect in terms of the scheduler’s decisions relative to the process in the other session.

It would seem that ps(1) interprets this as the nice value being wholly meaningless under such scheduling policies, and simply outputs -, despite the fact that it can still be set and retrieved. Here's the source: https://gitlab.com/procps-ng/procps/-/blob/c415fc86452c933716053a50ab1777a343190dcc/src/ps/output.c#L703-711
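For readers who do not want to open the C source, here is a minimal Python sketch of the rule as I read it from the linked lines: the nice value is printed only for SCHED_OTHER and SCHED_BATCH, and everything else is rendered as "-". The function names `ps_style_nice` and `current_ps_style_nice` are made up for illustration and are not part of ps or psutil. The `os.SCHED_*` constants are only available on platforms that define them, such as Linux.

```python
import os

# Policies for which a numeric nice value is shown; any other policy
# (SCHED_IDLE, SCHED_FIFO, SCHED_RR, ...) is rendered as "-".
NICE_POLICIES = frozenset({os.SCHED_OTHER, os.SCHED_BATCH})


def ps_style_nice(policy, nice):
    """Format a nice value the way the linked ps code appears to (hypothetical helper)."""
    return str(nice) if policy in NICE_POLICIES else "-"


def current_ps_style_nice():
    """Apply the rule above to the calling process."""
    policy = os.sched_getscheduler(0)            # 0 means "the caller"
    nice = os.getpriority(os.PRIO_PROCESS, 0)    # raw nice value from the kernel
    return ps_style_nice(policy, nice)
```

Under `chrt --idle 0`, `current_ps_style_nice()` would be expected to yield "-" even though `os.getpriority()` still reports 0, which matches the mismatch seen in the test.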

Minimal reproducer:

$ ps -o nice -p $$
 NI
  0
$ python3 -c 'import psutil; print(psutil.Process().nice())'
0
$ chrt --idle 0 bash
$ ps -o nice -p $$
 NI
  -
$ python3 -c 'import psutil; print(psutil.Process().nice())'
0

giampaolo commented 8 months ago

Does this mean Process.nice() is wrong or the test is wrong? Should Process.nice() behave differently and take into account group scheduling (how?)?

matoro commented 8 months ago

Does this mean Process.nice() is wrong or the test is wrong? Should Process.nice() behave differently and take into account group scheduling (how?)?

To be honest, I'm not sure what the correct solution is, which is why I opened this as an issue rather than a PR. I think the only real issue here is with SCHED_IDLE. From sched(7):

(Since Linux 2.6.23.) SCHED_IDLE can be used only at static priority 0; the process nice value has no influence for this policy. This policy is intended for running jobs at extremely low priority (lower even than a +19 nice value with the SCHED_OTHER or SCHED_BATCH policies).

Perhaps we should mirror ps(1) logic and return None or "-" for any policy other than SCHED_OTHER or SCHED_BATCH?
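
To make the proposal concrete, a rough sketch of the "return None" variant could look like the following. This is not psutil code: `nice_or_none` is a hypothetical helper built only on the standard `os` module, and whether `Process.nice()` should actually behave this way is exactly the open question. It assumes Linux, where `os.SCHED_BATCH` is defined.

```python
import os

# Policies under which the nice value is reported; assumption taken from
# the ps(1) behaviour discussed above.
_NICE_AWARE_POLICIES = frozenset({os.SCHED_OTHER, os.SCHED_BATCH})

# sched_getscheduler(2) may OR this flag into its return value, so mask it out.
_RESET_ON_FORK = getattr(os, "SCHED_RESET_ON_FORK", 0)


def nice_or_none(pid=0):
    """Return the nice value of `pid`, or None where ps(1) would print "-".

    pid=0 refers to the calling process. Hypothetical helper, not psutil API.
    """
    policy = os.sched_getscheduler(pid) & ~_RESET_ON_FORK
    if policy not in _NICE_AWARE_POLICIES:
        return None
    return os.getpriority(os.PRIO_PROCESS, pid)
```

With something like this, the test could map the "-" printed by ps to None before comparing. The alternative is to leave `Process.nice()` untouched and only teach the test about the "-" case.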