giampaolo / psutil

Cross-platform lib for process and system monitoring in Python
BSD 3-Clause "New" or "Revised" License

[SmartOS LX Zone] List Index out of range (/proc/[pid]/stat) #2455

Closed BobTB closed 1 week ago

BobTB commented 1 week ago

Summary

Description

I’ve been using psutil in a SmartOS LX zone environment, and I’ve encountered an issue where the /proc/[pid]/stat file provides fewer fields than expected (52 fields on a standard Linux system). Specifically, psutil raises an IndexError: list index out of range when attempting to access certain fields like blkio_ticks (field 40), which aren’t present in the LX zone's /proc implementation.

Would it be possible to implement more graceful handling for these cases? For example, if a field is missing, psutil could return 0 instead of raising an exception. This way, applications that depend on psutil can continue to function, even in environments where the /proc filesystem may not provide all the expected fields.

I believe this would help improve compatibility across different platforms, especially in environments like SmartOS LX zones that partially emulate Linux.
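The fallback proposed above could be sketched roughly as follows. This is a minimal illustration, not psutil's actual code: `parse_stat` and `STAT_INDEXES` are hypothetical names, and the indexes are assumed to be positions in the whitespace-separated list that follows the `(name)` part of /proc/[pid]/stat (so that `blkio_ticks` sits at index 39, as in the `fields[39]` access reported by the traceback).

```python
# Hypothetical helper, not psutil's actual implementation.
# Indexes are positions in the list of fields that follows "(name)".
STAT_INDEXES = {
    "status": 0,
    "ppid": 1,
    "utime": 11,
    "stime": 12,
    "create_time": 19,
    "blkio_ticks": 39,
}


def parse_stat(text, default="0"):
    """Parse one /proc/[pid]/stat record, tolerating missing trailing fields."""
    # The process name may contain spaces or parentheses, so split on the
    # last ")" instead of splitting the whole line on whitespace.
    rpar = text.rfind(")")
    name = text[text.find("(") + 1:rpar]
    fields = text[rpar + 1:].split()
    ret = {"name": name}
    for key, idx in STAT_INDEXES.items():
        # Fall back to the default instead of raising IndexError.
        ret[key] = fields[idx] if idx < len(fields) else default
    return ret


# A truncated record with only 20 fields after the name.
short = "1234 (s6 svscan) S 1 " + " ".join(["7"] * 18)
info = parse_stat(short)
```

With the truncated record above, `info["blkio_ticks"]` falls back to the default while the fields that are present are returned unchanged. Whether a silent default is appropriate for every field is a separate question: a zero `create_time`, for instance, would be misleading rather than harmless.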

giampaolo commented 1 week ago

Definitely a bug. Can you paste the traceback?

BobTB commented 1 week ago

Yes, here it is. A fix for this would be marvelous, since it would allow running Docker containers (for example Frigate) that currently fail.

python3 test.py
Enter the process name to search for: s6
Enter the directory path to check: /tmp/cache
Traceback (most recent call last):
  File "/opt/frigate/test.py", line 38, in <module>
    check_cache_files_for_process(process_name, cache_dir)
  File "/opt/frigate/test.py", line 11, in check_cache_files_for_process
    for process in psutil.process_iter(['pid', 'name']):
  File "/usr/local/lib/python3.9/dist-packages/psutil/__init__.py", line 1505, in process_iter
    yield add(pid)
  File "/usr/local/lib/python3.9/dist-packages/psutil/__init__.py", line 1484, in add
    proc = Process(pid)
  File "/usr/local/lib/python3.9/dist-packages/psutil/__init__.py", line 323, in __init__
    self._init(pid)
  File "/usr/local/lib/python3.9/dist-packages/psutil/__init__.py", line 359, in _init
    self.create_time()
  File "/usr/local/lib/python3.9/dist-packages/psutil/__init__.py", line 752, in create_time
    self._create_time = self._proc.create_time()
  File "/usr/local/lib/python3.9/dist-packages/psutil/_pslinux.py", line 1714, in wrapper
    return fun(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/psutil/_pslinux.py", line 1945, in create_time
    ctime = float(self._parse_stat_file()['create_time'])
  File "/usr/local/lib/python3.9/dist-packages/psutil/_pslinux.py", line 1714, in wrapper
    return fun(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/psutil/_common.py", line 497, in wrapper
    raise raise_from(err, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.9/dist-packages/psutil/_common.py", line 495, in wrapper
    return fun(self)
  File "/usr/local/lib/python3.9/dist-packages/psutil/_pslinux.py", line 1796, in _parse_stat_file
    ret['blkio_ticks'] = fields[39]  # aka 'delayacct_blkio_ticks'
IndexError: list index out of range

giampaolo commented 1 week ago

Fixed in https://github.com/giampaolo/psutil/commit/70b6787e432417c114daf50e3026e532da411fc6. Please reopen if it's not fixed.

giampaolo commented 1 week ago

While you're at it, since it seems you have an "exotic" system (it's odd that this was only discovered so late in psutil's lifetime): could you run the full test suite? Maybe there are other corner cases / bugs that can only be reproduced on your particular setup.

You can run tests with make test if you have the cloned code, or python3 -m psutil.tests straight from psutil installation.

BobTB commented 1 week ago

Wow, this was a quick fix for the blkio_ticks. Thank you.

I managed to get the tests to run, attaching the output. testoutput.txt

giampaolo commented 6 days ago

Mmm that's a lot of failures. From the errors I can see that SmartOS (Illumos?) does not implement many things that are usually taken for granted on a Linux system (process CPU affinity and I/O priority, EINVAL for querying a zombie process, process rlimit() behaving differently, etc.).

I would say that some of these look like bugs which should be fixed in SmartOS / Illumos rather than in psutil. Fixing all of these would require adding specific cases in the code, as in if ILLUMOS: do_this() else: do_that(), both in psutil core and its test suite.
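As a rough illustration of the kind of special-casing described above, the sketch below treats EINVAL as "process is gone" only when a platform flag is set. Everything here is hypothetical: `ProcessGone` stands in for an exception such as psutil's NoSuchProcess, `read_stat` and the `lx_zone` flag are invented for the example, and how to detect an LX zone reliably is left open.

```python
import errno


class ProcessGone(Exception):
    """Stand-in for an exception such as psutil.NoSuchProcess."""


def read_stat(pid, lx_zone=False, opener=open):
    """Read /proc/<pid>/stat, mapping 'process is gone' errnos to ProcessGone.

    lx_zone is a hypothetical flag; the caller decides how to set it.
    opener is injectable so the error path can be exercised without /proc.
    """
    gone = {errno.ESRCH, errno.ENOENT}
    if lx_zone:
        # Assumption taken from the thread: the zone reports EINVAL where
        # Linux would report that the process no longer exists.
        gone.add(errno.EINVAL)
    try:
        with opener(f"/proc/{pid}/stat", "rb") as f:
            return f.read()
    except OSError as err:
        if err.errno in gone:
            raise ProcessGone(pid) from err
        raise
```

The cost of this approach is visible even in a sketch this small: every platform-specific branch needs its own test, which is the maintenance burden the comment above is weighing.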

I know nothing about SmartOS / Illumos. How widely used is it (aka: is it worth investing time in it)? What's the use case for SmartOS / Illumos (e.g. why would one want to use it rather than Linux)?

BobTB commented 6 days ago

Illumos is used as a base for OmniOS, SmartOS, OpenIndiana, etc.

SmartOS is a powerful, niche operating system based on the Illumos kernel (a derivative of OpenSolaris/Oracle Solaris), now maintained by MNX. It’s designed primarily for cloud infrastructure, high-availability server environments, and advanced storage solutions. Another key player in the Illumos ecosystem is OmniOS, which is known for providing enterprise-grade server and storage solutions, and Triton Datacenter by MNX.

Key Features:

ZFS: Native ZFS support for robust, scalable storage with advanced features like snapshots, deduplication, and self-healing.

SMB: Native Windows shares (no Samba needed), including previous versions and user permissions from Windows domains.

Zones: Lightweight, container-like virtualization that offers better security and isolation compared to Docker.

DTrace: A powerful real-time system observability and debugging tool for deep performance analysis.

Hybrid Cloud: SmartOS supports both KVM / Bhyve for full virtualization and LX Zones for running Linux binaries, making it ideal for cloud-native applications, including running Docker containers on bare metal (no VM needed) using Triton.

Use Cases:

Cloud Infrastructure: Used by cloud providers like Joyent (now MNX’s Triton), offering multi-tenant environments with lightweight containers and virtual machines, and by Nexenta for enterprise-grade storage solutions.

Enterprise Storage and Servers: OmniOS and SmartOS, as well as Nexenta's enterprise-grade server platforms, are trusted by companies requiring reliable ZFS storage solutions.

High-Availability Systems: Known for stability, reliability, and data integrity in mission-critical systems.

SmartOS and Illumos have a small, highly technical user base compared to more mainstream systems like Linux.

BobTB commented 6 days ago

So I think if there are some major breaking incompatibilities, like this one was, it is probably worth fixing them. I will also go to Illumos and report these bugs/omissions where the LX zone behaves differently than expected. I had not hit any of them until now, when I got a Docker container (Frigate) which uses psutil.

giampaolo commented 6 days ago

Quick analysis of testoutput.txt. It looks like SmartOS emulates Linux pretty decently except for the following. I would say 4, 10, 11 are bugs which should be fixed in SmartOS. For some of the rest psutil may add some logic like "if field is not present return 0". But overall I would say psutil support for this platform doesn't look bad.

1) Process.context_switches() does not work because 'voluntary_ctxt_switches' and 'nonvoluntary_ctxt_switches' lines are not found in /proc/pid/status.

2) All connection tests (psutil.net_connections()) fail, which basically means /proc/self/fd/ is not supported.

3) Process.ionice() (set) / ioprio_set syscall returns EINVAL / Invalid argument after process is terminated. It should return ESRCH instead.

4) Reading /proc/pid/stat for a zombie process results in EINVAL (I would expect ESRCH or something instead).

5) It looks like /proc/stat has a different format than on Linux.

6) Process memory's USS and Process.memory_maps() don't work because /proc/pid/smaps is not available.

7) Process.cpu_affinity() (set) does not work because Cpus_allowed_list is not present in /proc/pid/status

8) psutil.disk_partitions() does not work because /sys/dev/block/{MAJOR}:{MINOR}/uevent does not exist.

9) lscpu external CLI command fails because /sys/devices/system/cpu/possible does not exist.

10) If Process.rlimit() is called against a terminated process, it returns a negative value instead of failing with NoSuchProcess.

11) Same for Process.ionice().
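Several of the items above come down to a /proc or /sys file being absent, so a small probe script can show at a glance which of them a given environment provides. The sketch below is an illustration only: `probe_linux_features` and its feature names are hypothetical, and it checks just a handful of the paths mentioned in the list.

```python
import os


def probe_linux_features(pid="self", proc="/proc", sys_root="/sys"):
    """Return a dict of feature name -> bool for some files named above."""
    def has_line(path, prefix):
        # True if any line of the file starts with the given prefix.
        try:
            with open(path, errors="replace") as f:
                return any(line.startswith(prefix) for line in f)
        except OSError:
            return False

    base = os.path.join(proc, str(pid))
    return {
        # Item 1: context switch counters in /proc/pid/status.
        "ctx_switches": has_line(
            os.path.join(base, "status"), "voluntary_ctxt_switches"),
        # Item 2: per-process file descriptor directory.
        "fd_dir": os.path.isdir(os.path.join(base, "fd")),
        # Item 6: per-mapping memory details.
        "smaps": os.path.exists(os.path.join(base, "smaps")),
        # Item 9: file needed by lscpu.
        "cpu_possible": os.path.exists(
            os.path.join(sys_root, "devices/system/cpu/possible")),
    }


if __name__ == "__main__":
    for feature, present in probe_linux_features().items():
        print(f"{feature}: {'ok' if present else 'missing'}")
```

Probing for a capability like this, rather than branching on a platform name, keeps working if the zone later gains the missing files.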