giampaolo / psutil

Cross-platform lib for process and system monitoring in Python
BSD 3-Clause "New" or "Revised" License

process_iter(): no longer check whether PIDs have been reused #2396

Closed: giampaolo closed this issue 3 months ago

giampaolo commented 5 months ago

Summary

Description

For every process yielded by psutil.process_iter(), we internally check whether the process PID has been reused, in which case we return a "fresh" Process instance. To check for PID reuse we are forced to create a new Process instance, retrieve its create_time() and compare it with that of the original process. Performance-wise, it turns out this has a huge cost, which grows with the number of processes. This is particularly relevant because process_iter() is typically used to write task-manager-like apps, where the full process list is retrieved every second. I realized this at work, while writing a process monitor agent that runs on small hardware (a cleaning robot).
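To make the check concrete, here is a minimal sketch of the idea, not psutil's actual internal code. The helper names pid_was_reused and _psutil_create_time are hypothetical; the only psutil calls used are psutil.Process(pid) and Process.create_time(). A PID is treated as reused when the creation time observed now differs from the one cached earlier.

```python
from typing import Callable


def _psutil_create_time(pid: int) -> float:
    """Fetch the creation time of `pid` via psutil (hypothetical helper)."""
    # Imported lazily so pid_was_reused() can be exercised with a fake lookup.
    import psutil

    # This is the expensive part: a new Process instance plus a system query.
    # If the process is gone, psutil.NoSuchProcess propagates to the caller.
    return psutil.Process(pid).create_time()


def pid_was_reused(
    pid: int,
    cached_create_time: float,
    get_create_time: Callable[[int], float] = _psutil_create_time,
) -> bool:
    """Return True if `pid` now belongs to a different process.

    Assumption: two processes sharing a PID never share a creation time,
    so a mismatch means the PID was recycled by the OS.
    """
    return get_create_time(pid) != cached_create_time
```

A caller would store create_time() when it first sees a process and pass that value back in later. Doing this for every PID on every iteration is what makes the check expensive.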

By removing the PID reuse check I get a 21x speedup on Linux with 481 running PIDs:

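import time
import psutil

started = time.monotonic()
# Build the full process list 1000 times to amplify the per-call cost.
for _ in range(1000):
    list(psutil.process_iter())
elapsed = time.monotonic() - started
print(f"Number of pids: {len(psutil.pids())}. Completed in {elapsed:.4f} secs")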

Current master: Number of pids: 481. Completed in 5.1079 secs

With PID reuse check removed: Number of pids: 481. Completed in 0.2419 secs

Repercussions

giampaolo commented 3 months ago

Fixed in 7556e5d4b and 89b6096f2.