process_iter()'s filter argument

giampaolo commented 5 years ago

Example:

>>> psutil.process_iter(filter=lambda p: p.name() == 'python')

This will make it easier to write more compact code and one-liners (e.g. see doc). It will also help in writing more efficient code. E.g. if one is interested in processes with a certain name() and username(), the "and" short-circuits, so username() is never called if the name() condition is not satisfied:

>>> psutil.process_iter(filter=lambda p: p.name() == 'python' and p.username() == 'jeff')

Right now this is not possible because the API only allows you to collect all process info in one shot, and the filtering logic can only be applied after that:

for p in psutil.process_iter(attrs=['name', 'username']):
    if p.info['name'] == 'python' and p.info['username'] == 'jeff':
        ...

This will be particularly useful for resource-intensive methods such as open_files(). The use case was suggested by @btimby.
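
Until something like filter= exists, the lazy, short-circuiting behaviour described above can be approximated on top of the current API. A minimal sketch, assuming a hypothetical helper named iter_filtered (it is not part of psutil); the procs parameter is also an assumption, added only so the helper can be tried on a fixed list of processes:

```python
import psutil


def iter_filtered(predicate, procs=None):
    """Yield only the processes for which predicate(proc) is true.

    Hypothetical helper, not part of psutil. `procs` defaults to
    psutil.process_iter().
    """
    if procs is None:
        procs = psutil.process_iter()
    for proc in procs:
        try:
            # oneshot() caches the process info while the predicate runs,
            # so several method calls inside it stay cheap.
            with proc.oneshot():
                matched = predicate(proc)
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            # The process vanished or cannot be inspected: treat as no match.
            continue
        if matched:
            yield proc


if __name__ == '__main__':
    # username() is only called for processes whose name() already matched.
    for p in iter_filtered(lambda p: p.name() == 'python'
                           and p.username() == 'jeff'):
        print(p)
```

Treating AccessDenied as "no match" is a choice made for this sketch; a caller who wants to know about inaccessible processes would need to handle that exception differently.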

giampaolo commented 5 years ago

Note to self - while I was thinking about real-world use cases I sort of enjoyed coming up with some "utils-module-like" examples:

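import getpass

import psutil

def filter_by_name(proc):
    # Match either the executable name or any command line argument
    return proc.name() == 'python' or 'python' in proc.cmdline()

def filter_by_current_user(proc):
    # Compare against the login name of the user running this script
    return proc.username() == getpass.getuser()

def filter_by_servers(proc):
    # True if the process has at least one listening socket
    for conn in proc.connections():
        if conn.status == psutil.CONN_LISTEN:
            return True
    return False

def filter_by_logfiles(proc):
    # True if the process has at least one open *.log file
    for file in proc.open_files():
        if file.path.endswith('.log'):
            return True
    return False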

for p in psutil.process_iter(filter=filter_by_logfiles):
    print(p)

That looks nice on the surface, but it sort of encourages a paradigm that poses thorny questions regarding caching. The filtering function calls methods which are likely to be called again in the "for" block (and these are not cached). An easy way to solve that would be passing the result of Process.as_dict() to the filter instead, which seems reasonable. Needs more thinking though, as I'm not sure this kind of paradigm should be encouraged (logic split vs. logic in one place, etc). In the meantime, here's the patch:

--- a/psutil/__init__.py
+++ b/psutil/__init__.py
@@ -1494,7 +1494,7 @@ def pid_exists(pid):
 _pmap = {}

-def process_iter(attrs=None, ad_value=None):
+def process_iter(attrs=None, ad_value=None, filter=None):
     """Return a generator yielding a Process instance for all
     running processes.

@@ -1517,10 +1517,17 @@ def process_iter(attrs=None, ad_value=None):
     """
     def add(pid):
         proc = Process(pid)
-        if attrs is not None:
-            proc.info = proc.as_dict(attrs=attrs, ad_value=ad_value)
-        _pmap[proc.pid] = proc
-        return proc
+        with proc.oneshot():
+            if filter is not None:
+                try:
+                    if not filter(proc):
+                        raise NoSuchProcess(proc.pid)
+                except AccessDenied:
+                    raise NoSuchProcess(proc.pid)
+            if attrs is not None:
+                proc.info = proc.as_dict(attrs=attrs, ad_value=ad_value)
+            _pmap[proc.pid] = proc
+            return proc

     def remove(pid):
         _pmap.pop(pid, None)
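
The alternative mentioned before the patch, handing the filter the result of Process.as_dict() rather than the Process itself, can already be sketched with the existing attrs argument, since process_iter(attrs=...) stores that dict on proc.info. A rough sketch, assuming a hypothetical helper named iter_by_info (not part of psutil):

```python
import psutil


def iter_by_info(attrs, predicate, ad_value=None):
    """Yield processes whose info dict satisfies predicate(info).

    Hypothetical helper, not part of psutil.
    """
    for proc in psutil.process_iter(attrs=attrs, ad_value=ad_value):
        # proc.info was filled once via as_dict(); the predicate and the
        # caller's loop both read it without querying the process again.
        if predicate(proc.info):
            yield proc


if __name__ == '__main__':
    wanted = ['pid', 'name', 'username']
    for p in iter_by_info(wanted, lambda info: info['name'] == 'python'):
        print(p.info['pid'], p.info['username'])
```

The trade-off is the one raised in the first comment: every attribute in attrs is collected for every process before the predicate runs, so the short-circuit saving is lost in exchange for not querying the same value twice.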

dpinol commented 2 months ago

any feedback on this?

giampaolo / psutil

process_iter()'s filter argument #1401