Process.is_running() speedup

GoogleCodeExporter commented 9 years ago

The current is_running() implementation relies on __eq__ which performs a
comparison against all the properties of two Process object instances in
pure Python.

Benchmarks show that a call to this method takes considerably longer than
calls to other Process methods and properties.

It would be worth trying to find alternative approaches to speed up the
current implementation.

Original issue reported on code.google.com by billiej...@gmail.com on 14 Jul 2009 at 3:59

GoogleCodeExporter commented 9 years ago

Did you have anything in mind on how to speed this up? Have you profiled this
at all to see what takes the most time? My guess (but I could be wrong) is
that it's the creation of the new Process object in is_running() and not the
code in __eq__ that takes the most time. The __eq__() code is just using
built-in functions to read some attributes and compare them, so it shouldn't
be all that slow.

If it turns out that __eq__() is the culprit after all, then one thing that
comes to mind is selecting a specific subset of items to compare for equality
instead of checking all of them. For example, just look at PID, ppid, name,
command line, path. That would cut down the number of items being checked and
also eliminate several function calls currently used in the body of __eq__,
including the startswith() string operation and the callable() check.
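One way to answer the profiling question is to run the call under cProfile and sort by cumulative time. The sketch below is only an illustration: `FakeProcess`, `_fetch` and `profile_is_running` are hypothetical stand-ins and not psutil code, and the cost of the underlying C call is simulated with a small loop.

```python
import cProfile
import io
import pstats


class FakeProcess:
    """Hypothetical stand-in for psutil.Process; not the real class."""

    def __init__(self, pid):
        self.pid = pid

    def _fetch(self):
        # Simulates an expensive call into platform-specific C code.
        return sum(range(2000))

    @property
    def name(self):
        self._fetch()
        return "python"

    @property
    def create_time(self):
        self._fetch()
        return 1247500000.0

    def __eq__(self, other):
        # Every property read below triggers a simulated C call.
        return (self.pid, self.name, self.create_time) == (
            other.pid, other.name, other.create_time)

    def is_running(self):
        return self == FakeProcess(self.pid)


def profile_is_running(calls=200):
    """Profile repeated is_running() calls and return the report as text."""
    proc = FakeProcess(1234)
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(calls):
        proc.is_running()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_is_running())
```

In the report, the cumulative time column shows whether the time goes to object creation, to `__eq__` itself, or to the property reads it triggers.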

Original comment by jlo...@gmail.com on 14 Jul 2009 at 4:22

GoogleCodeExporter commented 9 years ago

One problem is that we are comparing against too many properties: currently
*everything* the Process class has to offer except callables and private
methods.

It is true that __eq__ uses fast builtin functions for comparison, but every
time it asks for a property, that's time spent calling the underlying C code,
and we should avoid that whenever possible.

What I had in mind was to determine a reliable and *limited* subset of
properties to use as a "signature" to identify a Process object uniquely.

Given that the kernel is unlikely to reuse the same PID within a short amount
of time, combining PID and process creation time already gives us a fair
degree of uniqueness:

def __eq__(self, other):
    h1 = (self.pid, self.create_time)
    h2 = (other.pid, other.create_time)
    return h1 == h2
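A slightly more defensive variant of the same idea, as a minimal sketch: `ProcessSketch` and `_signature` are hypothetical names used only for illustration and are not psutil code. It returns `NotImplemented` for foreign types and keeps `__hash__` consistent with `__eq__`, which Python otherwise disables once `__eq__` is overridden.

```python
class ProcessSketch:
    """Hypothetical stand-in for psutil.Process."""

    def __init__(self, pid, create_time):
        self.pid = pid
        self.create_time = create_time

    def _signature(self):
        # The (pid, create_time) pair proposed in the comment above.
        return (self.pid, self.create_time)

    def __eq__(self, other):
        if not isinstance(other, ProcessSketch):
            # Let Python fall back to its default handling for other types.
            return NotImplemented
        return self._signature() == other._signature()

    def __hash__(self):
        # Defining __eq__ sets __hash__ to None unless it is defined too.
        return hash(self._signature())


if __name__ == "__main__":
    original = ProcessSketch(1234, 1247500000.0)
    same = ProcessSketch(1234, 1247500000.0)
    reused_pid = ProcessSketch(1234, 1247599999.0)
    print(original == same)        # same pid and creation time
    print(original == reused_pid)  # pid reused by a later process
```

With this, two objects for the same live process compare equal, while a later process that happens to reuse the PID does not.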

Since we're not sure how the kernel assigns new PIDs across platforms, we may
need to add more values to enforce such uniqueness by picking some other
properties, but I'm not sure which ones exactly.

I'd be for using cmdline, but the underlying C call that determines it also
determines ppid, name and path in one shot, so it may not be the best choice.

Thoughts?

Original comment by billiej...@gmail.com on 14 Jul 2009 at 5:45

GoogleCodeExporter commented 9 years ago

I think PID + create time is good enough, since a process can't have both a
reused PID and the same create time in any normal circumstance I can come up
with. That should speed things up a bunch.

I'm not sure why is_running got coded this way: 

    def is_running(self):
        """Return whether the current process is running in the current process
        list."""
        try:
            new_proc = Process(self.pid)
            # calls get_process_info() which may in turn trigger NSP exception
            str(new_proc)
        except NoSuchProcess:
            return False
        return self == new_proc

That's going to be much slower because the call to str() forces the new_proc
Process object to fill out all of its attributes by calling the C code before
we check for equality. Whatever the reason, if we change it as shown below it
should work fine and be much faster once the changes are made to __eq__():

    def is_running(self):
        """Return whether the current process is running in the current process
        list."""
        try:
            new_proc = Process(self.pid)
            return self == new_proc
        except NoSuchProcess:
            return False
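The difference between the two versions can be illustrated by counting simulated C calls. This is a sketch under stated assumptions: `LazyProcess`, `PROCESS_TABLE`, `FIELDS` and `count_calls` are hypothetical and only model lazily loaded attributes, with `__eq__` already reduced to PID plus creation time.

```python
class NoSuchProcess(Exception):
    """Hypothetical stand-in for psutil.NoSuchProcess."""


# Hypothetical process table mapping pid -> creation time.
PROCESS_TABLE = {100: 1247500000.0}
FIELDS = ("create_time", "name", "cmdline", "ppid", "path")


class LazyProcess:
    """Hypothetical stand-in for a Process with lazily loaded fields."""

    c_calls = 0  # counts simulated calls into platform C code

    def __init__(self, pid):
        self.pid = pid
        self._cache = {}

    def _load(self, field):
        if field not in self._cache:
            LazyProcess.c_calls += 1
            if self.pid not in PROCESS_TABLE:
                raise NoSuchProcess(self.pid)
            value = PROCESS_TABLE[self.pid] if field == "create_time" else field
            self._cache[field] = value
        return self._cache[field]

    @property
    def create_time(self):
        return self._load("create_time")

    def __str__(self):
        # Touches every field, like the str() call in the first version.
        return ", ".join(str(self._load(f)) for f in FIELDS)

    def __eq__(self, other):
        return (self.pid, self.create_time) == (other.pid, other.create_time)

    def is_running_eager(self):
        try:
            new_proc = LazyProcess(self.pid)
            str(new_proc)  # forces all fields to load
        except NoSuchProcess:
            return False
        return self == new_proc

    def is_running_lazy(self):
        try:
            # The comparison itself may raise, so it stays inside the try.
            return self == LazyProcess(self.pid)
        except NoSuchProcess:
            return False


def count_calls(method_name):
    """Return (result, simulated C calls) for one is_running variant."""
    proc = LazyProcess(100)
    proc.create_time  # warm the original object's cache
    LazyProcess.c_calls = 0
    result = getattr(proc, method_name)()
    return result, LazyProcess.c_calls


if __name__ == "__main__":
    print(count_calls("is_running_eager"))
    print(count_calls("is_running_lazy"))
```

In this model the eager variant makes five simulated C calls per check and the lazy one makes a single call, because only the creation time of the new object is ever read.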

Original comment by jlo...@gmail.com on 14 Jul 2009 at 9:02

GoogleCodeExporter commented 9 years ago

Committed as r416.

Before the patch:
$ python -m timeit -s "import os, psutil; p = psutil.Process(os.getpid())" "p.is_running()"
1000 loops, best of 3: 1.29 msec per loop

After the patch:
$ python -m timeit -s "import os, psutil; p = psutil.Process(os.getpid())" "p.is_running()"
10000 loops, best of 3: 135 usec per loop

That's about 10 times faster.
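For reference, the same kind of "best of 3" measurement can be taken from Python with the standard `timeit` module. The helpers `best_of` and `speedup` below are hypothetical names for this sketch, and the statement timed in the demo is a placeholder rather than psutil code.

```python
import timeit


def best_of(stmt, setup="pass", repeat=3, number=1000):
    """Return the best per-loop time in seconds over `repeat` runs."""
    timings = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    return min(timings) / number


def speedup(before_seconds, after_seconds):
    """Return how many times faster the second measurement is."""
    return before_seconds / after_seconds


if __name__ == "__main__":
    # Placeholder statement; substitute the is_running() call to measure it.
    print(best_of("sum(range(100))"))
    # Figures reported above: 1.29 msec before, 135 usec after.
    print(round(speedup(1.29e-3, 135e-6), 1))
```

Applied to the figures reported above, the ratio is roughly 9.6, consistent with the "about 10 times" estimate.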

Original comment by billiej...@gmail.com on 15 Jul 2009 at 9:35

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Original comment by billiej...@gmail.com on 3 Sep 2009 at 7:48

Changed state: FixedInSVN

GoogleCodeExporter commented 9 years ago

Original comment by billiej...@gmail.com on 17 Sep 2009 at 8:57

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Updated csets after the SVN -> Mercurial migration:
r416 == revision 498c34a2245c

Original comment by g.rodola on 2 Mar 2013 at 11:50


Process.is_running() speedup #59