PFZheng / psutil

Automatically exported from code.google.com/p/psutil

Sleep parameter for cpu_percent() calls #123

Closed — GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Reference: Issue #120

"Since cpu_percent is often paired with a 'time.sleep' call, I wonder if it 
would be simpler to have cpu_percent accept a "sleep" argument with default 
behavior of, say, 2s."

Opening this enh request as a reminder to look into this and decide if we want 
to add such a parameter to the percentage calculation functions. They require 
at least .1-.2 seconds to calculate a percentage for the interval, so it 
usually makes sense to use a sleep() call between requests. This would obviate 
the need for a separate call to time.sleep(). We could also make the default 0, 
i.e. do not sleep, which would avoid any performance issues with using this in 
the default case.
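The pattern being folded in can be sketched as follows; `sample_times` is a hypothetical callback standing in for whatever supplies cumulative (busy, total) CPU times, not an actual psutil API:

```python
import time

def cpu_percent_sketch(sample_times, interval=0.0):
    """Sketch of the proposed API: take two samples of cumulative CPU
    times, optionally sleeping in between, and derive a percentage.
    With interval=0 (the suggested default) no sleeping happens."""
    busy1, total1 = sample_times()
    if interval > 0:
        time.sleep(interval)
    busy2, total2 = sample_times()
    delta_total = total2 - total1
    if delta_total <= 0:
        return 0.0
    return round((busy2 - busy1) / delta_total * 100, 1)
```

With interval=0 the caller is responsible for spacing out successive calls; with a positive interval the sleep happens inside the function, replacing the separate time.sleep().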

Original issue reported on code.google.com by jlo...@gmail.com on 27 Oct 2010 at 3:43

GoogleCodeExporter commented 9 years ago
I think it makes sense, but I'd be for postponing the decision until version 
0.2.1.
Why do you think this would also be needed for memory stats, btw?

Original comment by g.rodola on 27 Oct 2010 at 7:54

GoogleCodeExporter commented 9 years ago
Good question... I guess I should probably not write bug reports/feature 
requests when I'm tired ;)

Original comment by jlo...@gmail.com on 27 Oct 2010 at 8:06

GoogleCodeExporter commented 9 years ago
As per the discussion at http://code.google.com/p/psutil/source/detail?r=766, 
Yan and I tried to refactor the cpu_percent() code a bit, also including the 
interval parameter while we were at it.

def cpu_percent(interval=0.1):
    """Return a float representing the current system-wide CPU
    utilization as a percentage.
    """
    t1 = cpu_times()
    t1_all = sum(t1)
    t1_busy = t1_all - t1.idle

    time.sleep(interval)

    t2 = cpu_times()
    t2_all = sum(t2)
    t2_busy = t2_all - t2.idle

    busy_delta = t2_busy - t1_busy
    difference = t2_all - t1_all
    try:
        busy_perc = (busy_delta / difference) * 100
        if busy_perc < 0.0:
            raise ZeroDivisionError
    except ZeroDivisionError:
        return 0.0
    else:
        return round(busy_perc, 1)

Notable changes are:

- (obviously) "interval" parameter defaulting to 0.1

- no longer rely on time.time() nor on NUM_CPUS: considering that 
cpu_times() already returns times for *all* CPUs, I fail to understand why we 
were using NUM_CPUS in the first place

- returned floats are rounded with precision == 1. I did this because the 
numbers might otherwise be printed using scientific notation, resulting in 
stuff like "3.46474172093e-09", which is quite misleading

Original comment by g.rodola on 1 Nov 2010 at 7:59

GoogleCodeExporter commented 9 years ago
I think in general this idea makes sense, and it's nice to avoid needing 
time.time()... the only downside to doing it this way is this now becomes a 
blocking call waiting for "interval" which is why we went with wall clock time 
originally. 

> considering that cpu_times() already returns times for *all* CPUs I fail to 
> understand why we were using NUM_CPUS in the first place

If you're comparing against wall clock time, you have to know how many CPUs are 
being counted, or you can't make the calculation to determine CPU usage.
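For reference, the wall-clock approach being described can be sketched like this (a hypothetical reconstruction, not the actual pre-r766 code): busy times are summed across all CPUs, so elapsed wall time must be scaled by the CPU count.

```python
def cpu_percent_wallclock(busy1, wall1, busy2, wall2, num_cpus):
    """Compare the busy-time delta (summed over all CPUs) against the
    elapsed wall-clock time multiplied by the number of CPUs."""
    elapsed = wall2 - wall1
    if elapsed <= 0:
        return 0.0
    # Cap at 100.0 defensively; timer jitter can push the ratio over.
    return min(round((busy2 - busy1) / (elapsed * num_cpus) * 100, 1), 100.0)
```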

Do we need to use the ZeroDivisionError try/catch? I see why they're in there, 
but maybe for maintainability later it would be better if we used explicit 
checks for error conditions that would result in a ZeroDivisionError. That way 
it's clearer and easier to debug if we run into further CPU util related bugs. 
Otherwise I'm ok with this change, I think it's likely to end up a little more 
accurate/reliable.
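The explicit-guard variant being suggested might look like this (a sketch of the idea only; the function and parameter names are made up):

```python
def cpu_percent_explicit(t1_busy, t1_all, t2_busy, t2_all):
    """Same calculation as the try/except version, but with explicit
    checks for the conditions that would raise ZeroDivisionError or
    produce a negative percentage."""
    busy_delta = t2_busy - t1_busy
    all_delta = t2_all - t1_all
    if all_delta <= 0 or busy_delta < 0:
        return 0.0  # interval too small, or times went "backwards"
    return round(busy_delta / all_delta * 100, 1)
```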

Original comment by jlo...@gmail.com on 1 Nov 2010 at 8:24

GoogleCodeExporter commented 9 years ago
I think it's better to keep the ZeroDivisionError catch, although I find it 
kinda ugly as well. On Linux busy_delta = t2_busy - t1_busy might happen to be 
a negative number and I don't know why that happens.

Original comment by g.rodola on 1 Nov 2010 at 8:50

GoogleCodeExporter commented 9 years ago
> On Linux busy_delta = t2_busy - t1_busy might happen to be a negative number 
> and I don't know why that happens.

Probably the same reason the code I wrote was coming up with negative values 
for the delta on Linux :D

Original comment by jlo...@gmail.com on 1 Nov 2010 at 8:58

GoogleCodeExporter commented 9 years ago
Before we decide to commit this, I would like to confirm that it's more 
accurate than (or at least equal to) the existing code we had. Can you compare 
the two? I am a little concerned that we'll end up with a lot of 0% utilization 
numbers when we shouldn't be (same issue I noted earlier for my code) 

Original comment by jlo...@gmail.com on 1 Nov 2010 at 8:59

GoogleCodeExporter commented 9 years ago
Not sure how more accurate this is compared to previous version. 
I think it *should* be more accurate, because we dropped time.time(), I also 
compared it with htop and looks fine but I don't know other ways to test it 
better than this. 
Any hints?

> I am a little concerned that we'll end up with a lot of 0% utilization 
> numbers when we shouldn't be

I have the same concern as well, and most of them are caused when we catch 
ZeroDivisionError.
I also thought about whether it would be better to raise ValueError("interval 
too low") instead of falling back on 0.0, but I noticed that 
ZeroDivisionError also occurs when using a 0.1 interval, after some cycles.

Original comment by g.rodola on 1 Nov 2010 at 9:22

GoogleCodeExporter commented 9 years ago
If it seems to match htop on Linux and BSD, and taskmgr on Windows, then I'd 
say that's a good indicator (I can test OS X once it's committed). Mainly I 
don't want to be reporting wildly different values than the system task manager 
or top/ps, since that's been our standard for other functions.

As far as returning 0.0 values unnecessarily, I'm not sure what we can do about 
it. It seems like there's just not enough accuracy to ensure that we don't end 
up with negative deltas, especially with smaller intervals. Does this only 
happen on Linux? I didn't see this type of behavior on OS X, maybe it's 
specific to Linux anyway.

My vote would still be an explicit check to see if "difference" or "busy_delta" 
are <= 0, and if so we can return 0% util, rather than the try/catch. It's not 
ideal, and I'd rather address why we're seeing negative deltas in the first 
place, but I don't know what to do about that. Maybe if/when we implement 
per-CPU stats we'll have better luck there.

Original comment by jlo...@gmail.com on 1 Nov 2010 at 9:36

GoogleCodeExporter commented 9 years ago
Code committed in r768, including removal of ZeroDivisionError try/except 
statement.
Still have to modify Process.get_cpu_percent() in the same manner, though.

> Does this only happen on Linux? I didn't see this type of behavior on OS X, 
> maybe it's specific to Linux anyway.

I'm going to see whether this is Linux only.
I have a doubt though. Right now cpu_times() returns times for:

- user
- system
- nice
- idle
- irq
- softirq
- iowait

...and we assume that t1_all = sum(cpu_times()).
Are we sure about this?
Here:
http://ubuntuforums.org/showthread.php?t=148781
...for example, someone states that for calculating the percentage only user, 
nice, system, and idle times are required and the other values should be 
ignored.
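On Linux that suggestion amounts to summing only the first four fields of the `cpu` line in /proc/stat (user, nice, system, idle, in proc(5) order); a sketch with a made-up helper name:

```python
def busy_and_total(stat_cpu_line):
    """Extract (busy, total) from a /proc/stat 'cpu' line, counting
    only the user, nice, system and idle fields and ignoring the rest
    (iowait, irq, softirq, ...)."""
    user, nice, system, idle = [float(x) for x in stat_cpu_line.split()[1:5]]
    total = user + nice + system + idle
    return total - idle, total
```

For example, busy_and_total("cpu 100 0 50 850 10 5 5") gives (150.0, 1000.0): the iowait/irq/softirq fields contribute nothing.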

Just 2 quick cents btw... I'll investigate further later today.

Original comment by g.rodola on 2 Nov 2010 at 3:31

GoogleCodeExporter commented 9 years ago
> ...and we assume that t1_all = sum(cpu_times()).
> Are we sure about this?

The original code sample you pointed to here: 

    http://www.boduch.ca/2009/02/python-cpu-usage.html

uses only the user, nice, system, and idle times. I think we should be doing 
the same, at least on Linux.

Original comment by jlo...@gmail.com on 2 Nov 2010 at 4:07

GoogleCodeExporter commented 9 years ago
Tested on both Windows and FreeBSD (where we are also using irq time to fill 
t_all variables) and everything is fine: busy_delta *never* happens to be 
negative hence it seems the problem only affects Linux.

On Linux I tried to calculate t_all as such (user + system + nice + idle) but 
didn't help either.
For some reason, when the result is negative it's because t2_idle happens to 
be greater than t1_idle.

Original comment by g.rodola on 2 Nov 2010 at 7:25

GoogleCodeExporter commented 9 years ago
> For some reason, when the result is negative it's because t2_idle happens to 
> be greater than t1_idle.

Do you mean t2_idle is less than t1_idle?

Original comment by jlo...@gmail.com on 2 Nov 2010 at 7:41

GoogleCodeExporter commented 9 years ago
No, the opposite:

# pdb session started when busy_delta happens to be negative
(Pdb) t1.idle
9831.7000000000007
(Pdb) t2.idle
9831.7199999999993

Original comment by g.rodola on 2 Nov 2010 at 7:48

GoogleCodeExporter commented 9 years ago
I don't understand, why would that be a problem? Idle time should increase over 
time, so t2.idle is supposed to be >= t1.idle, depending on CPU utilization 
during that time period.

Original comment by jlo...@gmail.com on 2 Nov 2010 at 7:52

GoogleCodeExporter commented 9 years ago
You are right, sorry. Then maybe the problem is how we calculate busy time?

(Pdb) t1_busy
1556.1700000000019
(Pdb) t2_busy
1556.1700000000001

What I do not understand is how t2_busy can be < t1_busy.
Or maybe I stopped to understand everything a long time ago. =)

Original comment by g.rodola on 2 Nov 2010 at 8:01

GoogleCodeExporter commented 9 years ago
> (Pdb) t1_busy
> 1556.1700000000019
> (Pdb) t2_busy
> 1556.1700000000001

Hmm... that looks suspiciously like a precision issue. The values are probably 
effectively the same, but we could be losing some precision in the conversion 
to a float or in the arithmetic we're using to get the CPU times. Trivial 
example:

>>> 1 + 1.1
2.1000000000000001

Original comment by jlo...@gmail.com on 2 Nov 2010 at 8:24
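Plugging in the pdb values from above confirms the delta is pure floating-point noise; rounding to a couple of decimals collapses the two readings:

```python
t1_busy = 1556.1700000000019  # values from the pdb session above
t2_busy = 1556.1700000000001
assert t2_busy < t1_busy                       # the spurious "negative delta"
assert round(t1_busy, 2) == round(t2_busy, 2)  # noise gone after rounding
```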

GoogleCodeExporter commented 9 years ago
I think you're right.
This seems to fix the issue:

--- psutil/__init__.py  (revisione 768)
+++ psutil/__init__.py  (copia locale)
@@ -441,8 +441,16 @@
     t2 = cpu_times()
     t2_all = sum(t2)
     t2_busy = t2_all - t2.idle
+    
+    # This should avoid precision issues (t2_busy < t1_busy).
+    # This is based on the assumption that anything beyond 2nd decimal
+    # digit is garbage.
+    # XXX - maybe it makes sense to do this for all numbers returned
+    # in cpu_times()
+    t1_busy = round(t1_busy, 2)
+    t2_busy = round(t2_busy, 2)

-    if t2_busy <= t1_busy:
+    if t2_busy == t1_busy:
         return 0.0

     busy_delta = t2_busy - t1_busy

Original comment by g.rodola on 2 Nov 2010 at 9:01

GoogleCodeExporter commented 9 years ago
Excellent... do you want to make any further changes, or do you think we're ok 
to commit this officially then? 

Original comment by jlo...@gmail.com on 2 Nov 2010 at 9:44

GoogleCodeExporter commented 9 years ago
I think we're ok with psutil.cpu_percent() but Process.get_cpu_percent() is 
not ok, now that I've just looked at it.
Aside from a similar code refactoring that still needs to be done, it seems it 
can return values > 100.0.
The code below shows the problem:

import psutil, time
p = psutil.Process(4103)
while 1:
    print p.get_cpu_percent()
    time.sleep(.001)

Also, I'm not sure whether it's a good idea to add an "interval" parameter as 
we did for the system cpu percentage.
If you imagine an application like taskmgr.exe, a GUI which deals with all 
processes, a blocking cpu_percent() call is useless.
For that kind of application it is better to have a get_cpu_percent() which 
calculates the elapsed time since the last call, as the one we have right now.

Maybe we can refactor it as follows:
- add an interval parameter defaulting to 0.1
- if interval > 0, work as the system cpu_percent() does
- if interval == 0, return immediately and calculate the result based on the 
time elapsed since the last call
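The proposed dual mode could be sketched like this; the class and its injectable proc_time/clock callbacks are inventions to keep the sketch self-contained, not psutil's actual design:

```python
import time

class ProcCPUPercent:
    """Sketch of the proposed dual mode for Process.get_cpu_percent():
    interval > 0 blocks between two samples; interval == 0 returns
    immediately using the time elapsed since the previous call.
    proc_time returns the process's cumulative CPU time; clock returns
    wall-clock time (injectable here only for illustration)."""

    def __init__(self, proc_time, clock=time.time):
        self.proc_time = proc_time
        self.clock = clock
        self._last = None  # (wall timestamp, cumulative CPU time)

    def get(self, interval=0.1):
        if interval > 0:
            # Blocking mode: sample twice around a sleep.
            w1, t1 = self.clock(), self.proc_time()
            time.sleep(interval)
            w2, t2 = self.clock(), self.proc_time()
        else:
            # Non-blocking mode: compare against the previous call.
            prev, self._last = self._last, (self.clock(), self.proc_time())
            if prev is None:
                return 0.0  # first call: no reference point yet
            (w1, t1), (w2, t2) = prev, self._last
        elapsed = w2 - w1
        if elapsed <= 0:
            return 0.0
        return round((t2 - t1) / elapsed * 100, 1)
```

Note that in non-blocking mode the first call has nothing to compare against, so it returns 0.0, and on multi-CPU machines the result can legitimately exceed 100%.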

What do you think?

Original comment by g.rodola on 2 Nov 2010 at 10:02

GoogleCodeExporter commented 9 years ago
> I think we're ok with psutil.cpu_percent() but Process.get_cpu_percent() is 
> not ok, now that I've just looked at it. Aside from a similar code 
> refactoring that still needs to be done, it seems it can return values > 
> 100.0.

I am not able to reproduce this... maybe this is another Linux-only issue? If 
so perhaps refactoring the code the same way will resolve it as well.

> - add an interval parameter defaulting to 0.1
> - if interval > 0, work as the system cpu_percent() does
> - if interval == 0, return immediately and calculate the result based on 
> the time elapsed since the last call

That's fine for me, though if you think the main use case is to prefer a 
non-blocking call, maybe we should default to 0 for interval? Either way works 
for me. 

Original comment by jlo...@gmail.com on 2 Nov 2010 at 10:31

GoogleCodeExporter commented 9 years ago
Code committed as r773.
I decided to set 0.1 as the interval default (hence, it's blocking).
I also removed the cpu-percent related code from the constructor because it 
slows things down quite a bit (25%), at least on Linux.
I'd still like an opinion about this though, hence I'm setting the 
WaitingForReview state for now until Jay gets back.

Original comment by g.rodola on 3 Nov 2010 at 6:11

GoogleCodeExporter commented 9 years ago
Does the code work with 0 as the interval, by calculating the interval since 
the last time the function was called? If so then I'm fine with the interval 
being .1 by default. I think it would be nice to have a non-blocking version 
of the call, but other than that I'm fine with either 0 or .1 as default.

Original comment by jlo...@gmail.com on 6 Nov 2010 at 2:43

GoogleCodeExporter commented 9 years ago
You get the non-blocking version by passing 0.0 or None as the argument.

Original comment by g.rodola on 6 Nov 2010 at 11:44

GoogleCodeExporter commented 9 years ago
Modified psutil.cpu_percent() so that it can work in both blocking and 
non-blocking mode in the same manner in r774.
Closing out as fixed.

Original comment by g.rodola on 8 Nov 2010 at 4:52

GoogleCodeExporter commented 9 years ago
Issue 131 has been merged into this issue.

Original comment by g.rodola on 8 Nov 2010 at 8:57

GoogleCodeExporter commented 9 years ago
Updated csets after the SVN -> Mercurial migration:
r768 == revision 2171f705f0f4
r773 == revision 6366524f642b
r774 == revision 589ce4a800a3

Original comment by g.rodola on 2 Mar 2013 at 11:55