Original comment by jlo...@gmail.com
on 27 Oct 2010 at 3:43
I think it makes sense, but I'd be for postponing the decision to version 0.2.1.
Why do you think this would be needed also for memory stats, btw?
Original comment by g.rodola
on 27 Oct 2010 at 7:54
Good question... I guess I should probably not write bug reports/feature
requests when I'm tired ;)
Original comment by jlo...@gmail.com
on 27 Oct 2010 at 8:06
As per the discussion at http://code.google.com/p/psutil/source/detail?r=766, Yan and I
tried to refactor the cpu_percent() code a bit, also adding the interval
parameter while we were at it.
def cpu_percent(interval=0.1):
    """Return a float representing the current system-wide CPU
    utilization as a percentage.
    """
    t1 = cpu_times()
    t1_all = sum(t1)
    t1_busy = t1_all - t1.idle
    time.sleep(interval)
    t2 = cpu_times()
    t2_all = sum(t2)
    t2_busy = t2_all - t2.idle
    busy_delta = t2_busy - t1_busy
    difference = t2_all - t1_all
    try:
        busy_perc = (busy_delta / difference) * 100
        if busy_perc < 0.0:
            raise ZeroDivisionError
    except ZeroDivisionError:
        return 0.0
    else:
        return round(busy_perc, 1)
Notable changes are:
- (obviously) an "interval" parameter defaulting to 0.1
- no longer relies on time.time() nor NUM_CPUS: considering that
cpu_times() already returns times for *all* CPUs, I fail to understand why we
were using NUM_CPUS in the first place
- returned floats are rounded to a precision of 1. I did this because the
numbers might otherwise be printed in scientific notation, resulting in values
like "3.46474172093e-09", which is quite misleading
Original comment by g.rodola
on 1 Nov 2010 at 7:59
I think in general this idea makes sense, and it's nice to avoid needing
time.time()... the only downside to doing it this way is that it now becomes a
blocking call that waits for "interval", which is why we went with wall clock
time originally.
> considering that cpu_times() already returns times for *all* CPUs I fail to
> understand why we were using NUM_CPUS in the first place
If you're comparing against wall clock time, you have to know how many CPUs are
being counted, or you can't make the calculation to determine CPU usage.
Do we need to use the ZeroDivisionError try/catch? I see why they're in there,
but maybe for maintainability later it would be better if we used explicit
checks for error conditions that would result in a ZeroDivisionError. That way
it's clearer and easier to debug if we run into further CPU util related bugs.
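A sketch of what the explicit-check variant might look like (hypothetical helper,
not psutil's actual code; cpu_times is assumed to be a callable returning a
namedtuple of per-mode CPU times with an `idle` field):

```python
import time

def cpu_percent_explicit(cpu_times, interval=0.1):
    """Like cpu_percent(), but with explicit guards for the error
    conditions instead of a ZeroDivisionError try/except."""
    t1 = cpu_times()
    t1_all = sum(t1)
    t1_busy = t1_all - t1.idle
    time.sleep(interval)
    t2 = cpu_times()
    t2_all = sum(t2)
    t2_busy = t2_all - t2.idle
    busy_delta = t2_busy - t1_busy
    all_delta = t2_all - t1_all
    # explicit checks: no time elapsed, or busy time went "backwards"
    if all_delta <= 0 or busy_delta <= 0:
        return 0.0
    return round((busy_delta / all_delta) * 100, 1)
```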
Otherwise I'm ok with this change, I think it's likely to end up a little more
accurate/reliable.
Original comment by jlo...@gmail.com
on 1 Nov 2010 at 8:24
I think it's better to keep the ZeroDivisionError catch, although I find it
kinda ugly as well. On Linux, busy_delta = t2_busy - t1_busy might happen to be
a negative number, and I don't know why that happens.
Original comment by g.rodola
on 1 Nov 2010 at 8:50
> On Linux, busy_delta = t2_busy - t1_busy might happen to be a negative number,
> and I don't know why that happens.
Probably the same reason the code I wrote was coming up with negative values
for the delta on Linux :D
Original comment by jlo...@gmail.com
on 1 Nov 2010 at 8:58
Before we decide to commit this, I would like to confirm that it's more
accurate than (or at least equal to) the existing code we had. Can you compare
the two? I am a little concerned that we'll end up with a lot of 0% utilization
numbers when we shouldn't be (same issue I noted earlier for my code)
Original comment by jlo...@gmail.com
on 1 Nov 2010 at 8:59
Not sure how much more accurate this is compared to the previous version.
I think it *should* be more accurate because we dropped time.time(). I also
compared it with htop and it looks fine, but I don't know of a better way to
test it.
Any hints?
> I am a little concerned that we'll end up with a lot of 0% utilization
> numbers when we shouldn't be
I have the same concern as well; most of those cases occur when we catch
ZeroDivisionError.
I also thought about whether it would be better to raise ValueError("interval
too low") instead of falling back to 0.0, but I noticed that ZeroDivisionError
also occurs with a 0.1 interval, after some cycles.
Original comment by g.rodola
on 1 Nov 2010 at 9:22
If it seems to match htop on Linux and BSD, and taskmgr on Windows, then I'd
say that's a good indicator (I can test OS X once it's committed). Mainly I
don't want to be reporting wildly different values than the system task manager
or top/ps, since that's been our standard for other functions.
As far as returning 0.0 values unnecessarily, I'm not sure what we can do about
it. It seems like there's just not enough accuracy to ensure that we don't end
up with negative deltas, especially with smaller intervals. Does this only
happen on Linux? I didn't see this type of behavior on OS X, maybe it's
specific to Linux anyway.
My vote would still be an explicit check to see if "difference" or "busy_delta"
are <= 0, and if so we can return 0% util, rather than the try/catch. It's not
ideal, and I'd rather address why we're seeing negative deltas in the first
place, but I don't know what to do about that. Maybe if/when we implement
per-CPU stats we'll have better luck there.
Original comment by jlo...@gmail.com
on 1 Nov 2010 at 9:36
Code committed in r768, including removal of ZeroDivisionError try/except
statement.
Still have to modify Process.get_cpu_percent() in the same manner, though.
> Does this only happen on Linux? I didn't see this type of behavior on OS X,
> maybe it's specific to Linux anyway.
I'm going to see whether this is Linux only.
I have a doubt though. Right now cpu_times() returns times for:
- user
- system
- nice
- idle
- irq
- softirq
- iowait
...and we assume that t1_all = sum(cpu_times()).
Are we sure about this?
Here:
http://ubuntuforums.org/showthread.php?t=148781
...for example, someone states that for calculating the percentage only user,
nice, system, and idle times are required and the other values should be
ignored.
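Sketched out, the calculation suggested there would look something like this
(editor's illustration; the field names follow Linux /proc/stat and this is an
assumption, not psutil's actual implementation):

```python
def busy_percent(t1, t2):
    """Compute CPU utilization between two samples, using only the
    user, nice, system and idle fields and ignoring irq/softirq/iowait.
    t1, t2: dicts of per-mode CPU times in seconds."""
    fields = ("user", "nice", "system", "idle")
    all1 = sum(t1[f] for f in fields)
    all2 = sum(t2[f] for f in fields)
    busy1 = all1 - t1["idle"]
    busy2 = all2 - t2["idle"]
    all_delta = all2 - all1
    if all_delta <= 0:          # no time elapsed between samples
        return 0.0
    return round((busy2 - busy1) / all_delta * 100, 1)
```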
Just my two cents, btw... I'll investigate further later today.
Original comment by g.rodola
on 2 Nov 2010 at 3:31
> ...and we assume that t1_all = sum(cpu_times()).
> Are we sure about this?
The original code sample you pointed to here:
http://www.boduch.ca/2009/02/python-cpu-usage.html
Uses only user, nice, system, idle times. I think we should be doing the same,
at least on Linux.
Original comment by jlo...@gmail.com
on 2 Nov 2010 at 4:07
Tested on both Windows and FreeBSD (where we also use irq time to fill the
t_all variables) and everything is fine: busy_delta *never* happens to be
negative, hence it seems the problem only affects Linux.
On Linux I tried to calculate t_all as (user + system + nice + idle) but that
didn't help either.
For some reason, when the result is negative it's because t2_idle happens to be
greater than t1_idle.
Original comment by g.rodola
on 2 Nov 2010 at 7:25
> For some reason, when the result is negative it's because t2_idle happens to
> be greater than t1_idle.
Do you mean t2_idle is less than t1_idle?
Original comment by jlo...@gmail.com
on 2 Nov 2010 at 7:41
No, the opposite:
# pdb session started when busy_delta happens to be negative
(Pdb) t1.idle
9831.7000000000007
(Pdb) t2.idle
9831.7199999999993
Original comment by g.rodola
on 2 Nov 2010 at 7:48
I don't understand, why would that be a problem? Idle time should increase over
time, so t2.idle is supposed to be >= t1.idle, depending on CPU utilization
during that time period.
Original comment by jlo...@gmail.com
on 2 Nov 2010 at 7:52
You are right, sorry. Then maybe the problem is how we calculate busy time?
(Pdb) t1_busy
1556.1700000000019
(Pdb) t2_busy
1556.1700000000001
What I do not understand is how t2_busy can be < t1_busy.
Or maybe I stopped understanding everything a long time ago. =)
Original comment by g.rodola
on 2 Nov 2010 at 8:01
> (Pdb) t1_busy
> 1556.1700000000019
> (Pdb) t2_busy
> 1556.1700000000001
Hmm... that looks suspiciously like a precision issue. The values are probably
effectively the same, but we could be losing some precision in the conversion
to a float or in the arithmetic we're using to get the CPU times. Trivial
example:
>>> 1 + 1.1
2.1000000000000001
Original comment by jlo...@gmail.com
on 2 Nov 2010 at 8:24
I think you're right.
This seems to fix the issue:
--- psutil/__init__.py (revision 768)
+++ psutil/__init__.py (local copy)
@@ -441,8 +441,16 @@
     t2 = cpu_times()
     t2_all = sum(t2)
     t2_busy = t2_all - t2.idle
+
+    # This should avoid precision issues (t2_busy < t1_busy).
+    # This is based on the assumption that anything beyond the 2nd decimal
+    # digit is garbage.
+    # XXX - maybe it makes sense to do this for all numbers returned
+    # in cpu_times()
+    t1_busy = round(t1_busy, 2)
+    t2_busy = round(t2_busy, 2)
-    if t2_busy <= t1_busy:
+    if t2_busy == t1_busy:
         return 0.0
     busy_delta = t2_busy - t1_busy
Original comment by g.rodola
on 2 Nov 2010 at 9:01
Excellent... do you want to make any further changes, or do you think we're ok
to commit this officially then?
Original comment by jlo...@gmail.com
on 2 Nov 2010 at 9:44
I think we're ok with psutil.cpu_percent() but Process.get_cpu_percent() is not
ok, now that I've just looked at it.
Aside from a similar code refactoring that still needs to be done, it seems it
can return values > 100.0.
The code below shows the problem:
import psutil, os, time
p = psutil.Process(4103)
while 1:
    print p.get_cpu_percent()
    time.sleep(.001)
Also, I'm not sure whether it's a good idea to add an "interval" parameter as
we did for system cpu percentage.
If you imagine an application like taskmgr.exe, a GUI which lists all
processes, a blocking cpu_percent() call is useless.
For that kind of application, a get_cpu_percent() function which calculates the
time elapsed since the last call, like the one we have right now, is better.
Maybe we can refactor it as follows:
- add an "interval" parameter defaulting to 0.1
- if interval > 0, behave like the system-wide cpu_percent() (blocking)
- if interval == 0, return immediately and calculate the result based on the
time elapsed since the last call
What do you think?
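The dual-mode behavior being proposed could be sketched like this (editor's
illustration with a hypothetical class name, not psutil's actual code;
cpu_times is assumed to be a callable returning a namedtuple with an `idle`
field):

```python
import time

class CpuPercentMeter:
    """Blocking mode (interval > 0) samples twice, `interval` apart;
    non-blocking mode (interval == 0 or None) returns immediately,
    comparing against the times stored at the previous call."""

    def __init__(self, cpu_times):
        self._cpu_times = cpu_times
        self._t1 = cpu_times()   # baseline for non-blocking calls

    def _percent(self, t1, t2):
        all_delta = sum(t2) - sum(t1)
        busy_delta = (sum(t2) - t2.idle) - (sum(t1) - t1.idle)
        if all_delta <= 0 or busy_delta <= 0:
            return 0.0
        return round(busy_delta / all_delta * 100, 1)

    def cpu_percent(self, interval=0.1):
        if interval:                      # blocking: two fresh samples
            t1 = self._cpu_times()
            time.sleep(interval)
            t2 = self._cpu_times()
        else:                             # non-blocking: since last call
            t1, t2 = self._t1, self._cpu_times()
        self._t1 = t2                     # update baseline for next call
        return self._percent(t1, t2)
```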
Original comment by g.rodola
on 2 Nov 2010 at 10:02
> I think we're ok with psutil.cpu_percent() but Process.get_cpu_percent() is
> not ok, now that I've just looked at it. Aside from a similar code
> refactoring that still needs to be done, it seems it can return values >
> 100.0.
I am not able to reproduce this... maybe this is another Linux-only issue? If
so perhaps refactoring the code the same way will resolve it as well.
> - add an "interval" parameter defaulting to 0.1
> - if interval > 0, behave like the system-wide cpu_percent()
> - if interval == 0, return immediately and calculate the result based on
> the time elapsed since the last call
That's fine with me, though if you think the main use case prefers a
non-blocking call, maybe we should default interval to 0? Either way works
for me.
Original comment by jlo...@gmail.com
on 2 Nov 2010 at 10:31
Code committed as r773.
I decided to set 0.1 as the default interval (hence, it's blocking).
I also removed the cpu-percent related code from the constructor because it
slowed things down quite a bit (25%), at least on Linux.
I'd still like an opinion about this though, hence I'm setting the
WaitingForReview state until Jay gets back.
Original comment by g.rodola
on 3 Nov 2010 at 6:11
Does the code work with 0 as the interval, by calculating the interval since
the last time the function was called? If so, then I'm fine with the interval
being 0.1 by default. I think it would be nice to have a non-blocking version
of the call, but other than that I'm fine with either 0 or 0.1 as the default.
Original comment by jlo...@gmail.com
on 6 Nov 2010 at 2:43
You get the non-blocking version by passing 0.0 or None as the argument.
Original comment by g.rodola
on 6 Nov 2010 at 11:44
Modified psutil.cpu_percent() in r774 so that it works in both blocking and
non-blocking modes in the same manner.
Closing out as fixed.
Original comment by g.rodola
on 8 Nov 2010 at 4:52
Issue 131 has been merged into this issue.
Original comment by g.rodola
on 8 Nov 2010 at 8:57
Original comment by g.rodola
on 13 Nov 2010 at 3:14
Original comment by g.rodola
on 9 Jun 2011 at 10:33
Updated csets after the SVN -> Mercurial migration:
r768 == revision 2171f705f0f4
r773 == revision 6366524f642b
r774 == revision 589ce4a800a3
Original comment by g.rodola
on 2 Mar 2013 at 11:55
Original issue reported on code.google.com by
jlo...@gmail.com
on 27 Oct 2010 at 3:43