Closed defia closed 4 years ago
I confirm what DefiaQ said. The problem is that we call timeBeginPeriod Windows API. It was introduced as part of pprof implementation changeset: 9786:9c5c0cbadb4d user: Hector Chu <hectorchu@gmail.com> date: Sat Sep 17 17:57:59 2011 +1000 summary: runtime: implement pprof support for windows but I view it as "windows timer precision is very very low (about 15ms) comparing to other OSes, so lets get best precision whenever we can (1ms)". timeBeginPeriod has many problems. It does not work on some systems (it does not work on my oldish windows xp). I suspect, it fails, if you don't have correct access rights on the system (it works for administrator). When successfully set timeBeginPeriod will drain laptop batteries faster (I read about it everywhere, I didn't have chance to experience it on practice). But, I suspect, many other applications call timeBeginPeriod too, so Go is not the only one at fault here. Personally I don't care if timeBeginPeriod is set, but others might feel different. Perhaps there is case here for some way to control it from within Go program. ALternatively, people who care, can just remove timeBeginPeriod call in their own runtime. Alex
Too late for 1.4. Maybe for a future release, if someone wants to do the work to prove that it matters and that removing this is safe. I think without a clear report about bad implications we won't do that work ourselves.
Labels changed: added release-none, removed release-go1.4.
Status changed to Accepted.
i made a package to call timeEndPeriod to reset the timer to normal on initialization.
if the program doesn't need much performance, but runs on a windows laptop for a long time, just import the package. i've using this for some days, everything seems to be well so far
I've done some benchmarks today.
At first I ran a cpu-heavy program and modified it to use hundreds of goroutines. To my surprise,it actually ran ~5% faster if I set the system clock resolution to 15.6ms(that is default value)
Then I ran this benchmark I didn't look into the code, I guess it use only one goroutine, still pure cpu thing, and get still the same result.
Then I wrote a http benchmark here This time it ran a lot faster(~50%) with 1ms system clock resolution.
So I guess, if your program don't use much IO (cause less context switches of goroutine), you can discard the system clock resoluton change.
Could we use QueryPerformanceCounter for pprof support?
And how are you going to use QueryPerformanceCounter in pprof support?
Alex
cyclesPerSecond uses systime in os1_windows.go
https://github.com/golang/go/blob/master/src/runtime/os1_windows.go#L314-L316
https://github.com/golang/go/blob/master/src/runtime/os1_windows.go#L286-L306
This part is possible to be replaced with QueryUnbiasedInterruptTime (not QueryPerformanceCounter).
I've blogged about this issue in the past (https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/) and I wanted to share what I have learned:
As defia pointed out, having a higher timer frequency wastes CPU time and can make some CPU heavy programs run more slowly. This is because the OS is waking up ~15 times more frequently, consuming CPU time each time it does. It also means that other programs may wake up more frequently and waste CPU time - some software sits in Sleep(1) loops and will wake up ~15 times more frequently when the timer resolution is changed. So, raising the timer frequency can waste performance.
Raising the timer frequency will almost always waste some power and will harm battery life. This is the kicker. This is why no program should ever raise the timer frequency unless it needs the timer frequency raised. The timer frequency is a global setting and raising it without a good reason is poor citizenship.
If you have a work loop that relies on Sleep(1) or timeouts from WaitForSingleObject(handle, 1) then raising the timer frequency will make your loop run faster. Presumably that is why the http test ran faster. Most software should not be relying on timeouts, it should be running based on events. But, it is reasonable for specific go programs to raise the timer frequency if needed.
It is not reasonable for the go runtime to unilaterally raise timer frequency.
With the background information out of the way...
I discovered this issue because I noticed (using clockres.exe) that the timer frequency on my computer was always raised. I used "powercfg /energy /duration 5" to identify the culprit, a go program that was running in the background. This go program was monitoring system status and was idle about 99.9% of the time but because of the go runtime's decision to raise the timer frequency for all go programs this background program was measurably affecting power consumption and battery life. This 99.9% idle go program was affecting my computer more than a program that was actively running.
This was clearly a case where the program did not need a higher timer frequency. Raising the timer frequency is a reasonable option for the go runtime, but it is absolutely not a reasonable default for the go runtime.
Chrome has been fixed to not leave the timer frequency high, so has SDL, and so have many other products. Go needs to do likewise. Please.
@randomascii
I appreciate and share your concerns. So I tried to measure the impact of removing timeBeginPeriod call on my windows 7 amd64 computer. (see bench.txt in https://gist.github.com/alexbrainman/914885b57518cb4a6e39 ). Unfortunately I can see quite significant slowdowns here and there. The change looks like a no go for me. Maybe gophers who know more about Go scheduler can explain the slowdowns and suggest ways to improve the situation.
For the moment, if someone cares about this, they can comment out call to timeBeginPeriod.
Alex
I know that some video applications (games, Chrome, a few others) have to leave the frequency high in order to accurately schedule tasks and meet frame rates. For other programs I don't understand the need to have the timer frequency high. I would imagine that tasks would typically be happening as quickly as possible (reading files, doing calculations, passing messages) and I'm curious as to what sort of timeouts, or waits, or sleeps are being affected by the timer frequency.
Anyway, if some Go programs need the timer frequency then those Go programs should raise the timer frequency. I have no problem with that. But if the Go runtime raises the timer frequency then that is a problem. We could debate whether raising the timer frequency should be opt-in (my strong vote) or opt-out, but it should be a choice. Otherwise long-running low-activity Go programs will be blacklisted by anybody who wants good battery life.
Go deserves better than that.
I would imagine that tasks would typically be happening as quickly as possible ...
Things are never simple, it is always about trade-offs. But I don't know enough about Go runtime to give you explanation. And I am hopeful that other gophers will comment. All I can say is that Windows makes the task much harder when everything is measured in 15 milliseconds (comparing to nanoseconds on linux) - imagine running scheduler that performs equally well on both. But maybe Go runtime can be improved on windows, and we can disable timeBeginPeriod call.
Alex
FWIW, an ETW trace (see https://github.com/google/UIforETW/releases for the easiest way to record one) can be used to record, among other things, all context switches. This makes it possible to determine where applications are waiting.
The Windows timer resolution is a continuing source of frustration to be sure, but it only slows down code that calls Sleep(n) or WaitForSingleObject(handle, n) (where 'n' is not INFINITE).
Looking at bench.txt I see that *OneShotTimeout-2 runs slower with a lower timer frequency. That sounds like it is by-design and shouldn't be used to justify all Go programs running with a high timer frequency. Ditto with BenchmarkGoroutineIdle-2.
BenchmarkAppendGrowByte-2 runs 368% slower which suggests a benchmark bug. Either the code being tested is waiting (which it should not do) or the code that times is using timeGetTime and is stopping and starting the timer frequently and is therefore hitting aliasing errors.
I strongly suspect that the latter is the problem. timeGetTime's precision is increased when the timer frequency is increased. The benchmarks with the lower timer precision are mostly not slower they are just less accurate. The correct fix is not to raise the timer frequency (the timers will still be inaccurate) but to use accurate timers such as QueryPerformanceCounter.
It is a frustrating problem I'll agree, but raising the timer frequency so that benchmarks can measure the execution time better is not a great solution.
If it turns out that the benchmarks such as BenchmarkAppendGrowByte-2 are actually slower when the timer frequency is lower then I will be surprised and might actually investigate. Fixing slow benchmarks requires understanding why they are slow.
@randomascii I agree, we need to understand why benchmarks are slow. I looked at BenchmarkAppendGrowByte, and it does not do much - appending bytes to a slice. How do you propose I understand why BenchmarkAppendGrowByte is slow?
For the record, I started with d5fe165ca. I ran:
go test -run=none -bench=AppendGrowByte -cpu=1,1,1,1,1,2,2,2,2,2,3,3,3,3,3 runtime >old
command. Then I made this change:
diff --git a/src/runtime/os1_windows.go b/src/runtime/os1_windows.go
index f608b4a..be5a4b2 100644
--- a/src/runtime/os1_windows.go
+++ b/src/runtime/os1_windows.go
@@ -154,7 +154,7 @@ func osinit() {
stdcall2(_SetConsoleCtrlHandler, funcPC(ctrlhandler), 1)
- stdcall1(_timeBeginPeriod, 1)
+ //stdcall1(_timeBeginPeriod, 1)
ncpu = getproccount()
and I ran the same test again:
go test -run=none -bench=AppendGrowByte -cpu=1,1,1,1,1,2,2,2,2,2,3,3,3,3,3 runtime >new
And then I compared results with rsc.io/benchstat command:
c:\Users\Alex\go\go\src\runtime>%GOPATH%\bin\benchstat old new
name old time/op new time/op delta
AppendGrowByte 5.10ms ± 2% 4.94ms ± 0% -3.19% (p=0.016 n=5+4)
AppendGrowByte-2 7.59ms ±14% 36.40ms ±12% +379.45% (p=0.008 n=5+5)
AppendGrowByte-3 7.10ms ± 4% 39.54ms ± 7% +457.18% (p=0.008 n=5+5)
Here https://gist.github.com/alexbrainman/badfbe3b1e20d3f2d1b2 are my old and new files (for reference). I use Windows 7 with Intel Core2 Duo CPU E8400 (pretty average pc these days).
I suspect it has something to do with GC. But I don't even know where to start. @aclements can you suggest what to try? Thank you.
Alex
Some day I might download the code and run the tests. but it might be quite a while before I try that.
One possibility would be to record ETW traces (using https://github.com/google/UIforETW/releases). Ideally there would be some markers (using ETWMark from UIforETW's etwprof.h) inserted to indicate the start and stop of each benchmark, but just calling Sleep(150) between tests works pretty well. I could probably figure out from the traces what is going on, or at least narrow it down a bit.
The benchmarks with the lower timer precision are mostly not slower they are just less accurate.
@randomascii, what do you mean by this? Each benchmark runs in a loop for at least a second and we record the time at the beginning and end. From what I understand, you're saying that the time measurement can be off by 15ms (so the delta could be off by 30ms?), but that's only 3% on the scale of a second.
BenchmarkAppendGrowByte-2 runs 368% slower which suggests a benchmark bug. Either the code being tested is waiting (which it should not do) or the code that times is using timeGetTime and is stopping and starting the timer frequently and is therefore hitting aliasing errors.
Both the garbage collector and the scheduler do contain wait loops (e.g., https://github.com/golang/go/blob/e95bc5fef7e5e21cfdcae3095fcb8280bc3a72f8/src/runtime/mgcwork.go#L430) that after a few spins will perform several OS yields and eventually fall back to short sleeps (e.g., 100us). The fact that this hits AppendGrowByte only when GOMAXPROCS>1 is telling, since with GOMAXPROCS=1 we should never hit these wait loops. If I'm understanding correctly, these fall backs are much more expensive on Windows than they would seem. Is this right?
If building from source, it would be easy to print out the counters from these loops at the end of a GC and see what they look like in these benchmarks. Something like this at the end of func gc in runtime/mgc.go should do it:
for mp := allm; mp != nil; mp = mp.alllink {
println(mp.gcstats.nprocyield, mp.gcstats.nosyield, mp.gcstats.nsleep)
}
I'm hoping to eliminate these wait loops (#12041). If this is tied in, that's more impetus to do so.
As to your second hypothesis, I don't see timeGetTime anywhere in the Go codebase, so I'm not sure what exactly you're referring to. In the runtime, it looks like we get the time on Windows by reading from some magic address: https://github.com/golang/go/blob/e95bc5fef7e5e21cfdcae3095fcb8280bc3a72f8/src/runtime/os1_windows.go#L318
I thought that the benchmark might be stopping/starting the timers multiple times within the benchmark. If not then the maximum error should, as you observed, be quite small.
fall back to short sleeps (e.g., 100us)
Unfortunately I am not aware of a way to sleep for 100us on Windows. Sleep(0) (or WaitForSingleObject with a zero timeout) returns immediately. Sleep(1) returns in somewhere between 1 ms and 15.625 ms, depending on OS version and timer frequency and where you are relative to the next interrupt. Specifically, it will return on the next interrupt or perhaps the one after that. It's a mess and yes, it can be a lot more expensive than it would appear. And I don't know of a way around it other than don't call Sleep(1) if you care about running fast, or Sleep(0) ever.
I suspect that reading from that magic address is equivalent to timeGetTime() - it reads from a counter that is updated by the timer interrupt.
In short, it sounds like #12041 will solve the problem. An ETW trace records every context switch of every thread in every process so it is easy to see if a CPU core is running continuously or constantly sleeping. It sounds like the GC is calling Sleep(1) frequently in both cases, and those are more expensive with a lower timer frequency. Avoiding that will cause the benchmark to run faster in both cases.
Okay. It sounds like we should revisit this once I figure out how to eliminate that wait loop. :) I think it would still be worth checking the nsleep counter to lend hard evidence to this hypothesis (@alexbrainman, would that be convenient for you to do?)
Unfortunately I am not aware of a way to sleep for 100us on Windows.
It looks like we use NtWaitForSingleObject. Remarkably, as far as I can tell, the Windows documentation doesn't specify the units for the timeout argument, but the runtime code claims it takes multiples of 100ns.
@alexbrainman, would that be convenient for you to do?
Alternatively, could you collect an ETW trace of that benchmark and dump it here?
The units probably are multiples of 100 ns. However the wakeups (unless the event you wait on is signaled) are done by the scheduler which runs on the timer interrupt. So, if you specify 100 us then the function either has to return immediately (round down) or on the next timer interrupt (round up). That's the fundamental problem. If you lower the timer interval to 0.5 ms then you still get longer sleeps than desired, and wasted electricity.
Yes, Windows needs nanosleep. But it don't got it.
@alexbrainman if you record an ETW trace (see UIforETW link above) I'll analyze it.
If building from source, it would be easy to print out the counters from these loops at the end of a GC and see what they look like in these benchmarks.
See my output:
CL https://golang.org/cl/16297 mentions this issue.
At this point most, though not all, sleep loops have been eliminated on master. @alexbrainman or @defia, is this still an issue? What's the effect of returning to the default timer resolution with these fixes in?
@aclements calling timeBeginPeriod really makes no difference now on my computer. Thank you very much! I compared benchmark changes in runtime, time and net, and I see no changes after I remove call to timeBeginPeriod. What other benchmarks should I try?
Alex
Great! There are two other benchmarks I'd run before considering this closed: The go1 benchmarks in $GOROOT/test/bench/go1 and the garbage benchmark from golang.org/x/benchmarks/bench (not the standard framework; run bench with -bench garbage
and look for the GOPERF-METRIC:time line, which is the benchmark duration in ns.)
@aclements, I have checked both $GOROOT/test/bench/go1 and the garbage benchmark from golang.org/x/benchmarks/bench (in addition to all benchmarks in runtime, net and time). I also checked all these on few different computers. I don't see any difference if I call timeBeginPeriod or not. So, I propose, we remove the call: https://go-review.googlesource.com/17164
Alex
CL https://golang.org/cl/17164 mentions this issue.
That's great news.
Eliminating these sleep loops should also have improved the benchmark times has that been seen? The effect may be small - I would guess it would be ~1/16th of the difference that used to be seen when enabling/disabling timeBeginPeriod.
Eliminating these sleep loops should also have improved the benchmark times has that been seen?
I don't see any difference in benchmarks after I apply https://go-review.googlesource.com/17164 I don't know enough about scheduler to comment on your claim about "improved benchmarks". Maybe Austin will comment, but I am happy to proceed with https://go-review.googlesource.com/17164 as is.
Thank you for pushing this along.
Alex
Unfortunately we discovered some cases where not calling timeBeginPeriod makes Go scheduler considerably slower. So https://go-review.googlesource.com/#/c/20834/ restored original behavior - Go runtime is calling timeBeginPeriod again. Therefore I am reopening this issue.
Current plan is to resolve #7876 to get rid of calling timeBeginPeriod again. But nobody is working on #7876 at this moment.
Alex
Would "undoing" this revert by calling timeEndPeriod from app code have any negative effects outside of decreased performance in some cases? I develop an app that can run on laptops and having the app use less power is more important than performance.
Would "undoing" this revert by calling timeEndPeriod from app code have any negative effects outside of decreased performance in some cases?
As far as I know, bad performance is the only downside.
Alex
To be clear, in most cases any slowdowns caused by the timer frequency being lower are actually bugs.
With the exception of games or perhaps video-playback there should be no need to rely on timeouts and Sleep(n), which are the things affected. Any software that just does its work as quickly as possible and then exits will run either at exactly the same speed or slightly faster when the timer frequency is lower.
The main uses of Go that I see are long-running background apps. These apps consume about 0.01% CPU or less. However, because they are written in Go and because the Go runtime raises the timer interrupt frequency whether it helps or hurts this means that these idle apps waste more power than many more active apps.
Any progress on either fixing the scheduler so that it doesn't require the higher timer interrupt frequency, or giving Go applications a way to opt out?
Until then the recommendation I am giving is that Go should not be used for long-running apps that may be sitting idle in the background.
Any progress ... ?
I am not working on this. As far as I know what I said here https://github.com/golang/go/issues/8687#issuecomment-208094407 is still true. Maybe @dvyukov can add more. Sorry.
Alex
@randomascii you can call timeEndPeriod to revert changes by go runtime https://github.com/defia/trf
I have nothing to add.
How expensive is it to call timeBeginPeriod/timeEndPeriod? I'm wondering if it would be reasonable for us to only crank up the resolution when there are running goroutines (so the prices is consuming CPU anyway) and leave it low when the process is entirely idle.
Dunno how expensive it is, but sysmon already tracks the necessary condition, so the patch could be just:
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -3710,6 +3710,7 @@ func sysmon() {
lastscavenge := nanotime()
nscavenge := 0
+ highRes := false
lasttrace := int64(0)
idle := 0 // how many cycles in succession we had not wokeup somebody
delay := uint32(0)
@@ -3722,6 +3723,13 @@ func sysmon() {
if delay > 10*1000 { // up to 10ms
delay = 10 * 1000
}
+ if !highRes && delay < 10*1000 {
+ timeBeginPeriod()
+ highRes = true
+ } else if highRes && delay >= 10*1000 {
+ timeEndPeriod()
+ highRes = false
+ }
usleep(delay)
if debug.schedtrace <= 0 && (sched.gcwaiting != 0 || atomic.Load(&sched.npidle) == uint32(gomaxprocs)) {
lock(&sched.lock)
As far as I understand, kernel has a global counter of timeBeginPeriod across all processes, so if machine runs a bunch of Go processes and one of them does frequent timeBeginPeriod/timeEndPeriod, then it will be just atomic inc/dec of the counter since it never drops to zero.
@dvyukov, don't we want to timeEndPeriod()/timeBeginPeriod() around the notetsleep(&sched.sysmonnote, maxsleep)
?
@aclements yes
Calling timeBeginPeriod/timeEndPeriod is quite cheap. There is a brief delay before the changes take effect (probably the next timer interrupt, so ~16 ms or so) but the actual overhead of the calls is effectively zero. Chrome does this so that it can do precise timing when a web page is doing animation or otherwise requesting lots of short timeouts, while being power efficient when the loaded pages are closer to idle.
I agree that if the Go program is consuming CPU anyway then the consequences of having the timer raised are quite small. The power consequences would be essentially zero. That would be a greatly appreciated fix.
How expensive is it to call timeBeginPeriod/timeEndPeriod?
I changed:
diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
index 44c982b..249d5b4 100644
--- a/src/runtime/os_windows.go
+++ b/src/runtime/os_windows.go
@@ -50,6 +50,7 @@ const (
//go:cgo_import_dynamic runtime._WriteConsoleW WriteConsoleW%5 "kernel32.dll"
//go:cgo_import_dynamic runtime._WriteFile WriteFile%5 "kernel32.dll"
//go:cgo_import_dynamic runtime._timeBeginPeriod timeBeginPeriod%1 "winmm.dll"
+//go:cgo_import_dynamic runtime._timeEndPeriod timeBeginPeriod%1 "winmm.dll"
type stdFunction unsafe.Pointer
@@ -93,6 +94,7 @@ var (
_WriteConsoleW,
_WriteFile,
_timeBeginPeriod,
+ _timeEndPeriod,
_ stdFunction
// Following syscalls are only available on some Windows PCs.
@@ -260,6 +262,14 @@ var useLoadLibraryEx bool
var timeBeginPeriodRetValue uint32
+func TimeBeginPeriod() uint32 {
+ return uint32(stdcall1(_timeBeginPeriod, 1))
+}
+
+func TimeEndPeriod() uint32 {
+ return uint32(stdcall1(_timeEndPeriod, 1))
+}
+
func osinit() {
asmstdcallAddr = unsafe.Pointer(funcPC(asmstdcall))
usleep2Addr = unsafe.Pointer(funcPC(usleep2))
diff --git a/src/runtime/syscall_windows_test.go b/src/runtime/syscall_windows_test.go
index 11e67df..6da6325 100644
--- a/src/runtime/syscall_windows_test.go
+++ b/src/runtime/syscall_windows_test.go
@@ -1066,3 +1066,10 @@ package main
func main() {
}
`
+
+func BenchmarkTimeBeginEndPeriod(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ runtime.TimeEndPeriod()
+ runtime.TimeBeginPeriod()
+ }
+}
And I see (on my windows/386 Windows XP):
C:\>u:\test -test.v -test.run=none -test.bench=Begin
goos: windows
goarch: 386
pkg: runtime
BenchmarkTimeBeginEndPeriod-2 10000000 144 ns/op
PASS
and (on my windows/amd64 Windows 7):
C:\>u:\test -test.v -test.run=none -test.bench=Begin
goos: windows
goarch: 386
pkg: runtime
BenchmarkTimeBeginEndPeriod 20000000 104 ns/op
PASS
Alex
the patch could be just:
I am not criticizing your patch. Just a note from https://msdn.microsoft.com/en-us/library/windows/desktop/dd757626(v=vs.85).aspx:
" ... You must match each call to timeBeginPeriod with a call to timeEndPeriod, specifying the same minimum resolution in both calls. An application can make multiple timeBeginPeriod calls as long as each call is matched with a call to timeEndPeriod. ..."
Alex
Could someone give https://go-review.googlesource.com/38403 a try? It implements the idea of turning down the timer resolution when the Go process is idle.
CL https://golang.org/cl/38403 mentions this issue.