golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.98k stars 17.67k forks source link

runtime: windows system clock resolution issue #8687

Closed defia closed 4 years ago

defia commented 10 years ago
What does 'go version' print?

go version go1.3.1 windows/amd64

actually it's just the same as the issue in chrome:
https://code.google.com/p/chromium/issues/detail?id=153139

What steps reproduce the problem?

1.download and run system clock resolution tool provided by the link in
https://code.google.com/p/chromium/issues/detail?id=153139
it will show your current timer interval
2.if current timer interval smaller than 15.625ms,try to close running programs until it
goes to 15.625ms
3.run any program compiled by go

What happened?
current timer interval goes to 1ms when go compiled bins running,and goes to 15.625
after quiting

What should have happened instead?
current timer interval should not change

at least,with no time package used, the timer setting should not change, or there should
be a method to control this
ianlancetaylor commented 10 years ago

Comment 1:

Labels changed: added repo-main, release-go1.4, os-windows.

alexbrainman commented 10 years ago

Comment 2:

I confirm what DefiaQ said.
The problem is that we call timeBeginPeriod Windows API. It was introduced as part of
pprof implementation
changeset:   9786:9c5c0cbadb4d                           
user:        Hector Chu <hectorchu@gmail.com>            
date:        Sat Sep 17 17:57:59 2011 +1000              
summary:     runtime: implement pprof support for windows
but I view it as "windows timer precision is very very low (about 15ms) comparing to
other OSes, so lets get best precision whenever we can (1ms)".
timeBeginPeriod has many problems. It does not work on some systems (it does not work on
my oldish windows xp). I suspect, it fails, if you don't have correct access rights on
the system (it works for administrator). When successfully set timeBeginPeriod will
drain laptop batteries faster (I read about it everywhere, I didn't have chance to
experience it on practice). But, I suspect, many other applications call timeBeginPeriod
too, so Go is not the only one at fault here.
Personally I don't care if timeBeginPeriod is set, but others might feel different.
Perhaps there is case here for some way to control it from within Go program.
ALternatively, people who care, can just remove timeBeginPeriod call in their own
runtime.
Alex
defia commented 10 years ago

Comment 3:

it will succeed even without administrator privilege.
so far i didn't find any applications other than chrome change timer precesion
alexbrainman commented 10 years ago

Comment 4:

Well, I am not running chrome but my timer interval is set to 1ms.
Also, why did you said "... if current timer interval smaller than 15.625ms,try to close
running programs until it goes to 15.625ms .. "? You should have said " ... close chrome
...".
Alex
defia commented 10 years ago

Comment 5:

i also suspect other applications will change the setting , though i havn't found any so
far
rsc commented 10 years ago

Comment 6:

Too late for 1.4. Maybe for a future release, if someone wants to do the work to prove
that it matters and that removing this is safe. I think without a clear report about bad
implications we won't do that work ourselves.

Labels changed: added release-none, removed release-go1.4.

Status changed to Accepted.

defia commented 9 years ago

i made a package to call timeEndPeriod to reset the timer to normal on initialization.

if the program doesn't need much performance, but runs on a windows laptop for a long time, just import the package. i've using this for some days, everything seems to be well so far

defia commented 9 years ago

I've done some benchmarks today.
At first I ran a cpu-heavy program and modified it to use hundreds of goroutines. To my surprise,it actually ran ~5% faster if I set the system clock resolution to 15.6ms(that is default value)

Then I ran this benchmark I didn't look into the code, I guess it use only one goroutine, still pure cpu thing, and get still the same result.

Then I wrote a http benchmark here This time it ran a lot faster(~50%) with 1ms system clock resolution.

So I guess, if your program don't use much IO (cause less context switches of goroutine), you can discard the system clock resoluton change.

minux commented 9 years ago

Could we use QueryPerformanceCounter for pprof support?

alexbrainman commented 9 years ago

And how are you going to use QueryPerformanceCounter in pprof support?

Alex

mattn commented 9 years ago

cyclesPerSecond uses systime in os1_windows.go

https://github.com/golang/go/blob/master/src/runtime/os1_windows.go#L314-L316

https://github.com/golang/go/blob/master/src/runtime/os1_windows.go#L286-L306

This part is possible to be replaced with QueryUnbiasedInterruptTime (not QueryPerformanceCounter).

randomascii commented 9 years ago

I've blogged about this issue in the past (https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/) and I wanted to share what I have learned:

As defia pointed out, having a higher timer frequency wastes CPU time and can make some CPU heavy programs run more slowly. This is because the OS is waking up ~15 times more frequently, consuming CPU time each time it does. It also means that other programs may wake up more frequently and waste CPU time - some software sits in Sleep(1) loops and will wake up ~15 times more frequently when the timer resolution is changed. So, raising the timer frequency can waste performance.

Raising the timer frequency will almost always waste some power and will harm battery life. This is the kicker. This is why no program should ever raise the timer frequency unless it needs the timer frequency raised. The timer frequency is a global setting and raising it without a good reason is poor citizenship.

If you have a work loop that relies on Sleep(1) or timeouts from WaitForSingleObject(handle, 1) then raising the timer frequency will make your loop run faster. Presumably that is why the http test ran faster. Most software should not be relying on timeouts, it should be running based on events. But, it is reasonable for specific go programs to raise the timer frequency if needed.

It is not reasonable for the go runtime to unilaterally raise timer frequency.

randomascii commented 9 years ago

With the background information out of the way...

I discovered this issue because I noticed (using clockres.exe) that the timer frequency on my computer was always raised. I used "powercfg /energy /duration 5" to identify the culprit, a go program that was running in the background. This go program was monitoring system status and was idle about 99.9% of the time but because of the go runtime's decision to raise the timer frequency for all go programs this background program was measurably affecting power consumption and battery life. This 99.9% idle go program was affecting my computer more than a program that was actively running.

This was clearly a case where the program did not need a higher timer frequency. Raising the timer frequency is a reasonable option for the go runtime, but it is absolutely not a reasonable default for the go runtime.

Chrome has been fixed to not leave the timer frequency high, so has SDL, and so have many other products. Go needs to do likewise. Please.

alexbrainman commented 9 years ago

@randomascii

I appreciate and share your concerns. So I tried to measure the impact of removing timeBeginPeriod call on my windows 7 amd64 computer. (see bench.txt in https://gist.github.com/alexbrainman/914885b57518cb4a6e39 ). Unfortunately I can see quite significant slowdowns here and there. The change looks like a no go for me. Maybe gophers who know more about Go scheduler can explain the slowdowns and suggest ways to improve the situation.

For the moment, if someone cares about this, they can comment out call to timeBeginPeriod.

Alex

randomascii commented 9 years ago

I know that some video applications (games, Chrome, a few others) have to leave the frequency high in order to accurately schedule tasks and meet frame rates. For other programs I don't understand the need to have the timer frequency high. I would imagine that tasks would typically be happening as quickly as possible (reading files, doing calculations, passing messages) and I'm curious as to what sort of timeouts, or waits, or sleeps are being affected by the timer frequency.

Anyway, if some Go programs need the timer frequency then those Go programs should raise the timer frequency. I have no problem with that. But if the Go runtime raises the timer frequency then that is a problem. We could debate whether raising the timer frequency should be opt-in (my strong vote) or opt-out, but it should be a choice. Otherwise long-running low-activity Go programs will be blacklisted by anybody who wants good battery life.

Go deserves better than that.

alexbrainman commented 9 years ago

I would imagine that tasks would typically be happening as quickly as possible ...

Things are never simple, it is always about trade-offs. But I don't know enough about Go runtime to give you explanation. And I am hopeful that other gophers will comment. All I can say is that Windows makes the task much harder when everything is measured in 15 milliseconds (comparing to nanoseconds on linux) - imagine running scheduler that performs equally well on both. But maybe Go runtime can be improved on windows, and we can disable timeBeginPeriod call.

Alex

randomascii commented 9 years ago

FWIW, an ETW trace (see https://github.com/google/UIforETW/releases for the easiest way to record one) can be used to record, among other things, all context switches. This makes it possible to determine where applications are waiting.

The Windows timer resolution is a continuing source of frustration to be sure, but it only slows down code that calls Sleep(n) or WaitForSingleObject(handle, n) (where 'n' is not INFINITE).

Looking at bench.txt I see that *OneShotTimeout-2 runs slower with a lower timer frequency. That sounds like it is by-design and shouldn't be used to justify all Go programs running with a high timer frequency. Ditto with BenchmarkGoroutineIdle-2.

BenchmarkAppendGrowByte-2 runs 368% slower which suggests a benchmark bug. Either the code being tested is waiting (which it should not do) or the code that times is using timeGetTime and is stopping and starting the timer frequently and is therefore hitting aliasing errors.

I strongly suspect that the latter is the problem. timeGetTime's precision is increased when the timer frequency is increased. The benchmarks with the lower timer precision are mostly not slower they are just less accurate. The correct fix is not to raise the timer frequency (the timers will still be inaccurate) but to use accurate timers such as QueryPerformanceCounter.

It is a frustrating problem I'll agree, but raising the timer frequency so that benchmarks can measure the execution time better is not a great solution.

If it turns out that the benchmarks such as BenchmarkAppendGrowByte-2 are actually slower when the timer frequency is lower then I will be surprised and might actually investigate. Fixing slow benchmarks requires understanding why they are slow.

alexbrainman commented 9 years ago

@randomascii I agree, we need to understand why benchmarks are slow. I looked at BenchmarkAppendGrowByte, and it does not do much - appending bytes to a slice. How do you propose I understand why BenchmarkAppendGrowByte is slow?

For the record, I started with d5fe165ca. I ran:

go test -run=none -bench=AppendGrowByte -cpu=1,1,1,1,1,2,2,2,2,2,3,3,3,3,3 runtime >old

command. Then I made this change:

diff --git a/src/runtime/os1_windows.go b/src/runtime/os1_windows.go
index f608b4a..be5a4b2 100644
--- a/src/runtime/os1_windows.go
+++ b/src/runtime/os1_windows.go
@@ -154,7 +154,7 @@ func osinit() {

        stdcall2(_SetConsoleCtrlHandler, funcPC(ctrlhandler), 1)

-       stdcall1(_timeBeginPeriod, 1)
+       //stdcall1(_timeBeginPeriod, 1)

        ncpu = getproccount()

and I ran the same test again:

go test -run=none -bench=AppendGrowByte -cpu=1,1,1,1,1,2,2,2,2,2,3,3,3,3,3 runtime >new

And then I compared results with rsc.io/benchstat command:

c:\Users\Alex\go\go\src\runtime>%GOPATH%\bin\benchstat old new
name              old time/op  new time/op   delta
AppendGrowByte    5.10ms ± 2%   4.94ms ± 0%    -3.19%  (p=0.016 n=5+4)
AppendGrowByte-2  7.59ms ±14%  36.40ms ±12%  +379.45%  (p=0.008 n=5+5)
AppendGrowByte-3  7.10ms ± 4%  39.54ms ± 7%  +457.18%  (p=0.008 n=5+5)

Here https://gist.github.com/alexbrainman/badfbe3b1e20d3f2d1b2 are my old and new files (for reference). I use Windows 7 with Intel Core2 Duo CPU E8400 (pretty average pc these days).

I suspect it has something to do with GC. But I don't even know where to start. @aclements can you suggest what to try? Thank you.

Alex

randomascii commented 9 years ago

Some day I might download the code and run the tests. but it might be quite a while before I try that.

One possibility would be to record ETW traces (using https://github.com/google/UIforETW/releases). Ideally there would be some markers (using ETWMark from UIforETW's etwprof.h) inserted to indicate the start and stop of each benchmark, but just calling Sleep(150) between tests works pretty well. I could probably figure out from the traces what is going on, or at least narrow it down a bit.

aclements commented 9 years ago

The benchmarks with the lower timer precision are mostly not slower they are just less accurate.

@randomascii, what do you mean by this? Each benchmark runs in a loop for at least a second and we record the time at the beginning and end. From what I understand, you're saying that the time measurement can be off by 15ms (so the delta could be off by 30ms?), but that's only 3% on the scale of a second.

BenchmarkAppendGrowByte-2 runs 368% slower which suggests a benchmark bug. Either the code being tested is waiting (which it should not do) or the code that times is using timeGetTime and is stopping and starting the timer frequently and is therefore hitting aliasing errors.

Both the garbage collector and the scheduler do contain wait loops (e.g., https://github.com/golang/go/blob/e95bc5fef7e5e21cfdcae3095fcb8280bc3a72f8/src/runtime/mgcwork.go#L430) that after a few spins will perform several OS yields and eventually fall back to short sleeps (e.g., 100us). The fact that this hits AppendGrowByte only when GOMAXPROCS>1 is telling, since with GOMAXPROCS=1 we should never hit these wait loops. If I'm understanding correctly, these fall backs are much more expensive on Windows than they would seem. Is this right?

If building from source, it would be easy to print out the counters from these loops at the end of a GC and see what they look like in these benchmarks. Something like this at the end of func gc in runtime/mgc.go should do it:

for mp := allm; mp != nil; mp = mp.alllink {
    println(mp.gcstats.nprocyield, mp.gcstats.nosyield, mp.gcstats.nsleep)
}

I'm hoping to eliminate these wait loops (#12041). If this is tied in, that's more impetus to do so.

As to your second hypothesis, I don't see timeGetTime anywhere in the Go codebase, so I'm not sure what exactly you're referring to. In the runtime, it looks like we get the time on Windows by reading from some magic address: https://github.com/golang/go/blob/e95bc5fef7e5e21cfdcae3095fcb8280bc3a72f8/src/runtime/os1_windows.go#L318

randomascii commented 9 years ago

I thought that the benchmark might be stopping/starting the timers multiple times within the benchmark. If not then the maximum error should, as you observed, be quite small.

fall back to short sleeps (e.g., 100us)

Unfortunately I am not aware of a way to sleep for 100us on Windows. Sleep(0) (or WaitForSingleObject with a zero timeout) returns immediately. Sleep(1) returns in somewhere between 1 ms and 15.625 ms, depending on OS version and timer frequency and where you are relative to the next interrupt. Specifically, it will return on the next interrupt or perhaps the one after that. It's a mess and yes, it can be a lot more expensive than it would appear. And I don't know of a way around it other than don't call Sleep(1) if you care about running fast, or Sleep(0) ever.

I suspect that reading from that magic address is equivalent to timeGetTime() - it reads from a counter that is updated by the timer interrupt.

In short, it sounds like #12041 will solve the problem. An ETW trace records every context switch of every thread in every process so it is easy to see if a CPU core is running continuously or constantly sleeping. It sounds like the GC is calling Sleep(1) frequently in both cases, and those are more expensive with a lower timer frequency. Avoiding that will cause the benchmark to run faster in both cases.

aclements commented 9 years ago

Okay. It sounds like we should revisit this once I figure out how to eliminate that wait loop. :) I think it would still be worth checking the nsleep counter to lend hard evidence to this hypothesis (@alexbrainman, would that be convenient for you to do?)

Unfortunately I am not aware of a way to sleep for 100us on Windows.

It looks like we use NtWaitForSingleObject. Remarkably, as far as I can tell, the Windows documentation doesn't specify the units for the timeout argument, but the runtime code claims it takes multiples of 100ns.

aclements commented 9 years ago

@alexbrainman, would that be convenient for you to do?

Alternatively, could you collect an ETW trace of that benchmark and dump it here?

randomascii commented 9 years ago

The units probably are multiples of 100 ns. However the wakeups (unless the event you wait on is signaled) are done by the scheduler which runs on the timer interrupt. So, if you specify 100 us then the function either has to return immediately (round down) or on the next timer interrupt (round up). That's the fundamental problem. If you lower the timer interval to 0.5 ms then you still get longer sleeps than desired, and wasted electricity.

Yes, Windows needs nanosleep. But it don't got it.

@alexbrainman if you record an ETW trace (see UIforETW link above) I'll analyze it.

alexbrainman commented 9 years ago

If building from source, it would be easy to print out the counters from these loops at the end of a GC and see what they look like in these benchmarks.

See my output:

``` c:\Users\Alex\go\go\src\runtime>git diff diff --git a/src/runtime/mgc.go b/src/runtime/mgc.go index 848e8f6..eef12dd 100644 --- a/src/runtime/mgc.go +++ b/src/runtime/mgc.go @@ -1210,6 +1210,10 @@ func gc(mode int) { // give the queued finalizers, if any, a chance to run Gosched() } + + for mp := allm; mp != nil; mp = mp.alllink { + println(mp.gcstats.nprocyield, mp.gcstats.nosyield, mp.gcstats.nsleep) + } } // gcBgMarkStartWorkers prepares background mark worker goroutines. diff --git a/src/runtime/os1_windows.go b/src/runtime/os1_windows.go index f608b4a..be5a4b2 100644 --- a/src/runtime/os1_windows.go +++ b/src/runtime/os1_windows.go @@ -154,7 +154,7 @@ func osinit() { stdcall2(_SetConsoleCtrlHandler, funcPC(ctrlhandler), 1) - stdcall1(_timeBeginPeriod, 1) + //stdcall1(_timeBeginPeriod, 1) ncpu = getproccount() c:\Users\Alex\go\go\src\runtime>go test -run=none -bench=AppendGrowByte PASS BenchmarkAppendGrowByte-2 0 0 0 8 0 0 0 0 0 0 0 0 10 1 0 0 0 0 8 0 0 0 0 0 0 0 0 20 2 0 0 0 0 30 1 0 0 0 0 0 0 0 20 2 0 10 1 0 30 1 0 0 0 0 0 0 0 20 2 0 20 2 0 30 1 0 0 0 0 0 0 0 20 2 0 20 2 0 40 2 0 0 0 0 0 0 0 20 2 0 20 2 0 40 2 0 0 0 0 0 0 0 30 3 0 20 2 0 50 3 0 0 0 0 0 0 0 30 3 0 20 2 0 50 3 0 0 0 0 0 0 0 40 4 0 20 2 0 60 4 0 0 0 0 0 0 0 40 4 0 30 3 0 60 4 0 0 0 0 0 0 0 40 4 0 30 3 0 60 4 0 0 0 0 0 0 0 50 5 0 30 3 0 70 5 0 0 0 0 0 0 0 50 5 0 30 3 0 80 6 0 0 0 0 0 0 0 50 5 0 30 3 0 90 7 0 0 0 0 0 0 0 50 5 0 30 3 0 100 8 0 0 0 0 0 0 0 50 5 0 40 4 0 100 8 0 0 0 0 0 0 0 50 5 0 40 4 0 100 8 0 0 0 0 0 0 0 60 6 0 40 4 0 100 8 0 0 0 0 0 0 0 70 7 0 40 4 0 110 9 0 0 0 0 0 0 0 70 7 0 50 5 0 110 9 0 0 0 0 0 0 0 70 7 0 50 5 0 110 9 0 0 0 0 0 0 0 80 8 0 50 5 0 120 10 0 0 0 0 0 0 0 80 8 0 50 5 0 120 10 0 0 0 0 0 0 0 90 9 0 50 5 0 130 11 0 0 0 0 0 0 0 90 9 0 60 6 0 130 11 0 0 0 0 0 0 0 90 9 0 60 6 0 140 12 0 0 0 0 0 0 0 90 9 0 60 6 0 140 12 0 0 0 0 0 0 0 100 10 0 70 7 0 140 12 0 0 0 0 0 0 0 100 10 0 70 7 0 150 14 0 0 0 0 0 0 0 100 10 0 70 7 0 150 14 0 0 0 0 0 0 0 110 11 0 70 7 0 160 15 0 0 0 0 0 0 0 110 11 0 80 8 0 160 15 0 0 0 0 0 0 0 110 11 0 80 8 0 160 15 0 0 0 0 0 0 0 120 12 0 80 8 0 170 16 0 0 0 0 0 0 0 120 12 0 80 8 0 180 17 0 0 0 0 0 0 0 120 12 0 80 8 0 180 17 0 0 0 0 0 0 0 130 13 0 80 8 0 190 18 0 0 0 0 0 0 0 130 13 0 80 8 0 200 19 0 0 0 0 0 0 0 130 13 0 80 8 0 200 19 0 0 0 0 0 0 0 140 14 0 90 9 0 200 19 0 0 0 0 0 0 0 140 14 0 90 9 0 210 20 0 0 0 0 0 0 0 140 14 0 90 9 0 220 21 0 0 0 0 0 0 0 140 14 0 90 9 0 220 21 0 0 0 0 0 0 0 150 15 0 100 10 0 220 21 0 0 0 0 0 0 0 150 15 0 100 10 0 230 22 0 0 0 0 0 0 0 150 15 0 100 10 0 230 22 0 0 0 0 0 0 0 160 16 0 100 10 0 240 23 0 0 0 0 0 0 0 160 16 0 110 11 0 240 23 0 0 0 0 0 0 0 160 16 0 110 11 0 250 24 0 0 0 0 0 0 0 160 16 0 110 11 0 260 25 0 0 0 0 0 0 0 160 16 0 120 12 0 260 25 0 0 0 0 0 0 0 160 16 0 120 12 0 260 25 0 0 0 0 0 0 0 170 17 0 130 13 0 260 25 0 0 0 0 0 0 0 170 17 0 130 13 0 270 26 0 0 0 0 0 0 0 170 17 0 130 13 0 270 26 0 0 0 0 0 0 0 180 18 0 130 13 0 270 26 0 0 0 0 0 0 0 190 19 0 140 14 0 270 26 0 0 0 0 0 0 0 190 19 0 150 15 0 270 26 0 0 0 0 0 0 0 190 19 0 150 15 0 270 26 0 0 0 0 0 0 0 200 20 0 150 15 0 280 27 0 0 0 0 0 0 0 200 20 0 160 16 0 280 27 0 0 0 0 0 0 0 200 20 0 160 16 0 290 28 0 0 0 0 0 0 0 200 20 0 160 16 0 290 28 0 0 0 0 0 0 0 210 21 0 160 16 0 300 29 0 0 0 0 0 0 0 210 21 0 170 17 0 300 29 0 0 0 0 0 0 0 210 21 0 170 17 0 300 29 0 0 0 0 0 0 0 220 22 0 180 18 0 300 29 0 0 0 0 0 0 0 220 22 0 180 18 0 300 29 0 0 0 0 0 0 0 230 24 0 190 19 0 300 29 0 0 0 0 0 0 0 230 24 0 190 19 0 310 30 0 0 0 0 0 0 0 230 24 0 200 20 0 310 30 0 0 0 0 0 0 0 230 24 0 200 20 0 310 30 0 0 0 0 0 0 0 240 25 0 200 20 0 320 31 0 0 0 0 0 0 0 240 25 0 210 21 0 320 31 0 0 0 0 0 0 0 240 25 0 210 21 0 330 32 0 0 0 0 0 0 0 240 25 0 220 22 0 330 32 0 0 0 0 0 0 0 240 25 0 220 22 0 340 33 0 0 0 0 0 0 0 240 25 0 220 22 0 340 33 0 0 0 0 0 0 0 250 26 0 220 22 0 350 34 0 0 0 0 0 0 0 250 26 0 230 23 0 350 34 0 0 0 0 0 0 0 250 26 0 230 23 0 360 35 0 0 0 0 0 0 0 250 26 0 230 23 0 360 35 0 0 0 0 0 0 0 260 27 0 230 23 0 360 35 0 0 0 0 0 0 0 270 28 0 230 23 0 370 36 0 0 0 0 0 0 0 270 28 0 230 23 0 370 36 0 0 0 0 0 0 0 280 29 0 240 24 0 370 36 0 0 0 0 0 0 0 280 29 0 240 24 0 380 37 0 0 0 0 0 0 0 280 29 0 240 24 0 390 38 0 0 0 0 0 0 0 280 29 0 250 25 0 390 38 0 0 0 0 0 0 0 280 29 0 260 26 0 390 38 0 0 0 0 0 0 0 280 29 0 270 27 0 390 38 0 0 0 0 0 0 0 280 29 0 270 27 0 400 39 0 0 0 0 0 0 0 280 29 0 280 28 0 400 39 0 0 0 0 0 0 0 280 29 0 280 28 0 410 40 0 0 0 0 0 0 0 280 29 0 290 29 0 410 40 0 0 0 0 0 0 0 280 29 0 290 29 0 410 40 0 0 0 0 0 0 0 290 30 0 290 29 0 420 41 0 0 0 0 0 0 0 290 30 0 300 30 0 420 41 0 0 0 0 0 0 0 290 30 0 300 30 0 430 42 0 0 0 0 0 0 0 290 30 0 300 30 0 430 42 0 0 0 0 0 0 0 300 31 0 300 30 0 440 43 0 0 0 0 0 0 0 300 31 0 310 31 0 440 43 0 0 0 0 0 0 0 300 31 0 310 31 0 440 43 0 0 0 0 0 0 0 310 32 0 310 31 0 440 43 0 0 0 0 0 0 0 320 33 0 310 31 0 450 44 0 0 0 0 0 0 0 320 33 0 320 32 0 450 44 0 0 0 0 0 0 0 320 33 0 320 32 0 460 45 0 0 0 0 0 0 0 320 33 0 330 33 0 460 45 0 0 0 0 0 0 0 320 33 0 330 33 0 460 45 0 0 0 0 0 0 0 330 34 0 330 33 0 460 45 0 0 0 0 0 0 0 340 35 0 330 33 0 470 46 0 0 0 0 0 0 0 340 35 0 340 34 0 470 46 0 0 0 0 0 0 0 340 35 0 340 34 0 470 46 0 0 0 0 0 0 0 350 36 0 340 34 0 480 47 0 0 0 0 0 0 0 350 36 0 350 35 0 480 47 0 0 0 0 0 0 0 350 36 0 350 35 0 480 47 0 0 0 0 0 0 0 360 37 0 350 35 0 490 48 0 0 0 0 0 0 0 360 37 0 360 36 0 490 48 0 0 0 0 0 0 0 360 37 0 360 36 0 490 48 0 0 0 0 0 0 0 370 38 0 360 36 0 500 49 0 0 0 0 0 0 0 370 38 0 360 36 0 500 49 0 0 0 0 0 0 0 380 39 0 360 36 0 500 49 0 0 0 0 0 0 0 390 40 0 360 36 0 500 49 0 0 0 0 0 0 0 400 41 0 370 37 0 500 49 0 0 0 0 0 0 0 400 41 0 370 37 0 510 50 0 0 0 0 0 0 0 400 41 0 370 37 0 510 50 0 0 0 0 0 0 0 410 42 0 370 37 0 520 51 0 0 0 0 0 0 0 410 42 0 370 37 0 520 51 0 0 0 0 0 0 0 420 43 0 380 38 0 520 51 0 0 0 0 0 0 0 420 43 0 380 38 0 520 51 0 0 0 0 0 0 0 430 44 0 390 39 0 520 51 0 0 0 0 0 0 0 430 44 0 390 39 0 520 51 0 0 0 0 0 0 0 440 45 0 390 39 0 520 51 0 0 0 0 0 0 0 450 46 0 400 40 0 520 51 0 0 0 0 0 0 0 450 46 0 400 40 0 520 51 0 0 0 0 0 0 0 460 48 0 400 40 0 530 52 0 0 0 0 0 0 0 460 48 0 400 40 0 540 53 0 0 0 0 0 0 0 460 48 0 410 41 0 540 53 0 0 0 0 0 0 0 460 48 0 410 41 0 550 54 0 0 0 0 0 0 0 460 48 0 410 41 0 550 54 0 0 0 0 0 0 0 470 49 0 420 42 0 550 54 0 0 0 0 0 0 0 470 49 0 430 43 0 550 54 0 0 0 0 0 0 0 470 49 0 440 44 0 550 54 0 0 0 0 0 0 0 470 49 0 440 44 0 560 55 0 0 0 0 0 0 0 470 49 0 440 44 0 560 55 0 0 0 0 0 0 0 480 50 0 450 45 0 560 55 0 0 0 0 0 0 0 480 50 0 460 46 0 560 55 0 0 0 0 0 0 0 480 50 0 460 46 0 570 56 0 0 0 0 0 0 0 480 50 0 460 46 0 570 56 0 0 0 0 0 0 0 490 51 0 460 46 0 570 56 0 0 0 0 0 0 0 500 52 0 460 46 0 580 57 0 0 0 0 0 0 0 500 52 0 470 47 0 580 57 0 0 0 0 0 0 0 500 52 0 480 48 0 580 57 0 0 0 0 0 0 0 500 52 0 480 48 0 580 57 0 0 0 0 0 0 0 510 53 0 480 48 0 590 58 0 0 0 0 0 0 0 510 53 0 490 49 0 590 58 0 0 0 0 0 0 0 510 53 0 490 49 0 590 58 0 0 0 0 0 0 0 520 54 0 500 50 0 590 58 0 0 0 0 0 0 0 520 54 0 500 50 0 600 59 0 0 0 0 0 0 0 520 54 0 500 50 0 600 59 0 0 0 0 0 0 0 530 55 0 510 51 0 600 59 0 0 0 0 0 0 0 530 55 0 510 51 0 600 59 0 0 0 0 0 0 0 540 56 0 510 51 0 610 60 0 0 0 0 0 0 0 540 56 0 510 51 0 610 60 0 0 0 0 0 0 0 550 57 0 510 51 0 620 61 0 0 0 0 0 0 0 550 57 0 520 52 0 620 61 0 0 0 0 0 0 0 550 57 0 520 52 0 620 61 0 0 0 0 0 0 0 560 58 0 520 52 0 630 62 0 0 0 0 0 0 0 560 58 0 520 52 0 630 62 0 0 0 0 0 0 0 570 59 0 530 53 0 630 62 0 0 0 0 0 0 0 570 59 0 540 54 0 630 62 0 0 0 0 0 0 0 570 59 0 540 54 0 630 62 0 0 0 0 0 0 0 580 60 0 540 54 0 640 63 0 0 0 0 0 0 0 580 60 0 550 55 0 640 63 0 0 0 0 0 0 0 580 60 0 550 55 0 650 64 0 0 0 0 0 0 0 580 60 0 550 55 0 650 64 0 0 0 0 0 0 0 590 61 0 550 55 0 660 65 0 0 0 0 0 0 0 590 61 0 550 55 0 660 65 0 0 0 0 0 0 0 600 62 0 560 56 0 660 65 0 0 0 0 0 0 0 600 62 0 560 56 0 670 66 0 0 0 0 0 0 0 600 62 0 560 56 0 670 66 0 0 0 0 0 0 0 610 63 0 560 56 0 680 67 0 0 0 0 0 0 0 610 63 0 570 57 0 680 67 0 0 0 0 0 0 0 610 63 0 580 58 0 680 67 0 0 0 0 0 0 0 610 63 0 580 58 0 690 68 0 0 0 0 0 0 0 610 63 0 580 58 0 690 68 0 0 0 0 0 0 0 620 64 0 590 59 0 690 68 0 0 0 0 0 0 0 620 64 0 590 59 0 700 69 0 0 0 0 0 0 0 620 64 0 590 59 0 700 69 0 0 0 0 0 0 0 630 65 0 590 59 0 710 70 0 0 0 0 0 0 0 630 65 0 590 59 0 710 70 0 0 0 0 0 0 0 640 66 0 600 60 0 710 70 0 0 0 0 0 0 0 640 66 0 600 60 0 710 70 0 0 0 0 0 0 0 650 67 0 600 60 0 720 71 0 0 0 0 0 0 0 650 67 0 600 60 0 730 72 0 0 0 0 0 0 0 650 67 0 600 60 0 730 72 0 0 0 0 0 0 0 660 68 0 600 60 0 730 72 0 0 0 0 0 0 0 670 69 0 600 60 0 740 73 0 0 0 0 0 0 0 670 69 0 610 61 0 740 73 0 0 0 0 0 0 0 670 69 0 610 61 0 740 73 0 0 0 0 0 0 0 680 70 0 610 61 0 750 74 0 0 0 0 0 0 0 680 70 0 620 62 0 750 74 0 0 0 0 0 0 0 680 70 0 620 62 0 750 74 0 0 0 0 0 0 0 690 71 0 620 62 0 760 75 0 0 0 0 0 0 0 690 71 0 620 62 0 760 75 0 0 0 0 0 0 0 700 72 0 620 62 0 760 75 0 0 0 0 0 0 0 710 73 0 620 62 0 760 75 0 0 0 0 0 0 0 720 74 0 630 63 0 760 75 0 0 0 0 0 0 0 720 74 0 630 63 0 770 76 0 0 0 0 0 0 0 720 74 0 630 63 0 770 76 0 0 0 0 0 0 0 730 75 0 630 63 0 770 76 0 0 0 0 0 0 0 740 76 0 640 64 0 770 76 0 0 0 0 0 0 0 740 76 0 640 64 0 780 77 0 0 0 0 0 0 0 740 76 0 100 20334220 ns/op ok runtime 3.765s c:\Users\Alex\go\go\src\runtime> ```

> I think it would still be worth checking the nsleep counter to lend hard evidence to this hypothesis (@alexbrainman, would that be convenient for you to do?) I am not sure what you want. > It looks like we use NtWaitForSingleObject. Remarkably, as far as I can tell, the Windows documentation doesn't specify the units for the timeout argument, but the runtime code claims it takes multiples of 100ns. I don't believe there is a way to wait for 100ns. I can try NtWaitForSingleObject, but I think we've tried that. I think you have a choice of 15ms (default) or 1ms (supported on most modern Windows versions, but you would have to call timeBeginPeriod WinAPI to indicate that you want 1ms precision). Go runtime calls timeBeginPeriod to have 1ms waits, but that drains people's laptop batteries faster. And timeBeginPeriod is a global resource - one single program calls it and whole computer gets it. And you need to be admin (or similar) for timeBeginPeriod to succeed - so it is a hit and miss affair - some Go programs runs with timeBeginPeriod on and some with off. So it would be nice to stop using this facility altogether. If we can make Go runtime perform adequately regardless of this setting. > Alternatively, could you collect an ETW trace of that benchmark and dump it here? I am not sure what to download and run there. And I don't like running things I download from Internet on my computer if I can avoid it. I hope my output above gives you what you need. > @alexbrainman if you record an ETW trace (see UIforETW link above) I'll analyze it. Will do if @aclements wants me to do that. Thank you both for your input. Alex
gopherbot commented 9 years ago

CL https://golang.org/cl/16297 mentions this issue.

aclements commented 8 years ago

At this point most, though not all, sleep loops have been eliminated on master. @alexbrainman or @defia, is this still an issue? What's the effect of returning to the default timer resolution with these fixes in?

alexbrainman commented 8 years ago

@aclements calling timeBeginPeriod really makes no difference now on my computer. Thank you very much! I compared benchmark changes in runtime, time and net, and I see no changes after I remove call to timeBeginPeriod. What other benchmarks should I try?

Alex

aclements commented 8 years ago

Great! There are two other benchmarks I'd run before considering this closed: The go1 benchmarks in $GOROOT/test/bench/go1 and the garbage benchmark from golang.org/x/benchmarks/bench (not the standard framework; run bench with -bench garbage and look for the GOPERF-METRIC:time line, which is the benchmark duration in ns.)

alexbrainman commented 8 years ago

@aclements, I have checked both $GOROOT/test/bench/go1 and the garbage benchmark from golang.org/x/benchmarks/bench (in addition to all benchmarks in runtime, net and time). I also checked all these on few different computers. I don't see any difference if I call timeBeginPeriod or not. So, I propose, we remove the call: https://go-review.googlesource.com/17164

Alex

gopherbot commented 8 years ago

CL https://golang.org/cl/17164 mentions this issue.

randomascii commented 8 years ago

That's great news.

Eliminating these sleep loops should also have improved the benchmark times has that been seen? The effect may be small - I would guess it would be ~1/16th of the difference that used to be seen when enabling/disabling timeBeginPeriod.

alexbrainman commented 8 years ago

Eliminating these sleep loops should also have improved the benchmark times has that been seen?

I don't see any difference in benchmarks after I apply https://go-review.googlesource.com/17164 I don't know enough about scheduler to comment on your claim about "improved benchmarks". Maybe Austin will comment, but I am happy to proceed with https://go-review.googlesource.com/17164 as is.

Thank you for pushing this along.

Alex

alexbrainman commented 8 years ago

Unfortunately we discovered some cases where not calling timeBeginPeriod makes Go scheduler considerably slower. So https://go-review.googlesource.com/#/c/20834/ restored original behavior - Go runtime is calling timeBeginPeriod again. Therefore I am reopening this issue.

Current plan is to resolve #7876 to get rid of calling timeBeginPeriod again. But nobody is working on #7876 at this moment.

Alex

ToadKing commented 8 years ago

Would "undoing" this revert by calling timeEndPeriod from app code have any negative effects outside of decreased performance in some cases? I develop an app that can run on laptops and having the app use less power is more important than performance.

alexbrainman commented 8 years ago

Would "undoing" this revert by calling timeEndPeriod from app code have any negative effects outside of decreased performance in some cases?

As far as I know, bad performance is the only downside.

Alex

randomascii commented 8 years ago

To be clear, in most cases any slowdowns caused by the timer frequency being lower are actually bugs.

With the exception of games or perhaps video-playback there should be no need to rely on timeouts and Sleep(n), which are the things affected. Any software that just does its work as quickly as possible and then exits will run either at exactly the same speed or slightly faster when the timer frequency is lower.

randomascii commented 7 years ago

The main uses of Go that I see are long-running background apps. These apps consume about 0.01% CPU or less. However, because they are written in Go and because the Go runtime raises the timer interrupt frequency whether it helps or hurts this means that these idle apps waste more power than many more active apps.

Any progress on either fixing the scheduler so that it doesn't require the higher timer interrupt frequency, or giving Go applications a way to opt out?

Until then the recommendation I am giving is that Go should not be used for long-running apps that may be sitting idle in the background.

alexbrainman commented 7 years ago

Any progress ... ?

I am not working on this. As far as I know what I said here https://github.com/golang/go/issues/8687#issuecomment-208094407 is still true. Maybe @dvyukov can add more. Sorry.

Alex

defia commented 7 years ago

@randomascii you can call timeEndPeriod to revert changes by go runtime https://github.com/defia/trf

dvyukov commented 7 years ago

I have nothing to add.

aclements commented 7 years ago

How expensive is it to call timeBeginPeriod/timeEndPeriod? I'm wondering if it would be reasonable for us to only crank up the resolution when there are running goroutines (so the prices is consuming CPU anyway) and leave it low when the process is entirely idle.

dvyukov commented 7 years ago

Dunno how expensive it is, but sysmon already tracks the necessary condition, so the patch could be just:

--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -3710,6 +3710,7 @@ func sysmon() {
        lastscavenge := nanotime()
        nscavenge := 0

+       highRes := false
        lasttrace := int64(0)
        idle := 0 // how many cycles in succession we had not wokeup somebody
        delay := uint32(0)
@@ -3722,6 +3723,13 @@ func sysmon() {
                if delay > 10*1000 { // up to 10ms
                        delay = 10 * 1000
                }
+               if !highRes && delay < 10*1000 {
+                       timeBeginPeriod()
+                       highRes = true
+               } else if highRes && delay >= 10*1000 {
+                       timeEndPeriod()
+                       highRes = false
+               }
                usleep(delay)
                if debug.schedtrace <= 0 && (sched.gcwaiting != 0 || atomic.Load(&sched.npidle) == uint32(gomaxprocs)) {
                        lock(&sched.lock)

As far as I understand, kernel has a global counter of timeBeginPeriod across all processes, so if machine runs a bunch of Go processes and one of them does frequent timeBeginPeriod/timeEndPeriod, then it will be just atomic inc/dec of the counter since it never drops to zero.

aclements commented 7 years ago

@dvyukov, don't we want to timeEndPeriod()/timeBeginPeriod() around the notetsleep(&sched.sysmonnote, maxsleep)?

dvyukov commented 7 years ago

@aclements yes

randomascii commented 7 years ago

Calling timeBeginPeriod/timeEndPeriod is quite cheap. There is a brief delay before the changes take effect (probably the next timer interrupt, so ~16 ms or so) but the actual overhead of the calls is effectively zero. Chrome does this so that it can do precise timing when a web page is doing animation or otherwise requesting lots of short timeouts, while being power efficient when the loaded pages are closer to idle.

I agree that if the Go program is consuming CPU anyway then the consequences of having the timer raised are quite small. The power consequences would be essentially zero. That would be a greatly appreciated fix.

alexbrainman commented 7 years ago

How expensive is it to call timeBeginPeriod/timeEndPeriod?

I changed:

diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
index 44c982b..249d5b4 100644
--- a/src/runtime/os_windows.go
+++ b/src/runtime/os_windows.go
@@ -50,6 +50,7 @@ const (
 //go:cgo_import_dynamic runtime._WriteConsoleW WriteConsoleW%5 "kernel32.dll"
 //go:cgo_import_dynamic runtime._WriteFile WriteFile%5 "kernel32.dll"
 //go:cgo_import_dynamic runtime._timeBeginPeriod timeBeginPeriod%1 "winmm.dll"
+//go:cgo_import_dynamic runtime._timeEndPeriod timeBeginPeriod%1 "winmm.dll"

 type stdFunction unsafe.Pointer

@@ -93,6 +94,7 @@ var (
    _WriteConsoleW,
    _WriteFile,
    _timeBeginPeriod,
+   _timeEndPeriod,
    _ stdFunction

    // Following syscalls are only available on some Windows PCs.
@@ -260,6 +262,14 @@ var useLoadLibraryEx bool

 var timeBeginPeriodRetValue uint32

+func TimeBeginPeriod() uint32 {
+   return uint32(stdcall1(_timeBeginPeriod, 1))
+}
+
+func TimeEndPeriod() uint32 {
+   return uint32(stdcall1(_timeEndPeriod, 1))
+}
+
 func osinit() {
    asmstdcallAddr = unsafe.Pointer(funcPC(asmstdcall))
    usleep2Addr = unsafe.Pointer(funcPC(usleep2))
diff --git a/src/runtime/syscall_windows_test.go b/src/runtime/syscall_windows_test.go
index 11e67df..6da6325 100644
--- a/src/runtime/syscall_windows_test.go
+++ b/src/runtime/syscall_windows_test.go
@@ -1066,3 +1066,10 @@ package main
 func main() {
 }
 `
+
+func BenchmarkTimeBeginEndPeriod(b *testing.B) {
+   for i := 0; i < b.N; i++ {
+       runtime.TimeEndPeriod()
+       runtime.TimeBeginPeriod()
+   }
+}

And I see (on my windows/386 Windows XP):

C:\>u:\test -test.v -test.run=none -test.bench=Begin
goos: windows
goarch: 386
pkg: runtime
BenchmarkTimeBeginEndPeriod-2           10000000               144 ns/op
PASS

and (on my windows/amd64 Windows 7):

C:\>u:\test -test.v -test.run=none -test.bench=Begin
goos: windows
goarch: 386
pkg: runtime
BenchmarkTimeBeginEndPeriod     20000000               104 ns/op
PASS

Alex

alexbrainman commented 7 years ago

the patch could be just:

I am not criticizing your patch. Just a note from https://msdn.microsoft.com/en-us/library/windows/desktop/dd757626(v=vs.85).aspx:

" ... You must match each call to timeBeginPeriod with a call to timeEndPeriod, specifying the same minimum resolution in both calls. An application can make multiple timeBeginPeriod calls as long as each call is matched with a call to timeEndPeriod. ..."

Alex

aclements commented 7 years ago

Could someone give https://go-review.googlesource.com/38403 a try? It implements the idea of turning down the timer resolution when the Go process is idle.

gopherbot commented 7 years ago

CL https://golang.org/cl/38403 mentions this issue.