
time: time.Sleep and time.NewTimer: Fix duration parameter error #17696

Open johnrs opened 7 years ago

johnrs commented 7 years ago

What version of Go are you using (go version)?

go version go1.7.3 windows/amd64

What operating system and processor architecture are you using (go env)?

set GOARCH=amd64
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows

What did you do?

I wrote a test which is basically a loop consisting of two things: some busy work for about 500us and a call to time.Sleep(1 * time.Millisecond).

Here's a link to the test program: https://play.golang.org/p/VYGvIbTPo3

It can't run on the playground, so you will have to grab it and run it yourself.
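
In case the link goes stale, here is a minimal sketch of the shape of the test (not the actual program; the busy-work iteration count is a stand-in you'd calibrate per machine):

package main

import (
	"fmt"
	"time"
)

// busyWork burns CPU without sleeping, so it doesn't depend on the
// clock tick. The iteration count is a placeholder; the real test
// calibrates it to take roughly 500us.
func busyWork() {
	x := 0
	for i := 0; i < 5000000; i++ {
		x += i
	}
	_ = x
}

func main() {
	const passes = 100
	start := time.Now()
	for i := 0; i < passes; i++ {
		busyWork()
		time.Sleep(1 * time.Millisecond)
	}
	// Per the docs, each pass should take at least ~1.5ms.
	fmt.Printf("work(~500us) + time.Sleep(1ms): %v per pass\n",
		time.Since(start)/passes)
}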

What did you expect to see?

  work(~500us) + time.Sleep(1ms): 1.500000ms    ... or slightly more
  work(~500us) + time.After(1ms): 1.500000ms    ... or slightly more

What did you see instead?

  work(~500us) + time.Sleep(1ms): 1.030103ms
  work(~500us) + time.After(1ms): 1.110111ms

Discussion

I believe that there are errors in the docs for time.Sleep and time.NewTimer. They both make similar statements regarding the duration.

Sleep pauses the current goroutine for at least the duration d.

NewTimer creates a new Timer that will send the current time on its channel after at least duration d.

If these are correct, then the test loops should take ~1.5ms or longer per pass. But they don't.

I believe that at least the duration d was an attempt to convey that overhead might result in a longer duration than requested. But there are two other factors which might result in a shortened duration, sometimes MUCH shorter. These factors are the resolution and phase of the time source used for these functions.

In Windows, the resolution is normally 1ms for Go programs. I'll refer to this value as r, short for resolution. I'm not sure what it is on other platforms. If you run the test program it will measure and display it. Please let me know what you find.

I believe that in general the duration can be as short as d-r (rather than d). How short simply depends on the phase of the time source when you use it. Thus if you ask for a 100ms delay you will end up with something between 99 and 100ms (plus overhead). Thus as you use smaller values of d (approaching r) the problem gets worse. By itself, this isn't too bad. But now the other shoe drops.

In the specific case that you ask for a delay of r (or anything between 1 and r), then the delay can be anything between 0 and r. Actually, for a reason I don't quite understand, the lowest delay I've actually seen is about 50us, not 0. Even so, that is 20 times less than the 1ms requested. This was drastic enough to break some of my code. I changed d from 1ms to 2ms. The resulting delay is between 1 and 2ms (plus overhead).

Can this cause a problem? Generally, in a loop only the first pass is affected, because the loop gets in phase with the timing source. So this might or might not be bad for you. If it is, put a 1ms sleep call just before the loop.

But imagine that you are using time.After to set up a timeout, but it occurs too quickly. Perhaps 20 times too quickly. This can cause intermittent, non-deterministic, bogus timeouts, even on a single-threaded program. Ouch! This is what bit me.
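
Here's a hedged sketch of that failure mode (the 500us responder and 1ms timeout are made-up numbers, not from my actual program): a timeout that the docs say cannot fire before 1ms can beat a response that arrives well inside 1ms.

package main

import (
	"fmt"
	"time"
)

func main() {
	data := make(chan byte)
	go func() {
		time.Sleep(500 * time.Microsecond) // a "fast" responder
		data <- 'x'
	}()

	select {
	case b := <-data:
		fmt.Printf("got %c\n", b)
	case <-time.After(1 * time.Millisecond):
		// If the 1ms timer can expire after as little as ~50us,
		// this branch can win even though the responder was fast.
		fmt.Println("bogus timeout")
	}
}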

Another gotcha: Using d of 10ms now, but Go on Windows switches r from 1ms to 15.6ms. Instead of a delay of 9 - 10ms you now get 0 - 15.6ms.

Possible Solutions

The really bad case, when d is between 1 and r, could be detected, causing d to be changed to r+1. Tacky, in my opinion. Alternatively, vet could look for this case.

Most programming languages I've worked in allow this to happen, but they don't document it incorrectly. This seems like the best solution to me. Simply fix the docs to reflect reality.

Perhaps a paragraph at the top describing what the "resolution of the time source" means, what it is for some typical platforms, and an example of a simple routine to determine what yours is. Maybe even a system call to retrieve it.

Then, change the current time.Sleep description from

Sleep pauses the current goroutine for at least the duration d. A negative or zero duration causes Sleep to return immediately.

to something like

Sleep pauses the current goroutine for at least the duration d - 1. A duration less than or equal to the time source resolution can cause Sleep to return quickly. A negative or zero duration causes Sleep to return immediately.

And similar for time.NewTimer's description.

johnrs commented 7 years ago

I agree with your analysis. But you didn't address what the docs say: "Sleep pauses the current goroutine for at least the duration d." Both Rob and Russ (if memory serves) have confirmed that the docs are correct. Hence the code must be wrong.

The fix is easy. Also, I feel that it makes Go code safer and more portable. So why the resistance to fixing the code?

ianlancetaylor commented 7 years ago

What is the fix? I've seen lots of discussion but I don't recall seeing a fix.

My results running https://play.golang.org/p/6CMV59o7QF GNU/Linux (these results vary quite a bit with each run):

    time.Now() resolution:     0.0001ms
    time.Sleep(1) resolution:  0.0005ms
    time.After(1) resolution:  0.0010ms
    Work runtime (~0.0001ms):  0.0002ms

     Sync Test         0          1      0.0004ms   0.0005ms
    ------------   --------   --------   --------   --------
       Sleep       0.0000ms   0.0003ms   0.0309ms   0.0300ms
    Work + Sleep   0.0003ms   0.0010ms   0.0257ms   0.0282ms
       After       0.0021ms   0.0016ms   0.0102ms   0.0136ms
    Work + After   0.0035ms   0.0028ms   0.0129ms   0.0151ms
alexbrainman commented 7 years ago

But you didn't address what the docs say: "Sleep pauses the current goroutine for at least the duration d."

I think the doc is fine. This is working as documented, if you accept your computer's clock for what it is. Do you propose we change the documentation? What do you propose to say instead?

The fix is easy.

Please tell us what you have in mind. Thank you.

Alex

johnrs commented 7 years ago

Hi Ian.

What is the fix? I've seen lots of discussion but I don't recall seeing a fix.

Add 1 tick of the system clock to the Sleep/After period. Give me a day or two to look over the Go code so I can make a more specific recommendation.

My results ...

Thanks! As I feared, it didn't scale well down to 1us. I made some changes which should help, but I don't know if the results will still be too noisy. Please let me know!

https://play.golang.org/p/hpurQoEyFT

johnrs commented 7 years ago

Hi Alex.

doc> "Sleep pauses the current goroutine for at least the duration d."

This is working as documented, if you accept your computer's clock for what it is.

When I Sleep(900us) I expect it to pause for at least 900us. It doesn't, as long as you accept that 566us, 413us, 67us, ... are less than 900us.

"if you accept your computer's clock for what it is".

This is something for the runtime/Sleep to deal with, not the user.

Do you propose we change the documentation? What do you propose to say instead?

I have already said that I propose fixing the code. But a more accurate statement of its current behavior would be something like: "Sleep pauses the current goroutine for about the duration d, except at the system tick boundary where it can be much less."

Please tell us what you have in mind. Thank you.

Add 1 tick of the system clock to the Sleep/After period. Give me a day or two to look over the Go code so I can make a more specific recommendation.

ianlancetaylor commented 7 years ago

I suppose I'm willing to sleep for an additional tick if 1) we have a reliable way of determining how long a tick is; 2) more importantly, we understand why it is necessary. Right now I have no idea why it is necessary. The code looks correct to me. If there is a problem, I suspect it is due to different clocks being out of sync. Measurements of sleep duration that use a different clock, such as the media timer, are interesting but it's hard to know what to make of them for very short durations if we don't know how they are synchronized.

Have you tried your programs on Go tip? The Windows time code has changed, and in particular the way we calculate time.Now.Sub has changed to use monotonic time that uses the same clock used for time.Sleep (https://github.com/golang/proposal/blob/master/design/12914-monotonic.md).

Your new program still does not produce consistent results on GNU/Linux, but here is sample output using current tip.

    time.Now() resolution:        0.04us
    time.Sleep(1) resolution:     6.17us
    time.After(1) resolution:     7.01us
    Work runtime (~   3.08us):    3.02us

     Sync Test         0           1          4.94us      7.40us
    ------------   ---------   ---------   ---------   ---------
       Sleep          0.00us      6.93us     23.31us     45.68us
    Sleep + Work      3.05us     12.93us     28.12us     56.21us
       After         10.45us      8.51us     26.95us     53.15us
    After + Work     11.97us     12.67us     31.41us     56.71us
johnrs commented 7 years ago

I'm sorry that the reworked test program didn't do better on Linux. The 1us tick makes small variations in both the code overhead and in the runtime environment hard to factor out without really getting down and dirty. But the good side is that even if the "round down" instead of "round up" problem exists on it too, it wouldn't matter very much since the max error would be just 1us. That's a THOUSAND TIMES BETTER than on Windows, with its 1ms tick. :)

1) we have a reliable way to determining how long a tick is;

I can think of a few ways of determining the tick rate in Windows. The most direct and authoritative is to just ask Windows. It will tell you. It would also be faster than any measurement method (which is going to require a full tick interval, or more, minimum). Thus I would recommend just asking Windows.

Here's a way which doesn't require you to even know how long a tick is. I mention it just for completeness, but I don't recommend it and I doubt that you would like it. You could do a separate 1-tick sleep at either the beginning or the end of the Sleep() "phase". Let's say you do it at the end. You'd need a flag to tell you which of the two passes you were in - main or 1 tick. The extra tick at the end would indeed last a full tick, since it would be in phase with, but just after, the system tick. To recap: It needs a flag bit and it means additional overhead to do the extra tick sleep. Yuck - but you don't have to know how long a tick is.
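
As a sketch, the user-level equivalent of that trailing extra tick would be something like this (not what the runtime does; the 1ns request is deliberately minimal):

import "time"

// sleepStretched sleeps for d, then tacks on a minimal extra sleep.
// The trailing request starts just after a tick boundary, so it lasts
// roughly one full tick, pushing the total to at least d.
func sleepStretched(d time.Duration) {
	time.Sleep(d)
	time.Sleep(1) // 1ns, rounded up to a full tick in practice
}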

2) more importantly, we understand why it is necessary.

I'm going to assume a 1ms system tick for this discussion. If you are doing a 100ms or greater sleep it doesn't really matter. If you are doing a 1ms or less sleep then it matters a lot. Right now such a sleep can take anywhere from close to 0 (I've seen as low as 40us but it might be less) to 1ms. Assuming you chose 1ms for a reason, having the sleep take up to ~25 times less at times might cause a problem.

It bit me that way once, and I was already aware of the problem (but forgot about it while writing some code)! I used a 1ms sleep as a timeout on a serial communications link to an old (but expensive) piece of test equipment. It was designed to talk to a printer and it put out a very steady character stream while sending data. The only way I could tell when it was finished dumping data was a timeout. My 1ms timeout was over 10 character times, so I figured that it was plenty safe. But not so. Every so often it would timeout in the middle of a message. To make matters worse, debugging it was no fun since the problem was non-deterministic and infrequent. Yuck!

OK, that's a real example. Here are some other concerns. In Go's runtime I once counted about 7 usleep's which were 1ms or shorter. Has anyone analyzed these to see what would happen if they took much shorter or longer than expected? How about porting code? Do I have to go through every import, even Go's standard library, looking for such problems? How bad are the risks if I don't?

For a while the problem was 15.6 times worse. When Go switched the default clock to the slowest setting (from 1ms to 15.6ms) in v1.6, code doing a 10ms wait which had previously been taking 9-10ms suddenly saw anywhere from ~40us to 15.6ms. That broke some monitoring routines I had in a program. I wonder how much other stuff it affected?

Neither "round down" (as it is) or "round up" is a perfect solution. There is no perfect solution. So let's look at the two possibilities we do have. The worst case is at the 1ms boundary.

Round Down: ~40us - 1ms, 25:1 range (as it is now)
Round Up: 1ms - 2ms, 2:1 range (what the docs say, and what I recommend)

I believe that the safer choice is the one with the vastly smaller error range.

If a programmer knowledgeable about such matters knows that a language "rounds down", he can code around it by simply sleeping for 1 extra tick. If he knows where the boundary is, he can skip the extra tick when above the boundary (this hurts his code portability). But this is a trap for most programmers, as reading this issue illustrates. And the real killer is that the docs clearly describe "round up" behavior. So the trusting programmer "knows" that he is safe when in fact he isn't. Ouch!
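
For the knowledgeable programmer the workaround is a one-liner; the 1ms constant in this sketch is a Windows assumption, and is exactly the portability wart I mentioned:

import "time"

const tick = 1 * time.Millisecond // assumed resolution; not portable

// sleepAtLeast pads the request by one tick so rounding down can no
// longer take the actual sleep below d.
func sleepAtLeast(d time.Duration) {
	time.Sleep(d + tick)
}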

All along - in this issue and on 3 prior occasions - I've said that I'd prefer the code be changed, but if not then certainly the docs should be. I was authoritatively informed that the docs were correct as is. So that leaves fixing the code (sounds good to me!).

Summary: It would be safer, more compatible/portable, and match the docs.

The code looks correct to me.

I've looked at the code a few times previously, but not recently. There is no attempt to implement "round up" that I could find. Please tell me where you see it.

I believe that if you read Alex's 10 step analysis posted 2 days ago you will see that it supports what I claim. Forget about what time.Now sees - that's just a measurement issue which complicates demonstrating the problem. Look at the actual time, starting with step #5. It's 10.75ms. In steps #6 - 8 the Sleep(1ms) takes place. Step #9 shows that the Sleep() ends at 11ms. So the Sleep(1ms) actually took only 0.25ms.

If there is a problem, I suspect it is though to different clocks being out of sync.

There's only one clock that determines how long the Sleep() lasts. Other clocks don't come into play until you try to measure what is going on. But just to address the issue of clock synchronization: The 1ms Windows clocks all tick at the same time, or damn close (compared to 1ms) to it. Yes, Alex and I even ran some tests on this, around Nov 14.

But even if they were out of phase (by less than 1 tick, right?) then doing something 100 times in a loop reduces that maximum 1 tick measurement error to 1% of the total time observed. Meanwhile the correct result is 100% more!

Before you say "system delays could affect the result" - Yes! They would INCREASE the observed time, making it falsely look more correct, thus hiding the problem! The results are generally within a percent or two of the incorrect value, thus nowhere close to the correct value, so I think it's extremely safe to say that the measurements are valid.

Plus you have the evidence I sent you based on measurements using the ~550ns Media Timer. It's over 1000 times the resolution of the 1ms tick, and it shows the same results.

Have you tried your programs on Go tip?

No, but I have tried it on v1.2, v1.3, v1.4, v1.5, v1.6, v1.7, and various sub-versions and betas - all showing the same problem.

... the way we calculate time.Now.Sub has changed ...

But this should not affect the amount of time spent sleeping, else Sleep() would not be implemented correctly, right?

I hope this clears up some of the confusion. I'm happy to answer any additional questions. I appreciate your help getting this resolved.

ianlancetaylor commented 7 years ago

Thanks. So here is my current understanding of the issue.

- the system clock has some resolution K, so at any point in the program we are between system clock tick T and T+1
- when we ask for the time we get T, but in some sense the actual time is T + S, where S < K
- on Windows K seems to be fairly large, over 500 nanoseconds, maybe; I'm not really sure
- on GNU/Linux K can in principle be 1 nanosecond but in practice seems more like 100
- when a sleep requested for duration D ends, we read the clock again and get some T2 >= T + D, where again the actual time is T2 + S1
- we know that S < K and S1 < K but it is possible that S > S1
- so the actual sleep duration A = (T2 + S1) - (T + S) can be less than D; we know that A > D - K
- of course we actually want A >= D, so we may be short by up to K

So I think I may finally understand what you are saying.

Of course, there is some time required to put a goroutine to sleep and there is some time required to wake a goroutine up. This is basically the time it takes to run goparkunlock and goready and wakep and notewakeup and the futexwakeup system call and the return from the futexsleep system call and acquirep and findrunnable and execute and gogo (on Windows replace futexwakeup and futexsleep with SetEvent and WaitForSingleObject). Let's call the total time from entering time.Sleep to going to sleep to waking up to returning from time.Sleep M.

So to ensure that the goroutine that calls time.Sleep sees a sleep of at least D, we want to know that A + M >= D. We know that A > D - K, so we are OK if M > K.

On GNU/Linux a system call alone takes at least 50ns. Given that on GNU/Linux K seems to be around 100ns or so, and that on machines where a system call is faster, it's likely that K is smaller, I think we can reasonably assume that M > K is always true.

But on Windows, and possibly other systems, K is much larger and M may be smaller.

My understanding of your suggestion is that time.Sleep should schedule a wakeup not for T + D, but for T + D + K. That will ensure that A > D in all cases (in fact, it will ensure that A > D + M). This requires determining K, which I don't know how to do on GNU/Linux or on other Unix systems. But perhaps in practice it's OK to only add K on Windows.

Another approach we could take is to decide that we don't care about K when sleeping for values over, say, 1e6ns == 1ms, and to implement time.Sleep for values less than 1ms by making a raw system call to the system's sleep routine (nanosleep or WaitForSingleObject or whatever). For compatibility we should perhaps make a system call and then call runtime.Gosched so that any call to time.Sleep allows other goroutines to run.
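
Roughly, on GNU/Linux, something like this sketch (the name, the 1ms cutoff, and the use of golang.org/x/sys/unix are all illustrative; note that the direct call blocks the OS thread for the duration, which is the trade-off being weighed here):

import (
	"runtime"
	"time"

	"golang.org/x/sys/unix"
)

// shortSleep bypasses the runtime timer for sub-millisecond requests
// by calling nanosleep directly, then yields so other goroutines can
// run, per the compatibility note above.
func shortSleep(d time.Duration) {
	if d >= time.Millisecond {
		time.Sleep(d)
		return
	}
	ts := unix.NsecToTimespec(d.Nanoseconds())
	unix.Nanosleep(&ts, nil) // error handling elided
	runtime.Gosched()
}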

johnrs commented 7 years ago

You're getting it. :) I've commented on a few items. The others I agree with fully.

on Windows K seems to be fairly large, over 500 nanoseconds, maybe; I'm not really sure

Windows provides a few different clocks and timers. Some of them depend on which version of Windows you are running. Characteristics can also vary based on the hardware and/or BIOS. The clock most commonly used, including by Go, is adjustable from 0.5ms to 15.6ms, in steps. By far, the most common settings are 15.6ms (default) and 1ms. For some reason the 0.5ms setting isn't allowed by the normal means, but it can be done. I usually see it referred to as the "system clock".

Changing the system clock affects some things in the kernel, including how often the thread scheduler runs. Thus setting it to 1ms gives you more dynamic scheduling and better timing resolution. But it results in some additional CPU overhead (so little that I don't notice it at all) and some additional power usage (important for battery operation).

Windows keeps track of what setting each process wants. It uses the same clock for all processes, thus it chooses the fastest requested speed at any given time. Some programs (ex: Chrome and Firefox) dynamically change their speed request. They drop it to slow when idling, but speed it up when they have lots of work to do (to get more dynamic thread scheduling, I suppose).

When Go starts it requests the 1ms rate. Thus the tick rate doesn't change (unless another process requests the 0.5ms rate - which very few, if any, do). Originally, if memory serves, Go programs were able to request dropping to the 15.6ms rate. But recently I've noticed that this doesn't seem to work any more. I have seen that a user making this change could sometimes upset Go, so perhaps it is being blocked on purpose. I don't know. If so, I have to wonder why Go doesn't provide the call to switch as a std lib function. That way Go would know when a change occurred and it could adjust. But I digress....

For test purposes there are plenty of small program which allow you to monitor and/or change the system clock rate. My favorite is TimerTool (https://github.com/tebjan/TimerTool). It allows you to set all of the legal rates (not just 1ms and 15.6ms), including the elusive 0.5ms.
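
If you'd rather experiment from Go itself, the winmm calls that such tools wrap can be reached directly; a sketch (Windows-only, error handling elided; as noted above, this "normal means" route can't reach 0.5ms):

import "syscall"

var (
	winmm           = syscall.NewLazyDLL("winmm.dll")
	timeBeginPeriod = winmm.NewProc("timeBeginPeriod")
	timeEndPeriod   = winmm.NewProc("timeEndPeriod")
)

// withTimerResolution runs f with the system clock set to res
// milliseconds (1 is the usual request), then restores it.
func withTimerResolution(res uint, f func()) {
	timeBeginPeriod.Call(uintptr(res))
	defer timeEndPeriod.Call(uintptr(res))
	f()
}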

Another common clock in most (anything used currently) versions of Windows is the Media Timer. It's monotonic, not time-of-day, and ticks at about 550ns (on my machine). Unfortunately it's a bit awkward to use, resulting in more overhead than the normal clock. But if you need more precise timing it is 1000 times better than the normal clock. I only use it when needed. I don't believe that Go uses this clock at all, other than in some 3rd party libraries.

on GNU/Linux 'K' can in principle be 1 nanosecond but in practice seems more like 100

Linux also presents various clocks/timers. I don't know which one Go uses. I believe that gettimeofday is one of the most popular for general use. The format uses 1us resolution. I've seen people say that it steps with various resolutions (from 1us up to ???), depending on hardware, type of Linux, etc. The [flawed] test results from your machine showed it in the 1us neighborhood, but Sleep(1) and After(1) seemed to take a few us.

at any point in the program we are between system clock tick T and T+1

Yes. A good analogy might be mile markers on a highway. If someone drops you from a helicopter and tells you to walk 1 mile you face a dilemma: Should you walk to the next mile marker (less than a mile) or to the 2 mile marker (over a mile)? If you need to walk 100 miles, then you can choose either the 100th or the 101st mile marker and be correct within 1%.

when we ask for the time we get T, but in some sense the actual time is T + S, where S < K

Yes! And you don't have any idea what S is.

we know that S < K and S1 < K but it is possible that 'S > S1'

Well... If we are talking about a 1ms K, fine. Indeed, in this case S1 is probably VERY small since it occurs immediately after a tick. If we are talking about a 1us K then it is possible that S1 could be larger than K. But this would make A even larger. This is fine since we promise "at least".

of course we actually want A >= D, so we may be short by up to K

Bingo! Hence adding 1 K guarantees that we meet the promise. :)

So I think I may finally understand what you are saying.

Yes, you do. You nailed it. Sorry I couldn't express it better. :)

Of course, there is some time required to put a goroutine to sleep ...

S starts when time.Sleep is called which is before this, however. I agree with the rest.

But on Windows, and possibly other systems, K is much larger and M may be smaller.

Yes!

My understanding of your suggestion is that time.Sleep should schedule a wakeup not for T + D, but for T + D + K. That will ensure that A > D in all cases (in fact, it will ensure that A > D + M).

Yes! The alternative would be to "stretch" the sleep by an additional sleep for 1. This will result in an additional sleep of K, regardless of what K is. The leading S doesn't matter since the goroutine is already asleep. The trailing S replaces the trailing S from the initial sleep. As I said previously, I do not care for this method due to the additional overhead, but I felt it was worth mentioning just to show that there was a way to do it without knowing K.

This requires determining K, which I don't know how to do on GNU/Linux or on other Unix systems.

As long as they are fast (are you sure?), OK by me. I think that you can determine an upper limit for K. Pseudo assembly code:

X = time     :: just the least-significant word of "time"
LOOP:
Y = time
if (X == Y) goto LOOP
K = Y - X

Do this a few times and use the smallest K found.

K might be smaller than what we observe but it can't be larger. Coded in assembly I would think this would give an exact answer for K > ~10ns and an upper bound (of <= ~10ns) for smaller K's. It depends on memory access times, I guess. But even if K is < ~10ns, does it really matter? The time.Sleep overhead (including goroutine restart) is probably lots more.
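
In Go rather than pseudo assembly, the same measurement might look like this sketch:

import "time"

// measureK spins until the reported time changes. The step between
// two distinct readings is one tick, or more if the thread was
// preempted, so call this a few times and keep the smallest value,
// as suggested above.
func measureK() time.Duration {
	x := time.Now()
	y := time.Now()
	for y.Equal(x) {
		y = time.Now()
	}
	return y.Sub(x)
}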

Another approach we could take ...

I'd love to get 1us resolution on Windows! Although I can use the Media Timer to measure, I can't use it to sleep. There is a very well known Perl library (Time::HiRes) which accomplishes something like 1us timing functions and it's been around a long time. I haven't looked at the code to see how.

... implement time.Sleep for values less than 1ms by ...

Is there any downside to just using the faster method all the time? Overhead?

Side Note: I'm just reading that on some Linux systems, nanosleep calls of <= 2ms simply do a busy wait. This avoids the thread rescheduling time, thus giving a much more accurate sleep interval. Go's goroutines are faster but there still has to be a limit below which a busy wait would be better. 1us???

I think that I'm finally seeing the light at the end of the tunnel (regarding this issue). I just hope that it isn't an oncoming locomotive! :)

It sounds like we need to find out a few things.


ianlancetaylor commented 7 years ago

A couple of quick notes:

Indeed, in this case S1 is probably VERY small since it occurs immediately after a tick.

Likely but not always guaranteed. If there are no current timers, and a goroutine calls time.Sleep, then it will wake up the timer goroutine to let it know about the new timer. The timer goroutine will call nanotime to check the current time. Since the timer goroutine was woken up manually, not due to a timer tick, S1 could be anything. For a very short argument to time.Sleep, given the goroutine wakeup and scheduling that occurred, the current time might already appear to be after the desired sleep time. Not sure if this actually matters, though.

Linux also presents various clocks/timers. I don't know which one Go uses.

For purposes of time.Sleep Go uses clock_gettime(CLOCK_MONOTONIC).
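
If you want to look at that same clock from Go, it can be read via golang.org/x/sys/unix (Linux-only sketch):

import "golang.org/x/sys/unix"

// monotonicNanos reads CLOCK_MONOTONIC, the clock the runtime uses
// for sleep deadlines on Linux.
func monotonicNanos() (int64, error) {
	var ts unix.Timespec
	if err := unix.ClockGettime(unix.CLOCK_MONOTONIC, &ts); err != nil {
		return 0, err
	}
	return ts.Nano(), nil
}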

Is there any downside to just using the faster method all the time? Overhead?

If we are going to sleep for a noticeable period of time we want to immediately put the goroutine to sleep and send the operating system thread off to pick up a new goroutine. Calling a system call will have that effect eventually, but it will be delayed; there is no purpose to the delay if we know we are going to sleep.

alexbrainman commented 7 years ago

My understanding of your suggestion is that time.Sleep should schedule a wakeup not for T + D, but for T + D + K.

K is 15ms on the tip (https://go-review.googlesource.com/38403 for details). So this proposal will make time.Sleep(1ns) sleep between 15ms and 30ms. I don't think this is acceptable. It is hard to sleep for short periods of time as it is on Windows. This will make it even harder.

The alternative would be to "stretch" the sleep by an additional sleep for 1. This will result in an additional sleep of K, regardless of what K is.

Same problem as above.

Another approach we could take is to decide that we don't care about K when sleeping for values over, say, 1e6ns == 1ms, and to implement time.Sleep for values less than 1ms by making a raw system call to the system's sleep routine (nanosleep or WaitForSingleObject or whatever).

Perhaps I do not understand your proposal, but how is this different from our current approach? Sure, we employ separate goroutine to sleep, but the goroutine calls nanosleep or WaitForSingleObject anyway.

Alex

ianlancetaylor commented 7 years ago

Perhaps I do not understand your proposal, but how is this different from our current approach? Sure, we employ separate goroutine to sleep, but the goroutine calls nanosleep or WaitForSingleObject anyway.

The difference is that the timer goroutine can be woken up for many different reasons, and it will decide whether a goroutine's sleep is done by checking the current time, thus possibly waking up that goroutine very slightly too early.

If the sleeping goroutine calls the system call directly, it is sure to sleep for as long as the system considers to be an appropriate time.

alexbrainman commented 7 years ago

The difference is that the timer goroutine can be woken up for many different reasons, and it will decide whether a goroutine's sleep is done by checking the current time, thus possibly waking up that goroutine very slightly too early.

Thank you for explaining.

If the sleeping goroutine calls the system call directly, it is sure to sleep for as long as the system considers to be an appropriate time.

I have very little faith in Windows system calls that provide sleeping functionality. But, I suppose, we could just shift the blame. :-)

Alex

johnrs commented 7 years ago

K is 15ms on the tip (https://go-review.googlesource.com/38403 for details).

I was unaware of 38403. I just read it. Very interesting! I like it very much!

But wouldn't K be at 1ms if there is a goroutine active?

So this proposal will make time.Sleep(1ns) sleep between 15ms and 30ms. I don't think this is acceptable.

I see it as better than the alternative: sleep between 0 and 15ms. This is more problematic and is contrary to what the doc says.

I have very little faith in Windows system calls that provide sleeping functionality.

I know that availability over all versions of Windows is a problem. Is this what you were referring to or something else?

johnrs commented 7 years ago

Two notes...

1) We've been talking mainly about time.Sleep, but time.After has the same issue. I believe it might be based on time.Ticker (not sure). If so, only the 1st tick needs the extra K. Succeeding ticks are in sync with the system clock so they won't be short.

2) It occurs to me that one of the ideas Ian presented could help here. If K is only added when the requested duration is <= K, this would eliminate what I think are the most unsafe cases. It mitigates the worst part of the problem, leaving the rest as is. More specifically, this would not cause the 15 to 30ms situation which Alex is concerned about. So it might make a good compromise solution, as sketched below.
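
Something like this, where k stands in for however the resolution gets determined:

import "time"

// padShort adds one tick only when the request is at or below the
// resolution k, where rounding down toward zero is possible; longer
// requests are untouched, avoiding the 15-to-30ms concern above.
func padShort(d, k time.Duration) time.Duration {
	if d <= k {
		return d + k
	}
	return d
}

A caller would then write time.Sleep(padShort(d, k)).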

alexbrainman commented 7 years ago

But wouldn't K be at 1ms if there is a goroutine active?

Define "a goroutine active". And no, we cannot assume that K is 1ms, because sometimes it will change to 15ms.

I see it as better than the alternative: sleep between 0 and 15ms.

Why? Why do you think time.Sleep(1ns) sleeping between 15ms and 30ms is preferable to it sleeping between 0 and 15ms?

I know that availability over all versions of Windows is a problem. Is this what you were referring to or something else?

I am referring to nothing in particular. I am just saying that it is very difficult to sleep for short periods of time on Windows. I tried to find a solution in the past. Perhaps it is different now.

It occurs to me that one of the ideas Ian presented could help here. If K is only added when the requested duration is <= K, this would eliminate what I think are the most unsafe cases. It mitigates the worst part of the problem, leaving the rest as is. More specifically, this would not cause the 15 to 30ms situation which Alex is concerned about. So it might make a good compromise solution.

I don't see the difference, but I did not think about this properly. Feel free to send the change, if it is simple enough. We'll see.

Alex

johnrs commented 7 years ago

Define "a goroutine active". And no, we cannot assume that K is 1ms, ...

Per #38403: "it only reduces the timer resolution when the Go process is entirely idle"

Why do you think time.Sleep(1ns) sleeping between 15ms and 30ms is preferable ...

"I believe that the safer choice is the one with the vastly smaller error range." Search the thread for "safe" for more details from my previous posts.

prasannavl commented 7 years ago

There seems to be a lot in the discussion here. Just glanced over it - so I apologize in advance if I repeat any points already made.

My 2 cents:

@alexbrainman,

So this proposal will make time.Sleep(1ns) sleep between 15ms and 30ms. I don't think this is acceptable. It is hard to sleep for short periods of times as is on Windows. This will make it even harder.

In my opinion, this statement couldn't be more wrong. Firstly, I don't think it's just acceptable, but in fact is what we should strive to do on the Windows platform. It's hard to sleep for short periods on Windows - true - but that doesn't mean you break out of the consistency of the platform. It's completely acceptable in Windows that 15ms is the highest general case resolution - and for good reason. If a program requires higher resolution, the programmer asks for it explicitly, and it's a well known paradigm.

It's as simple as making one call to the multimedia timers manually, should the application decide. But the language runtime forcing that on programs is far more harmful. The goal should be to remove the reliance of the runtime on multimedia timers entirely, not do the opposite.

If needed, I think compatibility of certain requirements can be dropped for older versions of Windows, that have inconsistencies in APIs like say, QueryPerformanceTimer depending on the underlying hardware. But in the post Win8.1+ world, they are consistent and do as documented.

Time is hard. Because of it, you can't just go ask the OS to shoot itself in the foot, just because it offers that flexibility for programs that really need it.

More context: https://github.com/golang/go/issues/8687#issuecomment-320478824 https://go-review.googlesource.com/c/38403#message-96d2d5bb23216c271f8f8c7bd7d471e0ce999fae

johnrs commented 7 years ago

@prasannavl

I think that most would agree that 1ms timing resolution is more desirable than 15ms resolution (except when trying to save power). Which to use has changed a few times in Go. The most current solution coming in v1.9 will hopefully be optimum for the majority of uses. Allowing for extreme cases is something which is still on the table, as per https://github.com/golang/go/issues/20937. I plan to open an issue about it after v1.9 is released.

This issue is about the time.Sleep's (and other timers) documentation saying:

Sleep pauses the current goroutine for at least the duration d.

As far as I can tell, Go has never performed this way (including the upcoming v1.9). The situation is this. Let's say that you are running with the system timer set to 1ms. If you do a time.Sleep(1ms) there are two possible results (ignoring the statistically unlikely case of exactly a 1ms sleep). You will either sleep for less than 1ms or more than 1ms (generally between 1ms and 2ms).

The documentation says it should be "at least" 1ms. But testing shows that it is generally "at most" 1ms. Hence the issue. I've seen time.Sleep(1ms) take anywhere from as little as 10us to 1ms.

A more accurate description of the current behavior would be something like:

Sleep pauses the current goroutine for at least the duration d, minus 1 tick of the system clock.

The fix can be either to change the documentation to match the code, or to change the code to match the documentation, but leaving it as is is simply wrong. Personally, I prefer changing the code to match the documentation. I feel that it is the safer, more useful solution. In other words, I think that time.Sleep(1ms) resulting in a 10us sleep is much worse than a 2ms sleep.

The QueryPerformanceTimer is interesting for high precision (550ns resolution on my machine) interval measurement but it comes at the cost of higher overhead, so I wouldn't want it to become the default but perhaps an option instead. Also, I don't think that it would help in the case of time.sleep() since it doesn't provide a "wait" feature.

Discussion here dropped off since there were other changes being made to time (wall clock vs monotonic) with the hope that perhaps the issue would be resolved as a side effect. I don't believe that it has, hence I plan to ignite the discussion again after the v1.9 release.

prasannavl commented 7 years ago

I think that most would agree that 1ms timing resolution is more desirable than 15ms resolution (except when trying to save power).

I'd say that's debatable. I'd like to think most wouldn't agree if they understood the costs. Most general applications have really no need for the higher resolution. The fact that, to this date, there is no way baked into the .NET ecosystem (an ecosystem built for Windows) to do this without resorting to P/Invoke and calling the mm APIs stands on that very rationale as well. Only the applications that really, really need it do so - and they can still do so with a very simple call.

But other than that, I am on the same page with you on everything else. I also suppose while a lot of my points had contextual relevance, it isn't directly of immediate relevance to this issue. I shall wait for the discussion to ignite again on it, as you said. :)

johnrs commented 7 years ago

(except when trying to save power).

I'd say that's debatable.

I did leave out one other consideration: It also allows programs (your and others) to run a slight bit faster due to the reduction in overhead. I don't think it's a major factor, but seems to matter to some. Did you have any other costs in mind?

OK. Looking forward to your participation. :)

alexbrainman commented 7 years ago

Why do you think time.Sleep(1ns) sleeping between 15ms and 30ms is preferable ...

"I believe that the safer choice is the one with the vastly smaller error range." Search the thread for "safe" for more details from my previous posts.

I don't understand your explanation, sorry.

The goal should be to remove the reliance of the runtime on multimedia timers entirely, ...

I won't argue with that.

Alex

prasannavl commented 7 years ago

It also allows programs (your and others) to run a slight bit faster due to the reduction in overhead. I don't think it's a major factor, but seems to matter to some.

@johnrs

I think you have an incorrect understanding of the Windows scheduler. When you increase the resolution, programs don't run faster - they run slower, with more overhead.

Back to some fundamentals - when you increase the res to 1ms, what happens is the kernel interrupts every 1ms instead of 15ms, which means there are a lot more context switches, and more book-keeping to do. So, it increases CPU usage and overhead. What appears faster is that thread wake-ups happen sooner than every 15ms - this only affects programs that are sleeping, and/or waiting using one of WaitForSingleObject, WaitForMultipleObjects, etc.

While I generally see a lot of complaints when directly translating UNIX philosophies onto Windows - it's simply because you're solving it against the platform's way. The Windows kernel scheduler is actually far more intelligent than most realize. Unless there's a lot of contention for threads, when it sees a compute-heavy thread it sets the affinity of the thread to a specific CPU (on multi-cores), and pins it for multiple quantum slices.

So, what this means is, the scheduler completely gets out of the way, and your program gets all the CPU it needs. Now with 1ms, you're basically forcing the kernel to do more book-keeping. So, your programs actually get slower.

You can very easily test this. Just run a simple C (or even a C#) program, and let it run a heavy computation. It will complete faster on 15ms than when running on a higher res. I don't say "Go" program, because I don't yet know the internals of the Go scheduler well enough to understand where it interferes with this, in order to work around it.

The only place where it could potentially be a little faster (but even here, in my naive tests, the kernel seems to work its magic), is when you have a lot of "active" threads contending for the CPU constantly, and your program is not in the foreground. (This is a decade-old optimization for foreground programs to get better quantum slice priority.)

Here's a simple C# program to illustrate the same:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.InteropServices;

namespace dtest
{
    class Program
    {
        static List<TimeSpan> s_results = new List<TimeSpan>();
        static Stopwatch s_timer = new Stopwatch();
        static void Main(string[] args)
        {
            NtQueryTimerResolution(out int minRes, out int maxRes, out int currentRes);
            Console.WriteLine("Resolutions - current: {0}, {1}, {2}", currentRes, minRes, maxRes);
            NtSetTimerResolution(maxRes, true, out currentRes);
            Console.WriteLine("Setting to highest - current: {0}, {1}, {2}", currentRes, minRes, maxRes);            
            for (int i = 0; i < 20; i++)
                Run();
            NtSetTimerResolution(maxRes, false, out currentRes);                
            Console.WriteLine("Setting back to normal - current: {0}, {1}, {2}", currentRes, minRes, maxRes);
            for (int i = 0; i < 20; i++)
                Run();
            // Note: TotalMilliseconds / 1000 * 1000 is a no-op, so the values
            // printed are milliseconds despite the "ns" suffix.
            s_results.ForEach(x => Console.WriteLine(x.TotalMilliseconds / 1000 * 1000 + "ns"));
        }

        static void Run()
        {
            s_timer.Reset();
            s_timer.Start();
            var i = 0;
            for (; i < int.MaxValue; i++) { }
            s_timer.Stop();
            s_results.Add(s_timer.Elapsed);
        }

        [DllImport("ntdll.dll", SetLastError = true)]
        static extern int NtSetTimerResolution(int DesiredResolution, bool SetResolution, out int CurrentResolution);

        [DllImport("ntdll.dll", SetLastError = true)]
        static extern int NtQueryTimerResolution(out int MinimumResolution, out int MaximumResolution, out int CurrentResolution);
    }
}

It goes a bit further and sets the timer res to 0.5ms on systems that support it for the sake of the argument. According to my tests:

Resolutions - current: 156251, 156250, 5000
Setting to highest - current: 5003, 156250, 5000
Setting back to normal - current: 156251, 156250, 5000
748.0012
750.8139
750.4076
750.0441
750.243
738.9563
750.5714
737.8166
750.0437
750.4029
751.0132
750.1848
749.9552
749.7858
750.148
749.9881
750.4029
750.0625
749.9534
750.1053
721.1744
722.268
722.3988
724.3536
721.4824
723.0694
723.7977
724.2121
721.3985
722.0977
722.9411
722.5806
721.2121
722.7478
723.8593
725.0409
720.9901
720.9713
722.4745
721.0312

When resolution is 15ms vs 0.5ms - the former is about 30ns faster to run one full loop from 0 to int.MaxValue

Now, the same program with almost no thread contention (previously there were a bunch of processes running in the background, like Chrome, ProcessHacker, VSCode, etc. - which is why the difference was much greater; the kernel's doing a lot more book-keeping. Note though: they are still in the background. The foreground process is always given higher quantum slices.)

Resolutions - current: 156251, 156250, 5000
Setting to highest - current: 5003, 156250, 5000
Setting back to normal - current: 156251, 156250, 5000
739.4905ns
721.7347ns
722.6255ns
730.5602ns
722.8594ns
731.3017ns
723.2909ns
723.2439ns
723.3106ns
723.3431ns
722.9137ns
723.2832ns
722.9886ns
729.5141ns
721.5281ns
721.5093ns
734.1097ns
750.5821ns
732.7545ns
724.7744ns
721.0487ns
721.301ns
723.9683ns
723.4316ns
724.4439ns
726.2302ns
722.6499ns
723.2126ns
725.6815ns
722.0058ns
720.7848ns
726.6467ns
720.137ns
720.4915ns
720.2242ns
720.8336ns
721.6692ns
722.4454ns
720.8203ns
722.158ns

Here the difference is much smaller. Though, lower resolution is still faster with less overhead to deal with. Feel free to run the tests on different scenarios to understand its implications. Windows uses a timer resolution of 15ms because it's a good compromise between stealing programs' time and resolution. It's not very difficult to auto-tune this resolution in the kernel, and emulate UNIX as much as the interrupt timer resolution allows (typically 0.5-1ms). But it's a trade-off that has to be understood explicitly by each program before it does this.

I like to believe that's possibly a reason Microsoft simply chose not to do it, and to make 15ms the default resolution; a program that knows it needs more can explicitly go ask for it - which is also why I think it's far better to stick to the consistent model that the platform recommends than to let every random Go program take extra interrupts without knowing the consequences. The unfortunate thing here, the one that saddens me, however, is that the runtime itself makes these bad decisions.

johnrs commented 7 years ago

Hi Alex.

I don't understand your explanation, sorry.

Sorry I'm not being clear. Perhaps it would help if we backed up a step. Do you agree that there is a problem? If so, what solution do you prefer?

johnrs commented 7 years ago

When you increase the resolution, programs don't run faster - they run slower, with more overhead.

Yes, that's what I meant. When I referred to another consideration I meant in addition to saving power, when running at 15ms.

... the unfortunate thing here, the one that saddens me, however, is that the runtime itself makes these bad decisions.

Better in v1.9 than it was, I think. And can be improved as per issue #20937.

prasannavl commented 7 years ago

Yes, that's what I meant. When I referred to another consideration I meant in addition to saving power, when running at 15ms.

Ah! I mistook it the other way, and was confused about why you thought increasing the timer resolution would increase perf. Guess I could have saved myself a few minutes not having to write out that naive test. Haha.

Anyways, yes. 1.9 is better on idling. 👍

johnrs commented 7 years ago

No sweat. Save the test results. They would support the case for using 15ms even when a program isn't idling. :)

alexbrainman commented 7 years ago

Perhaps it would help if we backed up a step. Do you agree that there is a problem?

I don't agree there is a problem here. I think I already explained my view. I think that you expect your program to work in a particular way, but that is not how it works. I agree that Go documentation is not correct for some scenarios you've described. But, I submit, we should leave documentation alone, because these scenarios do not happen often enough. More documentation will just make things even more confusing.

Alex

johnrs commented 7 years ago

I do not know what your problem is.

Incorrect behavior. time.Sleep doesn't work as documented:

Sleep pauses the current goroutine for at least the duration d.

A more accurate description of the actual behavior would be something like:

Sleep pauses the current goroutine for at least the duration d, minus 1 tick of the system clock.

Note: I only make this claim for Windows. I do suspect that the problem exists on all platforms, but I do not claim that at this time.

The difference in behavior (documented vs actual) can result in problems and can make a program harder to reason about.

Severity: Depending on the code, it can cause loss of data (high priority), bogus retries (medium priority), or not matter (low priority). The symptoms are typically non-deterministic and intermittent thus making the problem hard to troubleshoot.

It caused a program of mine to fail and that's why I decided to report it. I believe it happens more frequently than people realize because it generally causes just bogus retries, not data loss or some other fatal effect.

There was some confusion due to problems with early versions of the test program: Testing at the tick boundary, not allowing for the Windows clocks smaller than 1ms (ex: 0.976ms), and the "sleep + work" test results being hard to interpret. So let's refer to a simple, clear test result. From my post on May 26:

Hit Enter for a Sleep(900us), Ctrl-C to exit:
566us 413us 67us 458us 952us 408us 901us 470us 272us 394us [etc]
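
That output came from a manual test along these lines (a sketch; the keyboard wait is what de-synchronizes the sleep from the tick phase, and timing with time.Now here inherits the coarse tick, which is why I used the QPC timer for the real measurements):

package main

import (
	"bufio"
	"fmt"
	"os"
	"time"
)

func main() {
	in := bufio.NewReader(os.Stdin)
	fmt.Println("Hit Enter for a Sleep(900us), Ctrl-C to exit:")
	for {
		in.ReadString('\n') // arrive at a phase-random moment
		start := time.Now()
		time.Sleep(900 * time.Microsecond)
		fmt.Println(time.Since(start))
	}
}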

I agree that Go documentation is not correct for some scenarios you've described.

I can't think of a scenario in which it is correct. Please describe one.

More documentation will just make things even more confusing.

I believe that the goal should be:

"Everything should be as simple as possible, but no simpler."
-- Albert Einstein

I believe that both Rob and Russ have said that the documentation is correct as is. So the remaining option is to change the code to match it.

I think I already explained my view.

In light of the early confusion, and to help me understand your position, could you please restate it in the context of this message?

alexbrainman commented 7 years ago

Incorrect behavior. time.Sleep doesn't work as documented:

So, you want documentation changed. Thank you for explaining.

I can't think of a scenario in which it is correct. Please describe one.

Perhaps you are correct. Perhaps it is impossible to describe one. But I consider myself a practical person. The problem you are describing does not bother me.

In light of the early confusion, and to help me understand your position, could you please restate it in the context of this message?

I am not convinced we need to change code or documentation. But don't try to convince me, I don't make decisions here.

Alex

johnrs commented 7 years ago

So, you want documentation changed.

No, that is not what I said.

The problem you are describing does not bother me.

I understand. Would it being fixed bother you?

alexbrainman commented 7 years ago

Would it being fixed bother you?

Fixes are not free.

Recently we made changes related to this (see CL 47832 and 47833). Austin had to spend his time implementing these changes. Austin could have improved something else instead. And now (after CLs 47832 and 47833) we have more code in the runtime that all runtime developers have to look at and consider. And the code can break in the future.

Same with "improved" version of documentation. More complicated documentation means that every Gopher (even the ones who are not affected by this issue) must read and understand all these new details.

So the problem must be worth fixing. But personally I am not convinced.

Alex

johnrs commented 7 years ago

Fixes are not free.

Very little in life is. :)

Recently we made changes related to this (see CL 47832 and 47833).

Those were related to CL 38403 and issue 20937, not this issue.

More complicated documentation means ...

Then let me propose an alternate change to the documentation:

Sleep pauses the current goroutine for about the duration d.

This is both simpler and more correct. Win-win!

alexbrainman commented 7 years ago

This is both simpler and more correct. Win-win!

I agree this is very small change. But I still cannot make this decision. I did not design / document that API, so I cannot make that call. You have to convince others. Sorry.

Alex

prasannavl commented 7 years ago

@johnrs, just been thinking about this again - I think there's a problem with your time.Now() resolution measurement. Measuring time.Now() by running it in a loop until it changes is incorrect.

You'll get a different time.Now() resolution by doing this at different times, depending on how the OS schedules your threads - which also implies your estimation of dowork's time is incorrect. It may or may not be different each time you run it, depending on when and where the OS thread gets interrupted. I think this is a very fragile way to test this.

johnrs commented 7 years ago

@prasannavl - You are correct, of course. But if care is taken to run the test on a system that is lightly loaded and has multiple CPUs then the chances that the test's thread will get paused are pretty small.

Dowork was an attempt to introduce a delay which didn't depend on the system clock. It was messy and the result (sleep for slightly less than 1 tick + work for half a tick) was hard to explain to people. Some assumed that the result should be the sum of the two. Work should have caused rounding to the next clock tick, for a total of 2 ticks. A result of 1 tick showed that the sleep was less than 1 tick. It can be hard to follow the logic.

Oh, and don't forget to disable turbo (speed-stepping) mode if your BIOS has it on, else it can make the "work" loop pretty inaccurate. On my PC with only a single core running the CPU clock is about 40% faster than when all 4 cores are running.

I finally just used the QPC timer to measure the intervals. The results were clearer since they didn't need any interpretation.

In later versions I didn't loop at all and just displayed a few samples. That was enough to show if the results were "steady" or not. But I think that the Manual test is the easiest to understand. It simply times a sleep which is started at a random time (relative to the system clock's phase).

If I wanted to determine the tick resolution in a production program I'd probably use a loop (~10 passes, well under the 1ms scheduling interval) and I'd use the minimum value, not the average value. I think it's fair to say that in at least one of those 10 passes the thread would not be paused. Also, Ian suggested that locking the goroutine to the thread might help.
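
Something like this sketch (10 passes is arbitrary):

import (
	"runtime"
	"time"
)

// tickResolution takes the minimum of several spin measurements,
// pinned to one OS thread so a reschedule can't inflate every pass.
func tickResolution() time.Duration {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	best := time.Duration(1 << 62)
	for i := 0; i < 10; i++ {
		x := time.Now()
		y := time.Now()
		for y.Equal(x) {
			y = time.Now()
		}
		if k := y.Sub(x); k < best {
			best = k
		}
	}
	return best
}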

prasannavl commented 7 years ago

I understand the premise. But just wrapping my head around to see if there's a better way to test this while dealing with such low intervals.

Work should have caused rounding to the next clock tick, for a total of 2 ticks. A result of 1 tick showed that the sleep was less than 1 tick.

My concern is the definition of what's 1 tick - and how that's measured. QPC can measure intervals with the highest possible accuracy, but not how many ticks a particular program has gone through. I think the most accurate (and Microsoft-recommended) way to do this would be to run xperf, so we can visualize it right down to the thread quantum of the program in PerfView. But this gives just accuracy, not the precision needed, since the overheads might be way too high. The only possible way to measure that with some sense of reasonable reliability would be to write a kernel driver, which is just madness for this particular problem. I'm trying to wrap my head around it, to understand if there's any way to do a better analysis without doing that.

May be xperf and perfview might just be able to provide something indicative here. Not sure.

johnrs commented 7 years ago

The intervals for Windows aren't that low, so QPC works fine. It has nearly 2000 times the resolution of a tick on my machine. From what I read this is typical.

My concern is the definition of what's 1 tick - and how that's measured.

My scheme seems to produce repeatable, correct results for the QPC, time.Now, and time.Sleep(1) ticks. Indeed, that the QPC tick measurement works (admittedly it does double or otherwise fail from time to time) showing 550ns says something. :)

QPC can measure intervals with the highest possible accurately, but not how many ticks a particular program has gone through.

It's going to continue running even if your program is paused. Is this what you mean? For testing I don't see this as a problem since we know that running the test multiple times, the smallest result is correct. I'm in the middle of making a few improvements to my test program, including the "loop and use the minimum" idea, now. I will post it on the Play site hopefully in an hour or so. I'll let you know when I do.

Also, if needed, using keyboard input to de-synchronize the time.Sleep can be replaced by some other long running, but randomized runtime, routine.

So I think that the Windows case is well covered. Platforms with faster ticks are another matter, however. I'd like to have a test for high resolution (<= 1us) ticks. Actually, 1us isn't hard to measure and I think that my current scheme should probably work down to about 100ns, but if it gets down to 10ns then yuck!

On Linux, if the time.Now tick and the time.Sleep(1) (1 = minimum sleep time) ticks are the same, as in Windows, then something with even more resolution is needed to make measurements on time.Sleep when used in a non-synchronized mode. Ouch! Do you happen to know if they are the same? Any idea what they are?

Or I could revert to the "work + sleep" test. It's sloppy but I eliminated about half of the inaccuracy in later versions by doing an auto-calibrate. I certainly wouldn't want to use it in any production project, however!

Finally, if the ticks get below 100ns you have to ask if it's even worth worrying about any +/- 1 error. :)

johnrs commented 7 years ago

Done. https://play.golang.org/p/nH3LnBNQtj

This is what I see.

Parameter Tests

  QPC Timer tick:              549ns
  time.Now monotonic tick:     1ms
  time.Now wall clock tick:    1.0001ms
  time.Sleep(1) re QPC:        1.000487ms
  time.Sleep(1) re monotonic:  1.000057ms

Manual Test - Hit Enter for a loop of 5 Sleep(900us)'s, Ctrl-C to exit:

    1st       2nd       3rd       4th       5th         Avg (w/o 1st)
  ------    ------    ------    ------    ------        -------------
  1296us    1067us     893us     815us    1045us    ==>     955us
   763us     998us     934us     949us    1194us    ==>    1019us
  1312us     686us    1169us     963us    1026us    ==>     961us
   195us    1089us    1006us     958us     938us    ==>     998us
  1248us     951us     879us    1017us    1095us    ==>     986us
exit status 2