golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.98k stars 17.67k forks source link

time: Now is "slow" #57749

Closed pierrre closed 1 year ago

pierrre commented 1 year ago

What version of Go are you using (go version)?

$ go version
go version go1.20rc2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/pierre/.cache/go-build"
GOENV="/home/pierre/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/pierre/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/pierre/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/pierre/.gimme/versions/go1.20rc2.src"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/pierre/.gimme/versions/go1.20rc2.src/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20rc2"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1508083160=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Disclaimer: yes I know, I'm trying to do something "silly" (benchmarking time.Now()). But it has a real impact in production for me.

Context: I'm trying to measure the execution time of a function that is "fast" (below 1 microsecond).

I ran a benchmark on my function (with testing.B), so I know it's fast (below 1 microsecond). But I want to measure its execution time while it's running in production (and report it in my metrics).

So I did something simple:

start := time.Now()
// call my function
dur := time.Since(start)
// report in my metrics

Then I noticed that the measured duration was much higher than the time measured in my benchmark (around 5 microseconds, vs 1 microsecond in my benchmark).

So I did a benchmark that calls time.Now():

func BenchmarkTimeNow(b *testing.B) {
    var res time.Time
    for i := 0; i < b.N; i++ {
        res = time.Now()
    }
    benchRes = res
}
➜  timebench go test -v -bench=.
goos: linux
goarch: amd64
pkg: timebench
cpu: AMD Ryzen 7 PRO 4750U with Radeon Graphics
BenchmarkTimeNow
BenchmarkTimeNow-16       543583              2249 ns/op
PASS
ok      timebench       1.249s

(Yes I closed all other running apps)

I'm very surprised, because this time (2249ns) is very high. If I remember correctly, on my other computer it was 2 orders of magnitude lower (30ns)

I did a profiling of my benchmark. All the CPU time is spent in time.now() (yes, lower case now)

My question: how can I measure the execution time of a function, if calling time.Now() is slower than calling my function ?

Feel free to close this issue if you think it's not a problem.

What did you expect to see?

Calling time.Now() should take approximately 30ns.

This is the benchmark result on my other computer:

➜  timebench go test -v -bench=.
goos: linux
goarch: amd64
pkg: timebench
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkTimeNow
BenchmarkTimeNow-12     35953132                31.34 ns/op
PASS
ok      timebench       1.164s
➜  timebench go version
go version go1.20rc2 linux/amd64

What did you see instead?

Calling time.Now() takes 2249ns, which is slower than the execution time of the function I'm trying to measure.

ianlancetaylor commented 1 year ago

For what it's worth we already have a benchmark for time.Now in the time package. it's called BenchmarkNow.

When I run go test time -test.run=xxx -test.bench=BenchmarkNow on my amd64-linux laptop I get

BenchmarkNow-16                 44539380            25.24 ns/op

You got similar results on your "other computer."

So tell us more about the computer on which time.Now is slow.

pierrre commented 1 year ago

Here is the result:

➜  ~ go test time -test.run=xxx -test.bench=BenchmarkNow
goos: linux
goarch: amd64
pkg: time
cpu: AMD Ryzen 7 PRO 4750U with Radeon Graphics
BenchmarkNow-16                   528529          2261 ns/op
BenchmarkNowUnixNano-16           532738          2254 ns/op
BenchmarkNowUnixMilli-16          531685          2253 ns/op
BenchmarkNowUnixMicro-16          522962          2286 ns/op
PASS
ok      time    7.798s

Both computers run a similar OS: ArchLinux, kernel updated today.

ianlancetaylor commented 1 year ago

There must be some relevant difference between them.

Try building a trivial Go program that calls time.Now and run it on both systems under strace -f. See if there are any differences.

bradfitz commented 1 year ago

@pierrre, are you in a VM? What type?

pierrre commented 1 year ago

Hello, I will try to run strace, but I don't have access to the other computer today, so I will come back later.

@bradfitz no I'm not using a VM Both computers are normal laptops with ArchLinux installed, same kernel version. The "slow" one is a Lenovo Thinkpad (T14S Gen 1, less than 1 year old). The "fast" one is a Dell XPS 15 (few years old).

Yes my production environment is running in a VM, but I didn't talk about it in this issue.

pierrre commented 1 year ago

FYI on my "slow" computer, strace shows

[pid 45410] clock_gettime(CLOCK_MONOTONIC, {tv_sec=12677, tv_nsec=359642702}) = 0
[pid 45410] clock_gettime(CLOCK_REALTIME, {tv_sec=1673525811, tv_nsec=135457247}) = 0

repeated thousands of times with almost no interruption.

I will try on my "fast" computer later (I don't have access to it right now)

ianlancetaylor commented 1 year ago

That looks like the "slow" computer is somehow missing the VDSO that contains a non-syscall implementation of clock_gettime. I'm not sure offhand what could cause that. The VDSO is normally supplied by the kernel. See man vdso for background.

prattmic commented 1 year ago

You could try searching dmesg for messages containing "VDSO" or "TSC" (or "time", if those yield nothing). My guess would be that the kernel is deciding that the TSC is not available/reliable, thus forcing the VDSO to fall back to system calls.

Given that you see the system calls with strace, that means that VDSO is indeed falling back to system calls (or, less likely, the VDSO doesn't exist at all, but I don't think modern Linux ever omits it entirely). This is the source of the slowness (it is not a Go issue), the remaining question is why it is falling back.

prattmic commented 1 year ago

You can look at:

/sys/devices/system/clocksource/clocksource0/available_clocksource /sys/devices/system/clocksource/clocksource0/current_clocksource

To see the available and current clock source. The clock source needs to be tsc for the VDSO to work without fallback (or pvclock/hvclock, but those are only relevant to VM guests).

prattmic commented 1 year ago

It could be related to one of https://bugzilla.kernel.org/show_bug.cgi?id=202525 or https://bugzilla.kernel.org/show_bug.cgi?id=216146 (both of which should result in dmesg warnings).

pierrre commented 1 year ago

Here is the result of strace on my "fast" computer:

[pid  5814] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=5812, si_uid=1000} ---
[pid  5813] <... tgkill resumed>)       = 0
[pid  5814] rt_sigreturn({mask=[]} <unfinished ...>
[pid  5813] nanosleep({tv_sec=0, tv_nsec=10000000},  <unfinished ...>
[pid  5814] <... rt_sigreturn resumed>) = 387849827
[pid  5813] <... nanosleep resumed>NULL) = 0
[pid  5813] epoll_pwait(3, [], 128, 0, NULL, 0) = 0
[pid  5813] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid  5813] epoll_pwait(3, [], 128, 0, NULL, 0) = 0
[pid  5813] getpid()                    = 5812
[pid  5813] tgkill(5812, 5814, SIGURG <unfinished ...>

repeated thousands of times

pierrre commented 1 year ago

I asked several coworkers at my company to run the same benchmark on various computer models:

5 of them have an Intel CPU, and the benchmark result is "fast": 30ns

1 has exactly the same model/CPU as I have: ThinkPad 14S Gen 1, AMD Ryzen 7 PRO 4750U His Linux distribution is Fedora 37

$ uname -a
Linux fedora 6.0.18-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jan 7 17:10:00 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

And he can reproduce the "slow" benchmark result: ~2000ns !

$ go version && go test time -test.run=xxx -test.bench=BenchmarkNow
go version go1.19.2 linux/amd64
goos: linux
goarch: amd64
pkg: time
cpu: AMD Ryzen 7 PRO 4750U with Radeon Graphics
BenchmarkNow-16                   540297          2172 ns/op
BenchmarkNowUnixNano-16           540292          2220 ns/op
BenchmarkNowUnixMilli-16          547329          2206 ns/op
BenchmarkNowUnixMicro-16          557311          2191 ns/op
PASS
ok      time    8.648s
pierrre commented 1 year ago

About the clock source:

On my "slow" computer:

➜  ~ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hpet acpi_pm 
➜  ~ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet

On my "fast" computer:

➜  ~ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc acpi_pm 
➜  ~ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

As far as I know, I didn't configure anything related to the clock source when I installed these 2 computers

pierrre commented 1 year ago

About dmesg:

On my "slow" computer:

➜  ~ sudo dmesg | grep -iE "(vdso|tsc|clocksource)"
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 1696.689 MHz processor
[    0.036105] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns
[    0.096170] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
[    0.112863] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1874efa0393, max_idle_ns: 440795291471 ns
[    0.232866] TSC synchronization [CPU#0 -> CPU#1]:
[    0.232866] Measured 3406960412 cycles TSC warp between CPUs, turning off TSC clock.
[    0.232866] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[    0.240266] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns
[    0.302925] clocksource: Switched to clocksource hpet
[    0.309641] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[   17.853505] SVM: TSC scaling supported

On my "fast" computer:

➜  ~ sudo dmesg | grep -iE "(vdso|tsc|clocksource)"     
[    0.000000] tsc: Detected 2200.000 MHz processor
[    0.000000] tsc: Detected 2199.996 MHz TSC
[    0.041814] TSC deadline timer available
[    0.041877] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns
[    0.123401] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1fb62f12e8c, max_idle_ns: 440795238402 ns
[    0.151562] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns
[    0.531587] clocksource: Switched to clocksource tsc-early
[    0.559199] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    1.587084] tsc: Refined TSC clocksource calibration: 2207.998 MHz
[    1.587089] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1fd3b6211b4, max_idle_ns: 440795230313 ns
[    1.587143] clocksource: Switched to clocksource tsc

I don't really know how to interpret this

pierrre commented 1 year ago

This is the source of the slowness (it is not a Go issue), the remaining question is why it is falling back.

OK I understand, feel free to close this issue if there is nothing we can do on Go side. Thank you for your help :+1:

prattmic commented 1 year ago
[    0.232866] Measured 3406960412 cycles TSC warp between CPUs, turning off TSC clock.
[    0.232866] tsc: Marking TSC unstable due to check_tsc_sync_source failed

This is the immediate problem. The kernel has decided that the TSC is not reliable and disabled its use.

There is nothing we can do in Go. I suggest reaching out on one of the kernel bugs. https://bugzilla.kernel.org/show_bug.cgi?id=202525 has the same kernel warnings, though a slightly different CPU model.