Closed pierrre closed 1 year ago
For what it's worth we already have a benchmark for time.Now
in the time package. it's called BenchmarkNow
.
When I run go test time -test.run=xxx -test.bench=BenchmarkNow
on my amd64-linux laptop I get
BenchmarkNow-16 44539380 25.24 ns/op
You got similar results on your "other computer."
So tell us more about the computer on which time.Now
is slow.
Here is the result:
➜ ~ go test time -test.run=xxx -test.bench=BenchmarkNow
goos: linux
goarch: amd64
pkg: time
cpu: AMD Ryzen 7 PRO 4750U with Radeon Graphics
BenchmarkNow-16 528529 2261 ns/op
BenchmarkNowUnixNano-16 532738 2254 ns/op
BenchmarkNowUnixMilli-16 531685 2253 ns/op
BenchmarkNowUnixMicro-16 522962 2286 ns/op
PASS
ok time 7.798s
Both computers run a similar OS: ArchLinux, kernel updated today.
There must be some relevant difference between them.
Try building a trivial Go program that calls time.Now
and run it on both systems under strace -f
. See if there are any differences.
@pierrre, are you in a VM? What type?
Hello, I will try to run strace, but I don't have access to the other computer today, so I will come back later.
@bradfitz no I'm not using a VM Both computers are normal laptops with ArchLinux installed, same kernel version. The "slow" one is a Lenovo Thinkpad (T14S Gen 1, less than 1 year old). The "fast" one is a Dell XPS 15 (few years old).
Yes my production environment is running in a VM, but I didn't talk about it in this issue.
FYI on my "slow" computer, strace shows
[pid 45410] clock_gettime(CLOCK_MONOTONIC, {tv_sec=12677, tv_nsec=359642702}) = 0
[pid 45410] clock_gettime(CLOCK_REALTIME, {tv_sec=1673525811, tv_nsec=135457247}) = 0
repeated thousands of times with almost no interruption.
I will try on my "fast" computer later (I don't have access to it right now)
That looks like the "slow" computer is somehow missing the VDSO that contains a non-syscall implementation of clock_gettime
. I'm not sure offhand what could cause that. The VDSO is normally supplied by the kernel. See man vdso
for background.
You could try searching dmesg
for messages containing "VDSO" or "TSC" (or "time", if those yield nothing). My guess would be that the kernel is deciding that the TSC is not available/reliable, thus forcing the VDSO to fall back to system calls.
Given that you see the system calls with strace, that means that VDSO is indeed falling back to system calls (or, less likely, the VDSO doesn't exist at all, but I don't think modern Linux ever omits it entirely). This is the source of the slowness (it is not a Go issue), the remaining question is why it is falling back.
You can look at:
/sys/devices/system/clocksource/clocksource0/available_clocksource
/sys/devices/system/clocksource/clocksource0/current_clocksource
To see the available and current clock source. The clock source needs to be tsc
for the VDSO to work without fallback (or pvclock
/hvclock
, but those are only relevant to VM guests).
It could be related to one of https://bugzilla.kernel.org/show_bug.cgi?id=202525 or https://bugzilla.kernel.org/show_bug.cgi?id=216146 (both of which should result in dmesg warnings).
Here is the result of strace on my "fast" computer:
[pid 5814] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=5812, si_uid=1000} ---
[pid 5813] <... tgkill resumed>) = 0
[pid 5814] rt_sigreturn({mask=[]} <unfinished ...>
[pid 5813] nanosleep({tv_sec=0, tv_nsec=10000000}, <unfinished ...>
[pid 5814] <... rt_sigreturn resumed>) = 387849827
[pid 5813] <... nanosleep resumed>NULL) = 0
[pid 5813] epoll_pwait(3, [], 128, 0, NULL, 0) = 0
[pid 5813] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 5813] epoll_pwait(3, [], 128, 0, NULL, 0) = 0
[pid 5813] getpid() = 5812
[pid 5813] tgkill(5812, 5814, SIGURG <unfinished ...>
repeated thousands of times
I asked several coworkers at my company to run the same benchmark on various computer models:
5 of them have an Intel CPU, and the benchmark result is "fast": 30ns
1 has exactly the same model/CPU as I have: ThinkPad 14S Gen 1, AMD Ryzen 7 PRO 4750U His Linux distribution is Fedora 37
$ uname -a
Linux fedora 6.0.18-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jan 7 17:10:00 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
And he can reproduce the "slow" benchmark result: ~2000ns !
$ go version && go test time -test.run=xxx -test.bench=BenchmarkNow
go version go1.19.2 linux/amd64
goos: linux
goarch: amd64
pkg: time
cpu: AMD Ryzen 7 PRO 4750U with Radeon Graphics
BenchmarkNow-16 540297 2172 ns/op
BenchmarkNowUnixNano-16 540292 2220 ns/op
BenchmarkNowUnixMilli-16 547329 2206 ns/op
BenchmarkNowUnixMicro-16 557311 2191 ns/op
PASS
ok time 8.648s
About the clock source:
On my "slow" computer:
➜ ~ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hpet acpi_pm
➜ ~ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
On my "fast" computer:
➜ ~ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc acpi_pm
➜ ~ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
As far as I know, I didn't configure anything related to the clock source when I installed these 2 computers
About dmesg:
On my "slow" computer:
➜ ~ sudo dmesg | grep -iE "(vdso|tsc|clocksource)"
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 1696.689 MHz processor
[ 0.036105] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns
[ 0.096170] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
[ 0.112863] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1874efa0393, max_idle_ns: 440795291471 ns
[ 0.232866] TSC synchronization [CPU#0 -> CPU#1]:
[ 0.232866] Measured 3406960412 cycles TSC warp between CPUs, turning off TSC clock.
[ 0.232866] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[ 0.240266] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns
[ 0.302925] clocksource: Switched to clocksource hpet
[ 0.309641] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 17.853505] SVM: TSC scaling supported
On my "fast" computer:
➜ ~ sudo dmesg | grep -iE "(vdso|tsc|clocksource)"
[ 0.000000] tsc: Detected 2200.000 MHz processor
[ 0.000000] tsc: Detected 2199.996 MHz TSC
[ 0.041814] TSC deadline timer available
[ 0.041877] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns
[ 0.123401] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1fb62f12e8c, max_idle_ns: 440795238402 ns
[ 0.151562] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns
[ 0.531587] clocksource: Switched to clocksource tsc-early
[ 0.559199] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 1.587084] tsc: Refined TSC clocksource calibration: 2207.998 MHz
[ 1.587089] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1fd3b6211b4, max_idle_ns: 440795230313 ns
[ 1.587143] clocksource: Switched to clocksource tsc
I don't really know how to interpret this
This is the source of the slowness (it is not a Go issue), the remaining question is why it is falling back.
OK I understand, feel free to close this issue if there is nothing we can do on Go side. Thank you for your help :+1:
[ 0.232866] Measured 3406960412 cycles TSC warp between CPUs, turning off TSC clock.
[ 0.232866] tsc: Marking TSC unstable due to check_tsc_sync_source failed
This is the immediate problem. The kernel has decided that the TSC is not reliable and disabled its use.
There is nothing we can do in Go. I suggest reaching out on one of the kernel bugs. https://bugzilla.kernel.org/show_bug.cgi?id=202525 has the same kernel warnings, though a slightly different CPU model.
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
Disclaimer: yes I know, I'm trying to do something "silly" (benchmarking
time.Now()
). But it has a real impact in production for me.Context: I'm trying to measure the execution time of a function that is "fast" (below 1 microsecond).
I ran a benchmark on my function (with
testing.B
), so I know it's fast (below 1 microsecond). But I want to measure its execution time while it's running in production (and report it in my metrics).So I did something simple:
Then I noticed that the measured duration was much higher than the time measured in my benchmark (around 5 microseconds, vs 1 microsecond in my benchmark).
So I did a benchmark that calls
time.Now()
:(Yes I closed all other running apps)
I'm very surprised, because this time (2249ns) is very high. If I remember correctly, on my other computer it was 2 orders of magnitude lower (30ns)
I did a profiling of my benchmark. All the CPU time is spent in
time.now()
(yes, lower casenow
)My question: how can I measure the execution time of a function, if calling
time.Now()
is slower than calling my function ?Feel free to close this issue if you think it's not a problem.
What did you expect to see?
Calling
time.Now()
should take approximately 30ns.This is the benchmark result on my other computer:
What did you see instead?
Calling
time.Now()
takes 2249ns, which is slower than the execution time of the function I'm trying to measure.