Closed tv42 closed 11 months ago
Oh this might be relevant, too:
Linux brute 4.2.0-38-generic #45-Ubuntu SMP Wed Jun 8 21:21:49 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ go test -v -run=TestTraceFutileWakeup runtime/trace
=== RUN TestTraceFutileWakeup
--- SKIP: TestTraceFutileWakeup (0.03s)
trace_test.go:67: skipping trace: time stamps out of order
PASS
ok runtime/trace 0.034s
Is this running in a VM or directly on physical hardware? Looks like a laptop?
brute in the above is a desktop machine, bare metal.
/cc @dvyukov @hyangah -- this sounds like a dup of something I saw once before. Ring any bells?
This looks like a buggy processor/OS. @tv42 are you sure that C programs get reasonable data out of RDTSC on this machine? We probably need a litmus test C program checked in somewhere. But we don't have it as far as I know.
I'm not sure why you'd call an RDTSC varying between cores/chips "buggy". I mean, this particular BIOS may even fail to synchronize TSC between the cores at boot, and it may be buggy, but the assumption of unified TSC across cores is not safe.
Invariant TSC was a feature added to the Nehalem generation Intel chips, and even then you have no guarantees that the computer doesn't actually have multiple cpus (not just multiple cores on one cpu).
You just simply can't rely on a singular RDTSC source across all possible cores, without further information that that assumption is safe. Outside of a kernel, that sounds like risky business.
http://lxr.linux.no/linux+v2.6.38/arch/x86/kernel/tsc.c#L859
For what it's worth, Linux properly diagnoses TSC as unreliable on this host, and uses hpet instead.
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
Decent summary of TSC issues (talks about hardware, also outside of virtualization): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/virtual/kvm/timekeeping.txt?h=v4.12-rc5
@tv42 do you have a proposal for how we can fix this?
I'm hitting this "time stamps out of order" issue as well, using Gentoo's go version go1.8.3 linux/amd64. Bare metal, AMD processor. Anything I can do to aid in debugging?
go test -v runtime/trace
=== RUN TestTraceSymbolize
--- SKIP: TestTraceSymbolize (0.21s)
trace_test.go:75: skipping trace: time stamps out of order
=== RUN TestTraceStartStop
--- PASS: TestTraceStartStop (0.10s)
=== RUN TestTraceDoubleStart
--- PASS: TestTraceDoubleStart (0.00s)
=== RUN TestTrace
--- PASS: TestTrace (0.00s)
=== RUN TestTraceStress
--- SKIP: TestTraceStress (0.60s)
trace_test.go:75: skipping trace: time stamps out of order
=== RUN TestTraceStressStartStop
--- SKIP: TestTraceStressStartStop (0.13s)
trace_test.go:75: skipping trace: time stamps out of order
=== RUN TestTraceFutileWakeup
--- SKIP: TestTraceFutileWakeup (0.22s)
trace_test.go:75: skipping trace: time stamps out of order
PASS
ok runtime/trace 1.280s
go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/mark/go"
GORACE=""
GOROOT="/usr/lib64/go"
GOTOOLDIR="/usr/lib64/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="x86_64-pc-linux-gnu-gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build020701565=/tmp/go-build -gno-record-gcc-switches"
CXX="x86_64-pc-linux-gnu-g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
uname -a
Linux mark-gentoo 4.9.6-gentoo-r1 #2 SMP Thu Sep 14 16:30:26 EDT 2017 x86_64 AMD FX(tm)-8320 Eight-Core Processor AuthenticAMD GNU/Linux
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
Vendor ID: AuthenticAMD
CPU family: 21
Model: 2
Model name: AMD FX(tm)-8320 Eight-Core Processor
Stepping: 0
CPU MHz: 1700.000
CPU max MHz: 3500.0000
CPU min MHz: 1400.0000
BogoMIPS: 7000.28
Virtualization: AMD-V
L1d cache: 16K
L1i cache: 64K
L2 cache: 2048K
L3 cache: 8192K
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
@mpictor out of curiosity: does the problem go away if you change the clock source to tsc?
@kostix it does.
echo tsc | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
go test -v -run=TestTraceFutileWakeup runtime/trace
=== RUN TestTraceFutileWakeup
--- PASS: TestTraceFutileWakeup (0.07s)
PASS
ok runtime/trace 0.074s
That said, as tv42 notes, assumptions about the TSC break on multi-socket systems.
While this worked once, I can't repeat it - writing tsc to current_clocksource now has no effect and the following is written to dmesg:
clocksource: Override clocksource tsc is unstable and not HRT compatible - cannot switch while in HRT/NOHZ mode
@dvyukov I don't know enough about pprof to really have much useful to say.
I wonder how big the overhead of clock_gettime(CLOCK_MONOTONIC, ...) is compared to TSC. That's still only a vDSO call.
Is it possible to collect (core, tsc) pairs? Times between cores would be essentially incomparable, but for short runs, you could assume that the offsets of the different TSCs don't drift much.
For longer runs, perhaps there would be a mechanism to regularly emit a (core, tsc, monotonic_clock) sync point that can be used to correct for drift. And thread start/exit should probably emit that too. That should amortize the (assumed) slower clock_gettime(CLOCK_MONOTONIC, ...) calls.
Of course, that leaves the analysis side stuck with a more difficult problem. But hey, it's no different from trying to analyze similar traces from a distributed system (because it is one).
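To make the sync-point idea concrete, here is a rough Go sketch of the analysis side. It assumes each core has emitted at least two (tsc, monotonic) sync points and maps any TSC value from that core onto the monotonic timeline by linear interpolation (or extrapolation past the ends). The syncPoint type and toMono function are hypothetical names for illustration; nothing like this exists in the runtime as far as this thread shows.

```go
package main

import (
	"fmt"
	"sort"
)

// syncPoint pairs a per-core TSC reading with a monotonic clock reading
// (in nanoseconds) taken at roughly the same moment. Hypothetical type.
type syncPoint struct {
	tsc  uint64
	mono int64
}

// toMono converts a TSC value from one core to monotonic nanoseconds.
// pts must be sorted by tsc and contain at least two points with distinct
// tsc values. Values outside the covered range are extrapolated from the
// nearest segment.
func toMono(pts []syncPoint, tsc uint64) int64 {
	// Index of the first sync point at or after the target.
	i := sort.Search(len(pts), func(k int) bool { return pts[k].tsc >= tsc })
	if i == 0 {
		i = 1
	}
	if i >= len(pts) {
		i = len(pts) - 1
	}
	a, b := pts[i-1], pts[i]
	// Nanoseconds per tick on this segment; this absorbs slow drift.
	slope := float64(b.mono-a.mono) / float64(b.tsc-a.tsc)
	// Signed tick distance from a; unsigned wraparound makes this negative
	// when tsc lies before a.
	delta := int64(tsc - a.tsc)
	return a.mono + int64(slope*float64(delta))
}

func main() {
	// Two cores whose TSCs differ by a constant offset of 500000 ticks.
	core0 := []syncPoint{{tsc: 1000, mono: 0}, {tsc: 3000, mono: 1000}}
	core1 := []syncPoint{{tsc: 501000, mono: 0}, {tsc: 503000, mono: 1000}}

	// Raw TSC values are incomparable across cores, converted values are not.
	fmt.Println(toMono(core0, 2000), toMono(core1, 502000))
}
```

The raw readings 2000 and 502000 look far apart, but after conversion both events land on the same monotonic instant, which is the property the trace parser would need in order to sort events across cores.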
I wonder how big the overhead of clock_gettime(CLOCK_MONOTONIC, ...) is compared to TSC.
Does it provide enough precision?
Is it possible to collect (core, tsc) pairs?
Yes.
Does it provide enough precision?
clock_getres on this system claims 1ns, but I understand different HPET hardware implementations can have different resolution, all the way down to 10Hz.
It is a lot slower to take a measurement, even with the vDSO optimized not-real-syscalls. On this system (brute from earlier), runtime.cputick takes about 10ns, sequential clock_gettime(CLOCK_MONOTONIC, ...) calls from C report times 629ns apart, and runtime.nanotime call takes 632ns. cat /sys/devices/system/clocksource/clocksource0/current_clocksource gives hpet.
For comparison, a laptop with i5-5300U and clocksource TSC says runtime.cputick takes 13.5ns, runtime.nanotime takes 20.7ns, sequential clock_gettime(CLOCK_MONOTONIC, ...) calls from C report times 50-120ns apart, and unix.ClockGettime(unix.CLOCK_MONOTONIC, ...) takes 156ns.
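For anyone who wants a rough number for their own machine without writing C, a sketch like the following times back-to-back time.Now calls. This is an approximation under stated assumptions: time.Now is a public wrapper and is not the same thing as runtime.nanotime or a bare clock_gettime call, so the result is only loosely comparable to the figures above. The function name avgNowCost is made up for this example.

```go
package main

import (
	"fmt"
	"time"
)

// avgNowCost returns the mean elapsed time per time.Now call over n calls.
// It returns 0 for n <= 0. The loop overhead is included in the result.
func avgNowCost(n int) time.Duration {
	if n <= 0 {
		return 0
	}
	start := time.Now()
	for i := 0; i < n; i++ {
		_ = time.Now()
	}
	return time.Since(start) / time.Duration(n)
}

func main() {
	// A large n smooths out scheduling noise; the exact value is arbitrary.
	fmt.Println("approx cost per time.Now call:", avgNowCost(1000000))
}
```

On a machine where the kernel has fallen back to hpet, one would expect this figure to be much larger than on a machine using the tsc clocksource, in line with the measurements quoted above.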
I just realized CLOCK_MONOTONIC is still subject to NTP time warping, I'll repeat my experiments with CLOCK_MONOTONIC_RAW.
CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW timings seem to match each other; I assume that means the NTP drift adjustment happens amortized in the logic that writes to the vDSO data page, not as part of clock_gettime.
Change https://golang.org/cl/97757 mentions this issue: cmd/trace: skip tests if parsing fails with timestamp error
Change https://golang.org/cl/105821 mentions this issue: cmd/trace: skip GoroutineInSyscall in case time order err
hyangah says, in #24663:
we can continue the discussion on the timestamp out of order error in #16755
and
BTW, I fixed up the timestamps of the problematic record
@hyangah how did you fix the timestamps? I looked into that but didn't have success.
I'm really disappointed to see that the milestone is now Unplanned - this bug makes trace unusable for me :(
I'm experiencing this issue as well. I'm trying to retrieve a trace file from a process on one of our servers and open it to find bottlenecks, but I get the same out-of-order error described in this thread. This prevents me from using any of the Go tooling for profiling.
It would be nice to know how many events are out of order and have the option to ignore them. If only a few events are out of order I would prefer to still be able to work on my trace but with a warning so that I know to be skeptical about what I see or look for anomalies. Right now I can't use any tooling and there may be valuable information for me to look at.
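As an illustration of the kind of count being asked for, here is a small Go sketch. It works on a plain slice of timestamps, which is an assumption made for the example; it does not read the real trace format and is not how go tool trace parses events. The name countOutOfOrder is hypothetical.

```go
package main

import "fmt"

// countOutOfOrder reports how many timestamps are smaller than the largest
// timestamp seen before them, i.e. how many events appear to go back in time.
func countOutOfOrder(ts []int64) int {
	n := 0
	var max int64
	for i, t := range ts {
		if i == 0 || t >= max {
			max = t
			continue
		}
		n++
	}
	return n
}

func main() {
	ts := []int64{1, 2, 5, 3, 4, 6}
	fmt.Printf("%d of %d events are out of order\n", countOutOfOrder(ts), len(ts))
}
```

A tool could print this ratio as a warning and continue, leaving it to the user to decide whether the trace is still trustworthy.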
Has anyone found a solution for this?
Our environment is bare metal, not virtualized. Nothing exotic here, I don't think.
go version
go version go1.12 linux/amd64
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
Stepping: 7
CPU MHz: 3429.687
CPU max MHz: 3800.0000
CPU min MHz: 1200.0000
BogoMIPS: 6399.94
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0-11
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb ssbd ibrs ibpb stibp kaiser tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts flush_l1d
Interesting thing I noticed is the system is really overloaded. It's a 12 core box with a load of 63 and dstat shows that the CPU cores are struggling on WAIT. This does not explain to me why the trace events are experiencing backwards time events though?
dstat --all
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
1 1 58 40 0 0|9909k 4119k| 0 0 | 0 0 |4540 11k
2 1 1 96 0 0| 0 1900k| 403k 314k| 0 0 | 12k 27k
3 1 0 96 0 0| 0 2160k| 433k 319k| 0 0 | 12k 27k
2 0 1 97 0 0| 0 1820k| 485k 356k| 0 0 | 12k 27k
2 1 2 95 0 0| 0 1749k| 362k 289k| 0 0 | 10k 24k
Also experiencing this issue.
go version go1.17.1 linux/amd64
/proc/cpuinfo contents:
go env
GO111MODULE=""
GOARCH="amd64"
GOBIN="/home/jorge/go/bin"
GOCACHE="/home/jorge/.cache/go-build"
GOENV="/home/jorge/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/jorge/go/bin/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/jorge/go/bin"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.1"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1609529823=/tmp/go-build -gno-record-gcc-switches"
the same code works fine on another machine (a macbook)
same issue
go version
go version go1.18.3 linux/amd64
uname -a
Linux steve-Lenovo-G50-75m 5.15.0-39-generic #42-Ubuntu SMP Thu Jun 9 23:42:32 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 21
model : 48
model name : AMD A10-7300 Radeon R6, 10 Compute Cores 4C+6G
stepping : 1
microcode : 0x6003106
cpu MHz : 1321.115
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 16
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate ssbd vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov
bugs : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 3792.92
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]
processor : 1
vendor_id : AuthenticAMD
cpu family : 21
model : 48
model name : AMD A10-7300 Radeon R6, 10 Compute Cores 4C+6G
stepping : 1
microcode : 0x6003106
cpu MHz : 1322.469
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 17
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate ssbd vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov
bugs : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 3792.92
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]
processor : 2
vendor_id : AuthenticAMD
cpu family : 21
model : 48
model name : AMD A10-7300 Radeon R6, 10 Compute Cores 4C+6G
stepping : 1
microcode : 0x6003106
cpu MHz : 1321.171
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 2
apicid : 18
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate ssbd vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov
bugs : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 3792.92
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]
processor : 3
vendor_id : AuthenticAMD
cpu family : 21
model : 48
model name : AMD A10-7300 Radeon R6, 10 Compute Cores 4C+6G
stepping : 1
microcode : 0x6003106
cpu MHz : 1097.798
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 2
apicid : 19
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate ssbd vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov
bugs : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 3792.92
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]
trace file
trace.out.zip
Hi. I am also experiencing this.
What version of Go are you using (go version)? go version go1.19.4 darwin/amd64
Does this issue reproduce with the latest release? YES
What operating system and processor architecture are you using (go env)?
Apple M1 Max
```
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/adeshina/Library/Caches/go-build"
GOENV="/Users/adeshina/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/adeshina/go/pkg/mod"
GONOPROXY="github.com/gettreasure-mother/*,github.com/linuxfoundation-it/*,github.com/LF-Engineering"
GONOSUMDB="github.com/gettreasure-mother/*,github.com/linuxfoundation-it/*,github.com/LF-Engineering"
GOOS="darwin"
GOPATH="/Users/adeshina/go"
GOPRIVATE="github.com/acme-corporation/internal-rpc-client"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/opt/go/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/opt/go/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.19.4"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/adeshina/Workspace/go-apps/go-tracing/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/4l/0kgwdc8d0cl8ss8x6__q0gp00000gn/T/go-build3344081618=/tmp/go-build -gno-record-gcc-switches -fno-common"
```
What did you expect to see? I expected the trace to be read and the browser to present itself.
What did you see instead?
2022/12/19 15:15:56 Parsing trace...
failed to parse trace: time stamps out of order
The same problem, go1.19.5 on Apple M1 Pro.
GO111MODULE="auto"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/zhangjie/Library/Caches/go-build"
GOENV="/Users/zhangjie/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/zhangjie/go/pkg/mod"
GONOPROXY="git.woa.com/*"
GONOSUMDB="git.woa.com/*"
GOOS="darwin"
GOPATH="/Users/zhangjie/go"
GOPRIVATE="git.woa.com/*"
GOPROXY="https://goproxy.woa.com,direct"
GOROOT="/usr/local/go"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.19.5"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/zhangjie/Github/codemaster/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/4m/2_pn1sln1fzcg2zg_4mh95z80000gn/T/go-build2690046038=/tmp/go-build -gno-record-gcc-switches -fno-common"
Same issue in go version go1.20.2 darwin/arm64
2023/06/30 19:26:31 Parsing trace...
failed to parse trace: time stamps out of order
Any workaround or fix?
Uh that is interesting. On Intel cores, this mostly triggered with older generations (before Nehalem, mostly 10+ year old machines by now). I do not know enough about AMD, but I guess the same was true there?
However, this triggering on Apple's M1 Pro and M1 Max means new hardware also suffers from this. (Or has symptoms that get confused with this?)
With the new tracer this error can no longer happen -- timestamps are not used for ordering events anymore. (See #60773.)
The RDTSCP event ordering logic talked about in #10512 and #15102 (tricky palindrome bug numbers) doesn't seem to be working, at all.
What version of Go are you using (go version)? go version go1.7 linux/amd64
What operating system and processor architecture are you using (go env)? /proc/cpuinfo contents:
Imported net/http/pprof, put load on the application (webserver + wrk), fetched http://localhost:9999/debug/pprof/trace?seconds=1, ran go tool trace trace.out
Web browser opening to the trace viewer.
If I constrain the app to a single core with taskset, the trace viewer works.
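The reproduction above fetches the trace over HTTP from a loaded web server. For a smaller test case, a trace can also be captured in-process with the runtime/trace package. The sketch below is one possible way to do that; the captureTrace and busy helpers and the goroutine workload are invented for illustration and are not the original reporter's code.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/trace"
	"sync"
)

// captureTrace runs work while the execution tracer is active and returns
// the raw trace bytes. Hypothetical helper.
func captureTrace(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return nil, err
	}
	work()
	trace.Stop() // returns once all trace data has been written to buf
	return buf.Bytes(), nil
}

// busy spreads some CPU work over several goroutines so that events are
// likely to be recorded on more than one core.
func busy() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sum := 0
			for j := 0; j < 1000000; j++ {
				sum += j
			}
			_ = sum
		}()
	}
	wg.Wait()
}

func main() {
	data, err := captureTrace(busy)
	if err != nil {
		fmt.Println("trace failed:", err)
		return
	}
	if err := os.WriteFile("trace.out", data, 0o644); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Printf("wrote %d bytes to trace.out\n", len(data))
}
```

The resulting file can be opened with go tool trace trace.out, the same command used in the report. Running the program under taskset pinned to one core should, per the observation above, avoid the error on affected machines.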