Open p-zak opened 1 year ago
cc @golang/runtime
Thanks for the report. Since there is a lot of information, here is my paraphrased understanding. Please let me know if I misunderstood anything.
jitter
is a C program that looks for jitter in execution time of a tight loop (due to OS interference, e.g.).jitter
program is running on dedicated cores (no other threads allowed).jitter
program measures elevated jitter, including additional CPU interrupts, compared to an otherwise idle system and compared to a Python program running on non-dedicated cores.@prattmic Thanks for quick response!
jitter
is a C program that looks for jitter in execution time of a tight loop (due to OS interference, e.g.).
Yes! Details from https://github.com/FDio/archived-pma_tools/blob/master/jitter/jitter.c#L21:
* The program does the following:
* - Calls a dummy function which consists of a small piece of code block being executed in a loop. The loop count is user defined (default 80,000).
* - Measures the Execution time for this function in Core cycles
* - Compares the Execution time with Instantaneous Minimum and Maximum execution time values tracked during the display update interval. These values are adjusted based on the comparison.
* - Compares the execution time with Absolute Minimum and maximum execution values since the program started. These values are adjusted based on the comparison.
* - Repeats above operation.
- The
jitter
program is running on dedicated cores (no other threads allowed).
Yes, and isolated.
- A Go program is running on other, non-dedicated cores.
Exactly.
- The
jitter
program measures elevated jitter, including additional CPU interrupts, compared to an otherwise idle system and compared to a Python program running on non-dedicated cores.
That's right. Python scenario is for reference. During tests with Go app, Python app was disabled of course.
This sounds like a limitation in the CPU isolation functionality in the Linux kernel, not a Go issue.
If the Go program is isolated away from a CPU, then its behaviors should presumably not affect that CPU, but it seems like it is. With regards to the interrupts, some possibilities I could imagine:
This sounds like a limitation in the CPU isolation functionality in the Linux kernel, not a Go issue.
If the Go program is isolated away from a CPU, then its behaviors should presumably not affect that CPU, but it seems like it is.
@prattmic Generally, I agree. However, please look closely at Python scenario :)
As you can see in run_jitter.sh
I run tool using: taskset -c 21 ./jitter -c 20 -f -i 1450 -l 15500
. It means that initially, thread affinity is set to (isolated) CPU21, but -c 20
changes it to (isolated) CPU20 inside jitter
tool (I can run it using taskset -c 20 ./jitter -c 20
and results will be the same for every scenario). Main point is that jitter
uses only isolated cores.
Both Python
and Go
app are run by: taskset -a -c 8
. It means that all threads for a given command should run on CPU8 (non-isolated).
jitter
runs for about 335s
and I collect interrupts from /proc/interrupts
just before jitter
starts and just after jitter
ends for isolated CPU20-23 and non-isolated CPU24 (for reference). Only these kind of interrupts appear for these cores for all scenarios: LOC, IWI, RES, CAL, TLB (I believe that last one is done by jitter
when it ends). No other types for these cores.
So here is the difference in these interrupts for "BASELINE" (only jitter
running):
Here is for jitter
with Python
app:
And here is for jitter
with Go
app:
We can see that Python
app doesn't influence isolated cores at all, while Go
equivalent unfortunately does.
As a Go engineer, I've been trying to find out how to proceed with writing applications in Go
but have impact for isolated cores as in Python
;)
What and where should be tuned/configured/changed/fixed to achieve that?
I acknowledge that there is a difference between your Python and Go cases, but the Python case is also simpler. e.g., with CPython 3.11 on my machine, that program is single-threaded. On the other hand, Go programs are always multi-threaded, even with only one goroutine. So there is more interesting surface in the Go case.
Given the connection with isolated CPUs (a kernel feature) and interrupts (which Go has no direct control over), I think this issue would be better served on a Linux kernel mailing list / issue tracker, where there are more experts in these areas. If there are concrete things we are doing wrong that makes us a bad neighbor, then we can consider whether they should change.
+1 to what Michael said, and to add on, here are a few things you can try to attempt to isolate the issue:
GODEBUG=asyncpreemptoff=1
.GOGC=off
(given that you're only printing "hello world" every second, I doubt you'll actually run out of memory in any reasonable amount of time... however I also doubt you're triggering the GC at all).However, given that you're seeing jitter on the order of 10 seconds, I don't think there's any obvious thing (to us, anyway) the Go runtime is doing that would cause the jitter. Honestly, I don't really expect any of the 3 things above to have an effect, but it's something to rule out.
Beyond that, I'm not sure we can provide more help here at this time.
@mknyszek
- Buffer the print to stdout (os.Stdout and os.Stderr are not buffered by default, IIRC they are with Python.)
Even without printing anything (just for
with 1s
sleep I have the same results).
- Run the Go program with
GOGC=off
(given that you're only printing "hello world" every second, I doubt you'll actually run out of memory in any reasonable amount of time... however I also doubt you're triggering the GC at all).
Forgot to mention that I've tried that already. Also tried GOMAXPROCS=1
. No success.
- Run the Go program with
GODEBUG=asyncpreemptoff=1
.
This looked promising, unfortunately didn't change anything :/
Is the Python program even a good comparison regarding a multithreaded program given the GIL?
Is the Python program even a good comparison regarding a multithreaded program given the GIL?
@andig From low level perspective (like GIL) such comparison may not be fair, I agree.
However, it makes sense for someone who want to choose language for application basing on how bad/good neighbor it will be in the OS.
I did two more tests...
hello.rs
- simple RUST app which prints "Hello world" every 1 seconduse std::{thread, time::Duration}; fn main() { loop { println!("Hello world"); thread::sleep(Duration::from_secs(1)); } }
run_rust_hello.sh
- script to run RUST app on particular (non-isolated) core#!/bin/bash taskset -a -c 8 ./hello
$ rustc --version
rustc 1.70.0 (90c541806 2023-05-31)
In first console RUST app was built: rustc hello.rs
and started: ./run_rust_hello.sh
, in second console ./run_jitter.sh
was run.
Results:
Comment: 1 significant jitter (62154ns), the remaining jitters did not exceed 106ns during 335 seconds. Following interruptions were made during benchmark:
hello.cpp
- simple C++ app which prints "Hello world" every 1 seconduse std::{thread, time::Duration}; fn main() { loop { println!("Hello world"); thread::sleep(Duration::from_secs(1)); } }
run_cpp_hello.sh
- script to run C++ app on particular (non-isolated) core#!/bin/bash taskset -a -c 8 ./hello
$ g++ --version
g++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
In first console C++ app was built: g++ hello.cpp -o hello
and started: ./run_cpp_hello.sh
, in second console ./run_jitter.sh
was run.
Results:
Comment: 1 significant jitter (70972ns), the remaining jitters did not exceed 59ns during 335 seconds. Following interruptions were made during benchmark:
@mknyszek @andig @prattmic
Originally, I used go1.20.5 linux/amd64
to build simple Golang app for this issue.
Later, I also built it using:
go1.20.6 linux/amd64
go1.20.7 linux/amd64
go1.20.8 linux/amd64
go1.21.0 linux/amd64
(with go 1.21
in go.mod
)and results were the same.
But look at results I've got when I built this app using go1.21.1 linux/amd64
(with go 1.21
in go.mod
):
Comment: 0 significant jitters (the biggest jitter did not exceed 50 during 335 seconds. Following interruptions were made during benchmark:
go1.21.1 linux/amd64
) - and I didn't observe any jitters or interrupts on isolated CPU20 during benchmark. There were a lot of jitters and interrupts when these apps were built with previous versions of go.It would be great to find which commit from these: https://github.com/golang/go/issues?q=milestone%3AGo1.21.1+label%3ACherryPickApproved has actually fixed this problem and root-cause why. Can you help me with that?
IIRC, nothing in 1.21.1 intentionally affected scheduling. My best guess would be https://github.com/golang/go/issues/62329. To be sure, you should bisect on branch release-branch.go1.21
(the individual releases are tagged).
@prattmic Right, https://github.com/golang/go/issues/62329 (runtime: MADV_HUGEPAGE causes stalls when allocating memory
) was also my best guess :)
These are commit between tag: go1.21.0
and tag: go1.21.1
for release-branch.go1.21
branch:
2c1e5b05fe (tag: go1.21.1) [release-branch.go1.21] go1.21.1
bbd043ff0d [release-branch.go1.21] html/template: properly handle special tags within the script context
b0e1d3ea26 [release-branch.go1.21] html/template: support HTML-like comments in script contexts
d25a935574 [release-branch.go1.21] cmd/go: reject toolchain directives containing path separators
e3ba569c78 [release-branch.go1.21] net/http: revert "support streaming POST content in wasm"
8dc6ad1c61 [release-branch.go1.21] runtime: restore caller's frame pointer when recovering from panic
06df3292a8 [release-branch.go1.21] runtime: avoid MADV_HUGEPAGE for heap memory
b120517ffd [release-branch.go1.21] go/types, types2: remove order dependency in inference involving channels
0a9582163c [release-branch.go1.21] cmd/go: retry ETXTBSY errors when running test binaries
91a4e74b98 [release-branch.go1.21] crypto/tls: QUIC: fix panics when processing post-handshake messages
6385a6fb18 [release-branch.go1.21] cmd/go: find GOROOT using os.Executable when installed to GOROOT/bin/GOOS_GOARCH
2d07bb86f0 [release-branch.go1.21] encoding/xml: overriding by empty namespace when no new name declaration
745b81b6e6 [release-branch.go1.21] encoding/gob: prevent panic from index out of range in Decoder.typeString
13339c75b8 [release-branch.go1.21] runtime: fix maps.Clone bug when cloning a map mid-grow
2977709875 [release-branch.go1.21] context: fix synchronization in ExampleAfterFunc_cond
2d4746f37b [release-branch.go1.21] go/types, types2: disable interface inference for versions before Go 1.21
2b8026f025 [release-branch.go1.21] cmd/compile: in expandCalls, move all arg marshalling into call block
7c97cc7d97 [release-branch.go1.21] Revert "os: use handle based APIs to read directories on windows"
cb6ea94996 [release-branch.go1.21] Revert "cmd/compile: omit redundant sign/unsign extension on arm64"
45b98bfb79 [release-branch.go1.21] path/filepath: don't drop .. elements when cleaning invalid Windows paths
bac083a584 [release-branch.go1.21] cmd/link: don't mangle string symbol names
70aa116c4a [release-branch.go1.21] runtime/internal/wasitest: skip racy TCP echo test
31c5a236bc [release-branch.go1.21] runtime: mark traceEnabled and traceShuttingDown as no:split
25ec110e51 [release-branch.go1.21] cmd/compile: ensure empty blocks in write barriers are marked unpreemptible
6634ce2f41 [release-branch.go1.21] runtime: profiling on Darwin cannot use blocking reads
25c6dce188 [release-branch.go1.21] cmd/compile: make backingArrayPtrLen to return typecheck-ed nodes
4e34f2e81d [release-branch.go1.21] go/types, types2: don't panic during interface completion
d91843ff67 [release-branch.go1.21] go/types, types2: use correct parameter list when checking argument passing
7437db1085 [release-branch.go1.21] go/types, types2: use exact unification when comparing interface methods
ed527ecfb2 [release-branch.go1.21] cmd/api: rename api.go to main_test.go
b78e8cc145 [release-branch.go1.21] crypto/tls: add GODEBUG to control max RSA key size
3475e6af4c [release-branch.go1.21] cmd/go: fix missing case checking for empty slice
179821c9e1 [release-branch.go1.21] net/http: permit requests with invalid Host headers
9398951479 [release-branch.go1.21] cmd/distpack: include directory entries in tar files
75d8be5fb4 [release-branch.go1.21] cmd/go/internal/web: release the net token when an HTTP request fails due to CheckRedirect
1755d14559 [release-branch.go1.21] cmd/compile: fix missing init nodes for len(string([]byte)) optimization
c19c4c566c (tag: go1.21.0) [release-branch.go1.21] go1.21.0
So I compiled go
from sources from this commit: 06df3292a8 [release-branch.go1.21] runtime: avoid MADV_HUGEPAGE for heap memory
and compile my simple app using compiled go
.
Results:
No jitters and no interrupts were observed
After that I compiled go
from sources from one commit before: b120517ffd [release-branch.go1.21] go/types, types2: remove order dependency in inference involving channels
and compile my simple app using compiled go
.
Results:
Jitters and interrupts were observed
So we can assume that side effect of fixing https://github.com/golang/go/issues/62329 (original PR: https://github.com/golang/go/issues/61718 created by @dominikh) is that problems described in this PR were also fixed.
Here is commit: https://github.com/golang/go/commit/06df3292a88929cde5fb054ae8a84e26a625e603
And I'm still very curious what was the root cause of problem described in this PR.
And I'm still very curious what was the root cause of problem described in this PR.
It's a long story. https://go.dev/doc/gc-guide#Linux_transparent_huge_pages has some background, but from there I think most of the detail is all in https://github.com/golang/go/issues/61718. Going all the way back, this started with https://bugzilla.kernel.org/show_bug.cgi?id=93111, which is why the runtime (for releases prior to Go 1.21.0) was calling madvise(MADV_HUGEPAGE)
and madvise(MADV_NOHUGEPAGE)
at all.
Coming back to the current issue, @prattmic theorizes that forcing huge pages on and off via madvise(MADV_HUGEPAGE)
and madvise(MADV_NOHUGEPAGE)
results in TLB shootdowns down the line that end up causing other programs to stall when they access memory and take a very slow path through the kernel and hardware.
The TL;DR is that Go 1.21.1 doesn't make these calls anymore. There are a handful of cases where they're still used, but the maximum number of calls to these syscalls in any given program is constant and small.
@mknyszek Thank you very much for the detailed response! I have one more question.
For applications built with Go 1.21.1 MADV_COLLAPSE
works for Linux 6.1 and higher, right? So in my case (Linux 5.15), madvice
most likely doesn't do anything and returns some error, which is intentionally not handled in the Go runtime. Am I correct?
I'm wondering (and I don't have a way to check it right now) what impact the correct execution of the madvise(v, n, _MADV_COLLAPSE)
operation for Linux 6.1+ may have on my workloads, their latency, and interrupts.
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
Details below.
What did you expect to see?
I expect to see no interrupts on isolated cores and want to run Go-based applications on environments when lowlatency workloads are executed.
What did you see instead?
Interrupts on isolated cores on which low-latency work is performed. Interrupts are caused by simple Go application (printing "Hello world" every 1 second) which runs on different, non-isolated cores.
Summary
LOC, IWI, RES and CAL interrupts are observed on isolated cores on which low-latency benchmark is performed. Interrupts are caused by simple Go application (printing "Hello world" every 1 second) which runs on different, non-isolated cores. Similar Python application doesn't cause such problems.
Tested on
Ubuntu 22.04.2 LTS (GNU/Linux 5.15.0-68-lowlatency x86_64)
(compiled with: "Full Dynticks System (tickless)" and "No Forced Preemption (Server)").Hardware:
2 x Intel(R) Xeon(R) Gold 6438N
(32 cores each)BIOS: Hyperthreading disabled
OS and configuration:
Ubuntu 22.04.2 LTS (GNU/Linux 5.15.0-68-lowlatency x86_64)
(compiled with: "Full Dynticks System (tickless)" and "No Forced Preemption (Server)" from https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/log/?h=lowlatency-next)irqbalance
stopped and disabled:Based on workload type, experiments and knowledge found in the Internet, following kernel parameters were used:
For every core/socket: Cx states (for x > 0) were disabled, particular power governor was used and fixed uncore values were set. To achieve that
power.py
script from https://github.com/intel/CommsPowerManagement was used.prepare_cpus.sh
script for setting this up and results![cpu_prepared](https://github.com/golang/go/assets/66442258/073a9af7-27f5-45f4-944b-d14f159ffdf5)
CPUs 20-23 are "isolated" (thanks to proper kernel parameters) - benchmark/workload will be run on them.
Kernel threads were moved from CPU20-23:
get_irqs.sh
script which checks which target CPUs are permitted for a given IRQ sourcesoutput of mentioned script:
lscpu
output:JITTER tool - Baseline
jitter is benchmarking tool which is meant for measuring the "jitter" in the execution time caused by OS and/or the underlying architecture.
Put "run_jitter.sh" script inside above directory.
Run:
Results:
Comment:
jitter
tool shows intervals and jitter in CPU Core cycles. Benchmark is done on 2000 MHz core so on graph values are divided by 2 and presented in nanoseconds. Very stable results, no significant jitters (max jitter: 51ns) during 335 seconds. No interruptions made on isolated CPU20 during benchmark.JITTER tool - Python
hello.py
- simple Python app which prints "Hello world" every 1 secondrun_python_hello.sh
- script to run python app on particular (non-isolated) coreIn first console
./run_python_hello.sh
was started, in second console./run_jitter.sh
was run.Results:
Comment: Acceptable result, one noticeable jitter (1190ns), the remaining jitters did not exceed 60ns during 336 seconds. No interruptions made on isolated CPU20 during benchmark.
JITTER tool - Golang
hello.go
- simple Golang app which prints "Hello world" every 1 secondgo.mod
- go module definitionrun_go_hello.sh
- script to run Go app on particular (non-isolated) coreIn first console Go app was built:
go build
and started:./run_go_hello.sh
, in second console./run_jitter.sh
was run.Results:
Comment: 34 significant jitters (the worst had: 44961ns) during 335 seconds. Following interruptions were made on isolated CPU20 during benchmark: LOC: 67 IWI: 34 RES: 34 RES: 34
It seems that every jitter is made every ~10s.
What is also interesting that for idle and isolated CPU22 and CPU23 no interruptions were made during benchmark. For CPU24 (not isolated) only LOC were made (335283 of them).
Notes:
cpuset
and its shield turned on. Unfortunately, results were even worse (jitters were "bigger" and more interruptions were made to shielded cores), moreovercset
was not able to move kernel threads outside of shielded pool.GNU/Linux 5.15.65-rt49 x86_64
-> https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/linux-5.15.65.tar.gz patched with https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.15/older/patch-5.15.65-rt49.patch.gz) and problem with interrupts and jitters done by Go app doesn't exist there. However, RT kernel is not the best solution for everyone and it would be great to not have jitters also on lowlatency tickless kernel.1.19.x
and1.20.2
.