x/build: openbsd-amd64-64 trybots are too slow

bradfitz commented 5 years ago

openbsd-amd64-64 trybots are taking 11+ minutes (which causes TryBots as a whole to take 11+ minutes rather than ~5)

We need to figure out what's slow on them, and/or just shard it out more.

/cc @dmitshur @bcmills @andybons

bradfitz commented 5 years ago

SELECT Builder, AVG(Seconds) as Sec FROM builds.Builds WHERE IsTry=True AND StartTime > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 100 HOUR) and Repo = "go" AND FailureURL = "" GROUP BY 1 ORDER BY Sec DESC;


Row | Builder | Sec |  
-- | -- | -- | --
1 | openbsd-amd64-64 | 579.7221071917958 |  
2 | linux-amd64-race | 488.40762166018374 |  
3 | nacl-386 | 434.8139809606734 |  
4 | windows-amd64-2016 | 424.604860819551 |  
5 | nacl-amd64p32 | 418.4696299015715 |  
6 | windows-386-2008 | 414.7469431190204 |  
7 | js-wasm | 371.9747404238125 |  
8 | misc-vet-vetall | 358.80661270393875 |  
9 | linux-386 | 353.81094730244905 |  
10 | linux-amd64 | 345.036077108898 |  
11 | misc-compile | 337.44598333253055 |  
12 | misc-compile-mips | 335.70810570520416 |  
13 | freebsd-amd64-12_0 | 328.52744295724483 |  
14 | misc-compile-openbsd | 293.41003601271416 |  
15 | misc-compile-netbsd | 292.8116776015307 |  
16 | misc-compile-freebsd | 292.80485985481636 |  
17 | misc-compile-nacl | 288.17948818259185 |  
18 | misc-compile-plan9 | 273.5849724516735 |  
19 | misc-compile-ppc | 251.7265086680816

bradfitz commented 5 years ago

SELECT Builder, Event, AVG(Seconds) as Sec FROM builds.Spans WHERE Builder LIKE 'openbsd-amd64%' AND Error='' And IsTry=True AND StartTime > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 100 HOUR) and Repo = "go" GROUP BY 1, 2 ORDER BY Sec DESC;

Row | Builder | Event | Sec |  
-- | -- | -- | -- | --
1 | openbsd-amd64-64 | make_and_test | 534.0490917608572 |  
2 | openbsd-amd64-64 | make | 292.82514376291306 |  
3 | openbsd-amd64-64 | run_test:go_test:cmd/go | 150.02306724040426 |  
4 | openbsd-amd64-64 | run_test:cgo_test | 88.3467336194681 |  
5 | openbsd-amd64-64 | run_test:runtime:cpu124 | 78.4045334334468 |  
6 | openbsd-amd64-64 | run_test:go_test:cmd/compile/internal/gc | 63.46040540948936 |  
7 | openbsd-amd64-64 | run_test:go_test:net | 62.921281942893614 |  
8 | openbsd-amd64-64 | run_test:go_test:runtime | 59.65075333531915 |  
9 | openbsd-amd64-64 | run_test:cgo_errors | 39.74668672670213 |  
10 | openbsd-amd64-64 | get_helper | 36.28992712104489 |  
11 | openbsd-amd64-64 | create_gce_buildlet | 35.651753116401366 |  
12 | openbsd-amd64-64 | get_buildlet | 33.989285913102044 |  
13 | openbsd-amd64-64 | run_test:go_test:cmd/compile/internal/ssa | 31.60199228663415 |  
14 | openbsd-amd64-64 | create_gce_instance | 30.614607733554422 |  
15 | openbsd-amd64-64 | run_test:go_test:cmd/vet | 25.966327777999997 |  
16 | openbsd-amd64-64 | run_test:go_test:net/http | 23.022429393063828 |  
17 | openbsd-amd64-64 | write_snapshot_to_gcs | 19.260539691499996 |  
18 | openbsd-amd64-64 | run_test:doc_progs | 16.33576500525532 |  
19 | openbsd-amd64-64 | run_test:go_test:runtime/pprof | 13.31036519180851 |  
20 | openbsd-amd64-64 | run_test:go_test:reflect | 11.999205834765958 |  
21 | openbsd-amd64-64 | run_test:go_test:time | 11.741334522617022 |  
22 | openbsd-amd64-64 | run_test:sync_cpu | 11.533254003170732 |  
23 | openbsd-amd64-64 | write_snapshot_tar | 10.8773563 |  
24 | openbsd-amd64-64 | run_test:go_test:cmd/compile | 10.752162821125 |  
25 | openbsd-amd64-64 | run_test:go_test:cmd/fix | 10.435803139355556 |  
26 | openbsd-amd64-64 | run_tests_multi | 10.243288797263473 |  
27 | openbsd-amd64-64 | run_test:go_test:strings | 10.110943746 |  
28 | openbsd-amd64-64 | run_test:go_test:cmd/link | 10.061246428116279 |  
29 | openbsd-amd64-64 | run_test:go_test:cmd/link/internal/ld | 10.037645228209302 |  
30 | openbsd-amd64-64 | run_test:go_test:go/types | 9.813017223727273 |  
31 | openbsd-amd64-64 | run_test:go_test:syscall | 9.579807829382979 |  
32 | openbsd-amd64-64 | run_test:go_test:strconv | 9.096669728574469 |  
33 | openbsd-amd64-64 | run_test:nolibgcc:net | 8.353404652658536 |  
34 | openbsd-amd64-64 | run_test:go_test:os/signal | 8.149411695148936 |  
35 | openbsd-amd64-64 | run_test:go_test:os | 8.032081740425532 |  
36 | openbsd-amd64-64 | run_test:go_test:math | 7.8759232913157895 |  
37 | openbsd-amd64-64 | run_test:go_test:net/http/httptrace | 7.7749100352500005 |  
38 | openbsd-amd64-64 | run_test:go_test:math/big | 7.75858940580851 |  
39 | openbsd-amd64-64 | run_test:go_test:cmd/internal/obj/x86 | 7.6362746259500005 |  
40 | openbsd-amd64-64 | get_source_from_gitmirror | 7.515116951666666 |  
41 | openbsd-amd64-64 | get_source | 7.277790430666666 |  
42 | openbsd-amd64-64 | run_test:bench_go1 | 7.0304439564893615 |  
43 | openbsd-amd64-64 | run_test:moved_goroot | 6.851026539365853 |  
44 | openbsd-amd64-64 | run_test:go_test:cmd/nm | 6.5756059088085115 |  
45 | openbsd-amd64-64 | run_test:go_test:cmd/cover | 6.451060486723404 |  
46 | openbsd-amd64-64 | run_test:go_test:cmd/objdump | 6.444223596553191 |  
47 | openbsd-amd64-64 | run_test:go_test:runtime/trace | 6.383058027941177 |  
48 | openbsd-amd64-64 | run_test:go_test:testing | 5.998117573319149 |  
49 | openbsd-amd64-64 | run_test:go_test:cmd/vendor/github.com/google/pprof/internal/driver | 5.980447624906977 |  
50 | openbsd-amd64-64 | run_test:wiki | 5.823946847042554

bradfitz commented 5 years ago

Wow, just running make.bash (which isn't sharded out over N buildlets) is more than twice as slow as other platforms:

SELECT Builder, Event, AVG(Seconds) as Sec FROM builds.Spans WHERE Event = 'make' AND Error='' And IsTry=True AND StartTime > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 100 HOUR) and Repo = "go" GROUP BY 1, 2 ORDER BY Sec DESC;

Row | Builder | Event | Sec |  
-- | -- | -- | -- | --
1 | openbsd-amd64-64 | make | 292.82514376291306 |  
2 | nacl-386 | make | 176.9535785543913 |  
3 | nacl-amd64p32 | make | 169.24032677876087 |  
4 | windows-386-2008 | make | 158.65642536708697 |  
5 | windows-amd64-2016 | make | 142.23586712976086 |  
6 | js-wasm | make | 137.46539279367394 |  
7 | linux-386 | make | 134.50720768395655 |  
8 | freebsd-amd64-12_0 | make | 124.52324519041304 |  
9 | misc-vet-vetall | make | 124.14415335852175 |  
10 | linux-amd64-race | make | 123.95929911093478 |  
11 | linux-amd64 | make | 123.54718755441306

bradfitz commented 5 years ago

Likely suspect: #18314 (use a tmpfs on OpenBSD)

bradfitz commented 5 years ago

I tried doing the memory filesystem on /tmp/ in an OpenBSD 6.4 amd64 instance (via gomote ssh) and it works, but it's still not any faster.

Still 5 minutes ....

bradfitz@gdev:~/src/golang.org/x/build$ time gomote run user-bradfitz-openbsd-amd64-64-0 go/src/make.bash
Building Go cmd/dist using /tmp/workdir/go1.4.
Building Go toolchain1 using /tmp/workdir/go1.4.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
Building packages and commands for openbsd/amd64.
---
Installed Go for openbsd/amd64 in /tmp/workdir/go
Installed commands in /tmp/workdir/go/bin

real    5m3.824s
user    0m0.136s
sys     0m0.024s
bradfitz@gdev:~/src/golang.org/x/build$ time gomote run -system user-bradfitz-openbsd-amd64-64-0 mount
/dev/sd0a on / type ffs (local, wxallowed)
mfs:85198 on /tmp type mfs (asynchronous, local, nodev, nosuid, size=2097152 512-blocks)

real    0m0.108s
user    0m0.064s
sys     0m0.044s
bradfitz@gdev:~/src/golang.org/x/build$ time gomote run -system user-bradfitz-openbsd-amd64-64-0 df
Filesystem  512-blocks      Used     Avail Capacity  Mounted on
/dev/sd0a     18153212   1652976  15592576    10%    /
mfs:85198      2057756   1656516    298356    85%    /tmp

real    0m0.107s
user    0m0.096s
sys     0m0.012s

It sees 4 cores:

buildlet$ sysctl hw.ncpufound
hw.ncpufound=4

buildlet$ sysctl -a | grep cpu  
kern.ccpu=1948
hw.ncpu=4
hw.cpuspeed=2500
hw.ncpufound=4
hw.ncpuonline=4
machdep.cpuvendor=GenuineIntel
machdep.cpuid=0x306e4
machdep.cpufeature=0x1f9bfbff

buildlet$ dmesg | grep -i core
cpu0: smt 0, core 0, package 0
cpu1: smt 0, core 1, package 0
cpu2: smt 1, core 0, package 0
cpu3: smt 1, core 1, package 0

The kernel we're running is:

OpenBSD 6.4 (GENERIC.MP) #364: Thu Oct 11 13:30:23 MDT 2018
    deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Is this Spectre/Meldown mitigations shutting down SMT? Can we disable that for the builders?

/cc @mdempsky

mdempsky commented 5 years ago

@bradfitz I think you can try setting "sysctl hw.smt=1" to re-enable hyper threading.

https://man.openbsd.org/sysctl.2#HW_SMT_2

bradfitz commented 5 years ago

It's already enabled:

$ sysctl -a | grep -i smt                                                                                                                                                                    
hw.smt=1

So, that's not it. It's crazy that OpenBSD is 2x slower. If it were 10% slower I'd assume, "Oh, OpenBSD prioritizes security over performance" and be fine with that. But 2x makes me think we have a configuration problem somewhere.

stmuk commented 5 years ago

Have you tried increasing login.conf limits (as I suggested on twitter)?

bradfitz commented 5 years ago

Which would you increase? We have:

default:\
        :path=/usr/bin /bin /usr/sbin /sbin /usr/X11R6/bin /usr/local/bin /usr/local/sbin:\
        :umask=022:\
        :datasize-max=768M:\
        :datasize-cur=768M:\
        :maxproc-max=256:\
        :maxproc-cur=128:\
        :openfiles-max=1024:\
        :openfiles-cur=512:\
        :stacksize-cur=4M:\
        :localcipher=blowfish,a:\
        :tc=auth-defaults:\
        :tc=auth-ftp-defaults:

stmuk commented 5 years ago

The default settings are low. You could try setting datasize-max/cur and stacksize-cur to "unlimited"

mdempsky commented 5 years ago

@stmuk Wouldn't the resource limits being too low just cause the build to fail rather than to proceed slowly?

bradfitz commented 5 years ago

Yeah. The issue is speed, not failure to build.

stmuk commented 5 years ago

@mdempsky Maybe but it's easy enough to try. @bradfitz How does your build work? Do you build bootstrap 1.4 with OpenBSD clang and then compile go? If so do you see the slow down with both steps?

bradfitz commented 5 years ago

@mdempsky Maybe but it's easy enough to try.

This is all very tedious & slow to work on, so I don't eagerly pursue avenues that don't at least make sense. Maybe if I were really desperate. But given limited time, I'd rather spend it on trying to collect system-wide profiling information or otherwise getting more visibility into the problem, rather than just changing random things.

How does your build work? Do you build bootstrap 1.4 with OpenBSD clang and then compile go? If so do you see the slow down with both steps?

We push a pre-built Go 1.4 to it and use that.

mdempsky commented 5 years ago

@bradfitz Maybe a first step would be to use cmd/dist's GOBUILDTIMELOGFILE to see if any particular steps are slower, or the whole thing is proportionally slower?

$ GOBUILDTIMELOGFILE=/tmp/buildtime.txt ./make.bash
Building Go cmd/dist using /usr/lib/google-golang.
Building Go toolchain1 using /usr/lib/google-golang.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
Building packages and commands for linux/amd64.
---
Installed Go for linux/amd64 in /usr/local/google/home/mdempsky/wd/go
Installed commands in /usr/local/google/home/mdempsky/wd/go/bin
$ cat /tmp/buildtime.txt
Fri Jan 18 13:37:01 PST 2019 start make.bash
Fri Jan 18 13:37:03 PST 2019 +2.2s start dist bootstrap
Fri Jan 18 13:37:03 PST 2019 +2.6s build toolchain1
Fri Jan 18 13:37:18 PST 2019 +18.0s build go_bootstrap
Fri Jan 18 13:37:28 PST 2019 +27.9s build toolchain2
Fri Jan 18 13:37:45 PST 2019 +44.1s build toolchain3
Fri Jan 18 13:38:00 PST 2019 +59.9s build toolchain
Fri Jan 18 13:38:11 PST 2019 +70.3s end dist bootstrap

stmuk commented 5 years ago

@bradfitz Too many negatives in that for me to parse that or motivate me to try and help further. I just regret wasting my time trying to help.

bradfitz commented 5 years ago

@stmuk, sorry, I didn't mean to waste your time. But with me and @mdempsky both thinking that such a tweak wouldn't do anything, it's not a high priority of mine to try. I appreciate you throwing it out there, even if it's not the answer. I at least went and read the OpenBSD man pages for those knobs.

stmuk commented 5 years ago

@bradfitz You were right the login cap limit relaxation made no difference whatever.

@mdempsky Running on i5 Gen 5 Vbox host with OEL7.6 and OpenBSD 6.4 guests under vagrant I get the unexpected result of a slightly faster OpenBSD build!

There are different compilers in use to build the 1.4 I bootstrapped tip off. OpenBSD has their patched clang 6 whereas Linux has gcc 4.8.5. OBSD has a noatime mount but otherwise no changes were made.

I'm wondering if we are just seeing differences due to the underlying virtualisation. I may experiment with QEMU and more similar C compilers if I get a chance.

go version devel +5538a9a Fri Jan 18 22:41:47 2019 +0000 linux/amd64 real 3m10.367s user 2m41.822s sys 0m14.216s Sat Jan 19 12:37:19 UTC 2019 start make.bash Sat Jan 19 12:37:20 UTC 2019 +1.4s start dist bootstrap Sat Jan 19 12:37:20 UTC 2019 +1.4s build toolchain1 Sat Jan 19 12:37:38 UTC 2019 +19.0s build go_bootstrap Sat Jan 19 12:37:57 UTC 2019 +38.2s build toolchain2 Sat Jan 19 12:38:57 UTC 2019 +98.2s build toolchain3 Sat Jan 19 12:39:48 UTC 2019 +149.1s build toolchain Sat Jan 19 12:40:29 UTC 2019 +190.6s end dist bootstrap

go version devel +5538a9a Fri Jan 18 22:41:47 2019 +0000 openbsd/amd64 real 2m41.425s user 1m55.670s sys 2m0.150s Sat Jan 19 04:51:44 PST 2019 start make.bash Sat Jan 19 04:51:46 PST 2019 +2.2s start dist bootstrap Sat Jan 19 04:51:46 PST 2019 +2.3s build toolchain1 Sat Jan 19 04:52:07 PST 2019 +23.6s build go_bootstrap Sat Jan 19 04:52:25 PST 2019 +41.4s build toolchain2 Sat Jan 19 04:53:14 PST 2019 +90.0s build toolchain3 Sat Jan 19 04:53:51 PST 2019 +127.2s build toolchain Sat Jan 19 04:54:26 PST 2019 +162.1s end dist bootstrap

juanfra684 commented 5 years ago

The problem is not the compiler or the VM software or the FS. I'm the maintainer of BaCon which also runs a big bash script and it's slow as hell. Something happens between bash and the OpenBSD base which makes the bash scripts slow. Maybe something related to the memory protections.

bradfitz commented 5 years ago

@juanfra684, our bash script is a few lines that just calls into a Go program that does the rest. Our performance issue is not bash related.

juanfra684 commented 5 years ago

You're right, sorry for the misunderstanding.

I've built the go (only make, no tests) port on 6.4 and -current, and there is a -14% of difference:

6.4
real       205.50
user       303.09
sys        138.58

-current
real       186.69
user       300.99
sys         73.94

~~Recently the malloc code was changed to work better with multithreaded programs.~~ (I reverted the malloc changes and the result is the same.)

OpenBSD doesn't have magic knobs to speedup things but you could tune a few thing to help the bootstrap. Firstly, if the VM host is using flash drives for storage, forget mfs. It's not an equivalent in speed to Linux tmpfs and you can usually run the FS operations faster in a simple consumer grade SSD.

About the mount options, use noatime, softdep. Linux is using relatime and a log backed FS by default, so the comparison with a plain OpenBSD installation is not fair.

You could add also a few entries to /etc/sysctl.conf:

kern.pool_debug=0: 0 is the default for stable versions but you could forget it if you're comparing the performance of stable with current.
kern.bufcachepercent=80: the default is 20 and the maximum is 80. The correct percentage depends of how much RAM the system has.
kern.maxvnodes: check what value the system has and increase it. The default value is very very conservative. It's the limit of vnodes which the system can maintain in memory.

bradfitz commented 5 years ago

Firstly, if the VM host is using flash drives for storage, forget mfs. It's not an equivalent in speed to Linux tmpfs and you can usually run the FS operations faster in a simple consumer grade SSD.

Sounds good. Yes, we're using GCE's SSDs, which are fast.

About the mount options, use noatime, softdep. Linux is using relatime and a log backed FS by default, so the comparison with a plain OpenBSD installation is not fair.

We already had noatime, but not softdep.

kern.bufcachepercent=80: the default is 20 and the maximum is 80. The

We had 20.

kern.maxvnodes: check what value the system has and increase it. The default value is very very conservative. It's the limit of vnodes which the system can maintain in memory.

We had kern.maxvnodes=37669 but during a build I see it only go up to kern.numvnodes=25417 so it doesn't appear we're running into that limit.

juanfra684 commented 5 years ago

About the mount options, use noatime, softdep. Linux is using relatime and a log backed FS by default, so the comparison with a plain OpenBSD installation is not fair.

We already had noatime, but not softdep.

One of your commands shows: /dev/sd0a on / type ffs (local, wxallowed). That's why I suggested noatime.

kern.bufcachepercent=80: the default is 20 and the maximum is 80. The

We had 20.

If you have a few GB of RAM, try with 80. If the system crashes, select something lower. The reason why OpenBSD is not using 80 by default is because there is a bug somewhere which crashes the system when the value is high. Anyway, the most of users will never have that crash.

kern.maxvnodes: check what value the system has and increase it. The default value is very very conservative. It's the limit of vnodes which the system can maintain in memory.

We had kern.maxvnodes=37669 but during a build I see it only go up to kern.numvnodes=25417 so it doesn't appear we're running into that limit.

A better way to check the correct value for maxvnodes is to use systat -s1 during the build:

Namei         Sys-cache    Proc-cache    No-cache
    Calls     hits    %    hits     %    miss   %
      185      131   71      25    14      29  16

No-cache will tell you if you have enough vnodes or not. Probably kern.maxvnodes=100000 is a safe value for your system.

Anyway, those things will only speedup slightly the process. There is an underlying problem on OpenBSD slowing down the build and tests.

bradfitz commented 5 years ago

One of your commands shows: /dev/sd0a on / type ffs (local, wxallowed). That's why I suggested noatime.

We remount it when the build daemon starts up:

https://github.com/golang/build/blob/c61dc90b4c2e3fd9aff9ce059a985cab89ef9e1b/cmd/buildlet/buildlet.go#L1685

That runs before any build.

juanfra684 commented 5 years ago

async is faster than softdep and you can't use both, so forget my suggestion about that.

stmuk commented 5 years ago

go version devel +5538a9a Fri Jan 18 22:41:47 2019 +0000 linux/amd64 real 3m10.367s [...] sys 0m14.216s

go version devel +5538a9a Fri Jan 18 22:41:47 2019 +0000 openbsd/amd64 real 2m41.425s [...] sys 2m0.150s

Despite the odd wallclock result. OpenBSD does seem to be spending nearly 9 times as much time in system calls!

bradfitz commented 5 years ago

I looked at our historical timing information for the builders for the cmd/compile/internal/gc tests. Turns out the median duration for OpenBSD 6.4 amd64 is 64.5 seconds but linux-amd64 is 20.2s and freebsd-amd64 is 19.6s.

So OpenBSD is 3x slower.

I tried comparing FreeBSD against FreeBSD modified to not use the vdso nanotime and didn't see a difference.

Profiling that test above, on OpenBSD I see:

(pprof) top 30
Showing nodes accounting for 890ms, 100% of 890ms total
Showing top 30 nodes out of 210
      flat  flat%   sum%        cum   cum%
     240ms 26.97% 26.97%      240ms 26.97%  runtime.nanotime
      90ms 10.11% 37.08%       90ms 10.11%  runtime.thrsleep
      80ms  8.99% 46.07%       80ms  8.99%  runtime.memclrNoHeapPointers
      80ms  8.99% 55.06%       80ms  8.99%  runtime.osyield
      50ms  5.62% 60.67%       60ms  6.74%  syscall.Syscall
      40ms  4.49% 65.17%       40ms  4.49%  runtime.kevent
      40ms  4.49% 69.66%       60ms  6.74%  runtime.scanobject
      20ms  2.25% 71.91%       20ms  2.25%  runtime.heapBitsSetType
      20ms  2.25% 74.16%      280ms 31.46%  runtime.mallocgc
      20ms  2.25% 76.40%       20ms  2.25%  runtime.memmove
      20ms  2.25% 78.65%       20ms  2.25%  runtime.nextFreeFast
      20ms  2.25% 80.90%       20ms  2.25%  runtime.thrwakeup
      20ms  2.25% 83.15%       20ms  2.25%  runtime.walltime
      10ms  1.12% 84.27%       30ms  3.37%  go/scanner.(*Scanner).Scan
      10ms  1.12% 85.39%       10ms  1.12%  internal/poll.(*FD).Write
      10ms  1.12% 86.52%       50ms  5.62%  reflect.Value.call
      10ms  1.12% 87.64%       10ms  1.12%  regexp.(*Regexp).tryBacktrack
      10ms  1.12% 88.76%       10ms  1.12%  runtime.(*gcSweepBuf).push
      10ms  1.12% 89.89%       10ms  1.12%  runtime.heapBits.next
      10ms  1.12% 91.01%       10ms  1.12%  runtime.jmpdefer
      10ms  1.12% 92.13%       10ms  1.12%  runtime.mapaccess2
      10ms  1.12% 93.26%       10ms  1.12%  runtime.newdefer
      10ms  1.12% 94.38%       40ms  4.49%  runtime.newobject
      10ms  1.12% 95.51%       10ms  1.12%  runtime.reentersyscall
      10ms  1.12% 96.63%       10ms  1.12%  runtime.scanblock
      10ms  1.12% 97.75%       10ms  1.12%  runtime.spanOf
      10ms  1.12% 98.88%      420ms 47.19%  runtime.systemstack
      10ms  1.12%   100%       10ms  1.12%  syscall.Syscall6
         0     0%   100%       20ms  2.25%  bufio.(*Scanner).Text
         0     0%   100%       10ms  1.12%  bytes.(*Buffer).Grow

On FreeBSD I see:

(pprof) top 30
Showing nodes accounting for 350ms, 100% of 350ms total
Showing top 30 nodes out of 163
      flat  flat%   sum%        cum   cum%
      70ms 20.00% 20.00%       70ms 20.00%  runtime.sys_umtx_op
      30ms  8.57% 28.57%       50ms 14.29%  runtime.scanobject
      20ms  5.71% 34.29%       60ms 17.14%  runtime.mallocgc
      20ms  5.71% 40.00%       20ms  5.71%  runtime.memclrNoHeapPointers
      10ms  2.86% 42.86%       20ms  5.71%  bufio.(*Scanner).Scan
      10ms  2.86% 45.71%       30ms  8.57%  cmd/compile/internal/gc.TestShiftGeneric
      10ms  2.86% 48.57%       10ms  2.86%  compress/flate.(*decompressor).huffmanBlock
      10ms  2.86% 51.43%       10ms  2.86%  debug/dwarf.(*Reader).Next
      10ms  2.86% 54.29%       10ms  2.86%  go/token.Token.Precedence
      10ms  2.86% 57.14%       10ms  2.86%  reflect.(*rtype).Kind
      10ms  2.86% 60.00%       10ms  2.86%  regexp.(*Regexp).tryBacktrack
      10ms  2.86% 62.86%       10ms  2.86%  regexp.(*bitState).reset
      10ms  2.86% 65.71%       10ms  2.86%  regexp.(*inputs).clear
      10ms  2.86% 68.57%       10ms  2.86%  runtime.(*mTreap).rotateRight
      10ms  2.86% 71.43%       10ms  2.86%  runtime.(*mspan).objIndex
      10ms  2.86% 74.29%       20ms  5.71%  runtime.(*mspan).sweep
      10ms  2.86% 77.14%       10ms  2.86%  runtime.growslice
      10ms  2.86% 80.00%       10ms  2.86%  runtime.heapBitsForAddr
      10ms  2.86% 82.86%       10ms  2.86%  runtime.kevent
      10ms  2.86% 85.71%       10ms  2.86%  runtime.memmove
      10ms  2.86% 88.57%       10ms  2.86%  runtime.pageIndexOf
      10ms  2.86% 91.43%       10ms  2.86%  runtime.sigprocmask
      10ms  2.86% 94.29%       30ms  8.57%  runtime.slicebytetostring
      10ms  2.86% 97.14%       10ms  2.86%  runtime.spanOfUnchecked
      10ms  2.86%   100%       10ms  2.86%  syscall.Syscall
         0     0%   100%       20ms  5.71%  bufio.(*Scanner).Text
         0     0%   100%       10ms  2.86%  bytes.(*Buffer).ReadFrom
         0     0%   100%       60ms 17.14%  cmd/compile/internal/gc.TestCode
         0     0%   100%       70ms 20.00%  cmd/compile/internal/gc.TestIntendedInlining
         0     0%   100%       10ms  2.86%  cmd/compile/internal/gc.TestIntendedInlining.func1

/cc @aclements

bradfitz commented 5 years ago

OpenBSD spends a lot of time in nanotime, thrsleep (via semasleep), so looking at those, I see:

TEXT runtime·nanotime(SB),NOSPLIT,$24
        MOVQ    CLOCK_MONOTONIC, DI     // arg 1 - clock_id                                                                                                                  
        LEAQ    8(SP), SI               // arg 2 - tp                                                                                                                        
        MOVL    $87, AX                 // sys_clock_gettime                                                                                                                 
        SYSCALL
        MOVQ    8(SP), AX               // sec                                                                                                                               
        MOVQ    16(SP), DX              // nsec                                                                                                                              

        // sec is in AX, nsec in DX                                                                                                                                          
        // return nsec in AX                                                                                                                                                 
        IMULQ   $1000000000, AX
        ADDQ    DX, AX
        MOVQ    AX, ret+0(FP)
        RET

//go:nosplit                                                                                                                                                                 
func semasleep(ns int64) int32 {
        _g_ := getg()

        // Compute sleep deadline.                                                                                                                                           
        var tsp *timespec
        if ns >= 0 {
                var ts timespec
                var nsec int32
                ns += nanotime()
                ts.set_sec(int64(timediv(ns, 1000000000, &nsec)))
                ts.set_nsec(nsec)
                tsp = &ts
        }

semasleep is calling nanotime which multiplies by 1000000000 just to then divide it by 1000000000?

Probably room for improvement there.

Edit: or not. OpenBSD's __thrsleep(2) says it returns EINVAL if the timespec's nanoseconds value is too large. So the timediv is probably inevitable.

mdempsky commented 5 years ago

Linux and FreeBSD's nanotime implementations have the same IMULQ $1e9, AX instructions, so I doubt that's why OpenBSD's is so slow.

mdempsky commented 5 years ago

@bradfitz clock_gettime is an unlocked syscall on OpenBSD 6.4, and it's pretty simple. I'm pretty surprised to see it so high on the table.

The main thing that controls its behavior is the setting of sysctl kern.timecounter. Can you show what that is on the builders?

Might be worth just pasting the entirety of dmesg and sysctl -a while you're at it.

juanfra684 commented 5 years ago

Does go use system's malloc here or it's something unrelated? 20ms 2.25% 74.16% 280ms 31.46% runtime.mallocgc.

Some time related syscalls were modified recently, maybe running the tests on a temporal -current installation would be a good idea.

ianlancetaylor commented 5 years ago

Go does not use the system malloc. It is completely unrelated.

mdempsky commented 5 years ago

Does go use system's malloc here or it's something unrelated?

No, runtime.mallocgc is the entry point to Go's garbage collected heap allocator, which is implemented on top of mmap on OpenBSD (and Go's other POSIX-ish OSes).

bradfitz commented 5 years ago

@mdempsky, oh right, somebody else had mentioned the time source as a likely problem (https://twitter.com/johansglock/status/1086569264935550977) and referenced this pvclock driver (http://openbsd-archive.7691.n7.nabble.com/pvclock-4-td355443.html)

It's using acpitimer0:

# sysctl kern.timecounter                                                                                                                                           
kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=acpitimer0
kern.timecounter.choice=i8254(0) tsc(-1000) acpitimer0(1000) dummy(-1000000)

dmesg

OpenBSD 6.4 (GENERIC.MP) #364: Thu Oct 11 13:30:23 MDT 2018
    deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3848261632 (3669MB)
avail mem = 3722362880 (3549MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf2a70 (16 entries)
bios0: vendor Google version "Google" date 01/01/2011
bios0: Google Google Compute Engine
acpi0 at bios0: rev 0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP SSDT APIC WAET SRAT
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU @ 2.50GHz, 2500.61 MHz, 06-3e-04
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SSBD,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 970MHz
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Xeon(R) CPU @ 2.50GHz, 2424.94 MHz, 06-3e-04
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SSBD,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 1 (application processor)
cpu2: Intel(R) Xeon(R) CPU @ 2.50GHz, 2425.00 MHz, 06-3e-04
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SSBD,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 1, core 0, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Xeon(R) CPU @ 2.50GHz, 2424.95 MHz, 06-3e-04
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SSBD,ARAT,XSAVEOPT,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 0 pa 0xfec00000, version 11, 24 pins
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpicpu2 at acpi0: C1(@1 halt!)
acpicpu3 at acpi0: C1(@1 halt!)
"ACPI0006" at acpi0 not configured
acpicmos0 at acpi0
"QEMU0001" at acpi0 not configured
"ACPI0007" at acpi0 not configured
"ACPI0007" at acpi0 not configured
"ACPI0007" at acpi0 not configured
"ACPI0007" at acpi0 not configured
pvbus0 at mainbus0: KVM
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371AB PIIX4 ISA" rev 0x03
piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: SMBus disabled
virtio0 at pci0 dev 3 function 0 "Qumranet Virtio SCSI" rev 0x00
vioscsi0 at virtio0: qsize 8192
scsibus1 at vioscsi0: 253 targets
sd0 at scsibus1 targ 1 lun 0:  SCSI4 0/direct fixed serial.Google_PersistentDisk_
sd0: 10240MB, 512 bytes/sector, 20971520 sectors, thin
virtio0: msix shared
virtio1 at pci0 dev 4 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio1: address 42:01:0a:f0:00:65
virtio1: msix per-VQ
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
com2 at isa0 port 0x3e8/8 irq 5: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0 mux 1
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (26b0025335694106.a) swap on sd0b dump on sd0b

# sysctl -a

kern.ostype=OpenBSD kern.osrelease=6.4 kern.osrevision=201811 kern.version=OpenBSD 6.4 (GENERIC.MP) #364: Thu Oct 11 13:30:23 MDT 2018 deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP kern.maxvnodes=6603 kern.maxproc=1310 kern.maxfiles=7030 kern.argmax=262144 kern.securelevel=1 kern.hostname=buildlet.my.domain kern.hostid=0 kern.clockrate=tick = 10000, tickadj = 40, hz = 100, profhz = 100, stathz = 100 kern.dnsjackport=0 kern.posix1version=200809 kern.ngroups=16 kern.job_control=1 kern.saved_ids=1 kern.boottime=Tue Jan 29 13:18:23 2019 kern.domainname= kern.maxpartitions=16 kern.rawpartition=2 kern.maxthread=1950 kern.nthreads=57 kern.osversion=GENERIC.MP#364 kern.somaxconn=128 kern.sominconn=80 kern.nosuidcoredump=1 kern.fsync=1 kern.sysvmsg=1 kern.sysvsem=1 kern.sysvshm=1 kern.msgbufsize=98256 kern.malloc.buckets=16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536,131072,262144,524288 kern.malloc.bucket.16=(calls = 1038 total_allocated = 512 total_free = 151 elements = 256 high watermark = 1280 could_free = 0) kern.malloc.bucket.32=(calls = 1634 total_allocated = 256 total_free = 72 elements = 128 high watermark = 640 could_free = 0) kern.malloc.bucket.64=(calls = 4388 total_allocated = 1728 total_free = 33 elements = 64 high watermark = 320 could_free = 0) kern.malloc.bucket.128=(calls = 16379 total_allocated = 6272 total_free = 36 elements = 32 high watermark = 160 could_free = 113) kern.malloc.bucket.256=(calls = 2890 total_allocated = 96 total_free = 21 elements = 16 high watermark = 80 could_free = 0) kern.malloc.bucket.512=(calls = 538 total_allocated = 96 total_free = 2 elements = 8 high watermark = 40 could_free = 0) kern.malloc.bucket.1024=(calls = 5248 total_allocated = 4688 total_free = 5 elements = 4 high watermark = 20 could_free = 0) kern.malloc.bucket.2048=(calls = 44 total_allocated = 22 total_free = 1 elements = 2 high watermark = 10 could_free = 0) kern.malloc.bucket.4096=(calls = 576 total_allocated = 15 total_free = 1 elements = 1 high watermark = 5 could_free = 0) kern.malloc.bucket.8192=(calls = 345 total_allocated = 211 total_free = 2 elements = 1 high watermark = 5 could_free = 0) kern.malloc.bucket.16384=(calls = 196 total_allocated = 3 total_free = 0 elements = 1 high watermark = 5 could_free = 0) kern.malloc.bucket.32768=(calls = 10 total_allocated = 9 total_free = 0 elements = 1 high watermark = 5 could_free = 0) kern.malloc.bucket.65536=(calls = 183 total_allocated = 4 total_free = 0 elements = 1 high watermark = 5 could_free = 0) kern.malloc.bucket.131072=(calls = 2 total_allocated = 2 total_free = 0 elements = 1 high watermark = 5 could_free = 0) kern.malloc.bucket.262144=(calls = 4 total_allocated = 4 total_free = 0 elements = 1 high watermark = 5 could_free = 0) kern.malloc.bucket.524288=(calls = 1 total_allocated = 1 total_free = 0 elements = 1 high watermark = 5 could_free = 0) kern.malloc.kmemnames=free,,devbuf,,pcb,rtable,,,,ifaddr,soopts,sysctl,counters,,ioctlops,,,,,iov,mount,,NFS_req,NFS_mount,,vnodes,namecache,UFS_quota,UFS_mount,shm,VM_map,sem,dirhash,ACPI,VM_pmap,,,,file,file_desc,,proc,subproc,VFS_cluster,,,MFS_node,,,Export_Host,NFS_srvsock,,NFS_daemon,ip_moptions,in_multi,ether_multi,mrt,ISOFS_mount,ISOFS_node,MSDOSFS_mount,MSDOSFS_fat,MSDOSFS_node,ttys,exec,miscfs_mount,fusefs_mount,,,,,,,,,pfkey_data,tdb,xform_data,,pagedep,inodedep,newblk,,,indirdep,,,,,,,,,VM_swap,,,,,,UVM_amap,UVM_aobj,,USB,USB_device,USB_HC,,memdesc,,,crypto_data,,IPsec_creds,,,,emuldata,,,,,,,,,ip6_options,NDP,,,temp,NTFS_mount,NTFS_node,NTFS_fnode,NTFS_dir,NTFS_hash,NTFS_attr,NTFS_data,NTFS_decomp,NTFS_vrun,kqueue,,SYN_cache,UDF_mount,UDF_file_entry,UDF_file_id,,AGP_Memory,DRM kern.malloc.kmemstat.free=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.devbuf=(inuse = 9526, calls = 10617, memuse = 6381K, limblocks = 0, mapblocks = 0, maxused = 6382K, limit = 78644K, spare = 0, sizes = (16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536,131072,262144)) kern.malloc.kmemstat.pcb=(inuse = 78, calls = 114, memuse = 17K, limblocks = 0, mapblocks = 0, maxused = 18K, limit = 78644K, spare = 0, sizes = (16,32,128,1024)) kern.malloc.kmemstat.rtable=(inuse = 61, calls = 144, memuse = 2K, limblocks = 0, mapblocks = 0, maxused = 3K, limit = 78644K, spare = 0, sizes = (16,32,64,128,256)) kern.malloc.kmemstat.ifaddr=(inuse = 25, calls = 26, memuse = 8K, limblocks = 0, mapblocks = 0, maxused = 8K, limit = 78644K, spare = 0, sizes = (32,64,128,256,4096)) kern.malloc.kmemstat.soopts=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.sysctl=(inuse = 3, calls = 3, memuse = 2K, limblocks = 0, mapblocks = 0, maxused = 2K, limit = 78644K, spare = 0, sizes = (32,128,1024)) kern.malloc.kmemstat.counters=(inuse = 79, calls = 79, memuse = 67K, limblocks = 0, mapblocks = 0, maxused = 67K, limit = 78644K, spare = 0, sizes = (64,128,256,512,1024,4096,8192)) kern.malloc.kmemstat.ioctlops=(inuse = 0, calls = 1469, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 4K, limit = 78644K, spare = 0, sizes = (256,512,1024,2048,4096)) kern.malloc.kmemstat.iov=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.mount=(inuse = 1, calls = 1, memuse = 1K, limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0, sizes = (1024)) kern.malloc.kmemstat.NFS_req=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.NFS_mount=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.vnodes=(inuse = 1169, calls = 1172, memuse = 74K, limblocks = 0, mapblocks = 0, maxused = 74K, limit = 78644K, spare = 0, sizes = (64,128,256)) kern.malloc.kmemstat.namecache=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.UFS_quota=(inuse = 1, calls = 1, memuse = 32K, limblocks = 0, mapblocks = 0, maxused = 32K, limit = 78644K, spare = 0, sizes = (32768)) kern.malloc.kmemstat.UFS_mount=(inuse = 5, calls = 5, memuse = 37K, limblocks = 0, mapblocks = 0, maxused = 37K, limit = 78644K, spare = 0, sizes = (64,512,2048,32768)) kern.malloc.kmemstat.shm=(inuse = 2, calls = 2, memuse = 2K, limblocks = 0, mapblocks = 0, maxused = 2K, limit = 78644K, spare = 0, sizes = (256,1024)) kern.malloc.kmemstat.VM_map=(inuse = 2, calls = 2, memuse = 1K, limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0, sizes = (256)) kern.malloc.kmemstat.sem=(inuse = 2, calls = 2, memuse = 1K, limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0, sizes = (32,128)) kern.malloc.kmemstat.dirhash=(inuse = 18, calls = 45, memuse = 4K, limblocks = 0, mapblocks = 0, maxused = 4K, limit = 78644K, spare = 0, sizes = (16,32,64,128,512)) kern.malloc.kmemstat.ACPI=(inuse = 1822, calls = 12774, memuse = 199K, limblocks = 0, mapblocks = 0, maxused = 292K, limit = 78644K, spare = 0, sizes = (16,32,64,128,2048)) kern.malloc.kmemstat.VM_pmap=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.file=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.file_desc=(inuse = 1, calls = 2, memuse = 1K, limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0, sizes = (512)) kern.malloc.kmemstat.proc=(inuse = 61, calls = 335, memuse = 48K, limblocks = 0, mapblocks = 0, maxused = 72K, limit = 78644K, spare = 0, sizes = (16,64,1024,4096,8192)) kern.malloc.kmemstat.subproc=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.VFS_cluster=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.MFS_node=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.Export_Host=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.NFS_srvsock=(inuse = 1, calls = 1, memuse = 1K, limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0, sizes = (128)) kern.malloc.kmemstat.NFS_daemon=(inuse = 1, calls = 1, memuse = 16K, limblocks = 0, mapblocks = 0, maxused = 16K, limit = 78644K, spare = 0, sizes = (16384)) kern.malloc.kmemstat.ip_moptions=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.in_multi=(inuse = 11, calls = 11, memuse = 1K, limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0, sizes = (32,64,128)) kern.malloc.kmemstat.ether_multi=(inuse = 1, calls = 1, memuse = 1K, limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0, sizes = (32)) kern.malloc.kmemstat.mrt=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.ISOFS_mount=(inuse = 1, calls = 1, memuse = 32K, limblocks = 0, mapblocks = 0, maxused = 32K, limit = 78644K, spare = 0, sizes = (32768)) kern.malloc.kmemstat.ISOFS_node=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.MSDOSFS_mount=(inuse = 1, calls = 1, memuse = 16K, limblocks = 0, mapblocks = 0, maxused = 16K, limit = 78644K, spare = 0, sizes = (16384)) kern.malloc.kmemstat.MSDOSFS_fat=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.MSDOSFS_node=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.ttys=(inuse = 390, calls = 390, memuse = 1723K, limblocks = 0, mapblocks = 0, maxused = 1723K, limit = 78644K, spare = 0, sizes = (512,1024,8192)) kern.malloc.kmemstat.exec=(inuse = 0, calls = 300, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 2K, limit = 78644K, spare = 0, sizes = (16,32,256,1024)) kern.malloc.kmemstat.miscfs_mount=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.fusefs_mount=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.pfkey_data=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.tdb=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.xform_data=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.pagedep=(inuse = 1, calls = 1, memuse = 8K, limblocks = 0, mapblocks = 0, maxused = 8K, limit = 78644K, spare = 0, sizes = (8192)) kern.malloc.kmemstat.inodedep=(inuse = 1, calls = 1, memuse = 32K, limblocks = 0, mapblocks = 0, maxused = 32K, limit = 78644K, spare = 0, sizes = (32768)) kern.malloc.kmemstat.newblk=(inuse = 1, calls = 1, memuse = 1K, limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0, sizes = (512)) kern.malloc.kmemstat.indirdep=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.VM_swap=(inuse = 7, calls = 7, memuse = 75K, limblocks = 0, mapblocks = 0, maxused = 75K, limit = 78644K, spare = 0, sizes = (16,64,2048,32768,65536)) kern.malloc.kmemstat.UVM_amap=(inuse = 257, calls = 1720, memuse = 21K, limblocks = 0, mapblocks = 0, maxused = 21K, limit = 78644K, spare = 0, sizes = (16,32,64,128,256,8192)) kern.malloc.kmemstat.UVM_aobj=(inuse = 2, calls = 2, memuse = 3K, limblocks = 0, mapblocks = 0, maxused = 3K, limit = 78644K, spare = 0, sizes = (16,2048)) kern.malloc.kmemstat.USB=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.USB_device=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.USB_HC=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.memdesc=(inuse = 1, calls = 1, memuse = 4K, limblocks = 0, mapblocks = 0, maxused = 4K, limit = 78644K, spare = 0, sizes = (4096)) kern.malloc.kmemstat.crypto_data=(inuse = 1, calls = 1, memuse = 1K, limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0, sizes = (1024)) kern.malloc.kmemstat.IPsec_creds=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.emuldata=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.ip6_options=(inuse = 1, calls = 1, memuse = 1K, limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0, sizes = (128)) kern.malloc.kmemstat.NDP=(inuse = 4, calls = 4, memuse = 1K, limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0, sizes = (32)) kern.malloc.kmemstat.temp=(inuse = 55, calls = 4236, memuse = 2321K, limblocks = 0, mapblocks = 0, maxused = 2447K, limit = 78644K, spare = 0, sizes = (16,32,64,128,512,1024,2048,4096,8192,16384,65536,524288)) kern.malloc.kmemstat.NTFS_mount=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.NTFS_node=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.NTFS_fnode=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.NTFS_dir=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.NTFS_hash=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.NTFS_attr=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.NTFS_data=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.NTFS_decomp=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.NTFS_vrun=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.kqueue=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.SYN_cache=(inuse = 2, calls = 2, memuse = 16K, limblocks = 0, mapblocks = 0, maxused = 16K, limit = 78644K, spare = 0, sizes = (8192)) kern.malloc.kmemstat.UDF_mount=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.UDF_file_entry=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.UDF_file_id=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.AGP_Memory=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.malloc.kmemstat.DRM=(inuse = 0, calls = 0, memuse = 0K, limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes = (none)) kern.cp_time=36,0,131,28,18,11730 kern.nchstats.good_hits=18258 kern.nchstats.negative_hits=732 kern.nchstats.bad_hits=1600 kern.nchstats.false_hits=0 kern.nchstats.misses=5831 kern.nchstats.long_names=0 kern.nchstats.pass2=190 kern.nchstats.2passes=284 kern.nchstats.ncs_revhits=9 kern.nchstats.ncs_revmiss=0 kern.nchstats.ncs_dothits=0 kern.nchstats.nch_dotdothits=0 kern.forkstat.forks=325 kern.forkstat.vforks=0 kern.forkstat.tforks=9 kern.forkstat.kthreads=17 kern.forkstat.fork_pages=51946 kern.forkstat.vfork_pages=0 kern.forkstat.tfork_pages=0 kern.forkstat.kthread_pages=0 kern.nselcoll=0 kern.tty.tk_nin=782 kern.tty.tk_nout=32356 kern.tty.tk_rawcc=777 kern.tty.tk_cancc=5 kern.ccpu=1948 kern.fscale=2048 kern.nprocs=50 kern.stackgap_random=262144 kern.allowkmem=0 kern.splassert=1 kern.nfiles=131 kern.ttycount=65 kern.numvnodes=3046 kern.seminfo.semmni=10 kern.seminfo.semmns=60 kern.seminfo.semmnu=30 kern.seminfo.semmsl=60 kern.seminfo.semopm=100 kern.seminfo.semume=10 kern.seminfo.semusz=112 kern.seminfo.semvmx=32767 kern.seminfo.semaem=16384 kern.shminfo.shmmax=33554432 kern.shminfo.shmmin=1 kern.shminfo.shmmni=128 kern.shminfo.shmseg=128 kern.shminfo.shmall=8192 kern.maxclusters=262144 kern.timecounter.tick=1 kern.timecounter.timestepwarnings=0 kern.timecounter.hardware=acpitimer0 kern.timecounter.choice=i8254(0) tsc(-1000) acpitimer0(1000) dummy(-1000000) kern.maxlocksperuid=1024 kern.bufcachepercent=20 kern.wxabort=0 kern.consdev=tty00 kern.netlivelocks=2 kern.pool_debug=0 kern.global_ptrace=0 kern.audio.record=0 vm.loadavg=0.02 0.02 0.00 vm.psstrings=0x7f7ffffedf70 vm.swapencrypt.enable=1 vm.swapencrypt.keyscreated=0 vm.swapencrypt.keysdeleted=0 vm.nkmempages=32768 vm.anonmin=10 vm.vtextmin=5 vm.vnodemin=10 fs.posix.setuid=1 net.inet.ip.forwarding=0 net.inet.ip.redirect=1 net.inet.ip.ttl=64 net.inet.ip.sourceroute=0 net.inet.ip.directed-broadcast=0 net.inet.ip.portfirst=1024 net.inet.ip.portlast=49151 net.inet.ip.porthifirst=49152 net.inet.ip.porthilast=65535 net.inet.ip.maxqueue=300 net.inet.ip.encdebug=0 net.inet.ip.ipsec-expire-acquire=30 net.inet.ip.ipsec-invalid-life=60 net.inet.ip.ipsec-pfs=1 net.inet.ip.ipsec-soft-allocs=0 net.inet.ip.ipsec-allocs=0 net.inet.ip.ipsec-soft-bytes=0 net.inet.ip.ipsec-bytes=0 net.inet.ip.ipsec-timeout=86400 net.inet.ip.ipsec-soft-timeout=80000 net.inet.ip.ipsec-soft-firstuse=3600 net.inet.ip.ipsec-firstuse=7200 net.inet.ip.ipsec-enc-alg=aes net.inet.ip.ipsec-auth-alg=hmac-sha1 net.inet.ip.mtudisc=1 net.inet.ip.mtudisctimeout=600 net.inet.ip.ipsec-comp-alg=deflate net.inet.ip.ifq.len=0 net.inet.ip.ifq.maxlen=2048 net.inet.ip.ifq.drops=0 net.inet.ip.mforwarding=0 net.inet.ip.multipath=0 net.inet.ip.mrtproto=19 net.inet.ip.arpqueued=0 net.inet.ip.arptimeout=1200 net.inet.ip.arpdown=20 net.inet.icmp.maskrepl=0 net.inet.icmp.bmcastecho=0 net.inet.icmp.errppslimit=100 net.inet.icmp.rediraccept=0 net.inet.icmp.redirtimeout=600 net.inet.icmp.tstamprepl=1 net.inet.ipip.allow=0 net.inet.tcp.rfc1323=1 net.inet.tcp.keepinittime=150 net.inet.tcp.keepidle=14400 net.inet.tcp.keepintvl=150 net.inet.tcp.slowhz=2 net.inet.tcp.baddynamic=1,7,9,11,13,15,17,18,19,20,21,22,23,25,37,42,43,49,53,57,67,68,70,77,79,80,87,88,95,101,102,103,104,105,106,107,109,110,111,113,115,117,119,123,129,135,137,138,139,143,152,163,164,177,178,179,191,194,199,201,202,204,206,210,213,220,372,389,427,433,443,444,445,464,465,468,512,513,514,515,521,526,530,531,532,540,543,544,545,548,554,556,587,631,636,646,706,749,750,751,754,760,871,873,888,901,993,995,1080,1109,1127,1433,1434,1524,1525,1529,1723,1900,2049,2105,2106,2108,2110,2111,2112,2120,2121,2401,2600,2601,2602,2603,2604,2605,2606,2607,2608,2627,2983,3031,3109,3260,3306,3389,3517,3689,3690,4190,4444,4500,4559,5002,5060,5222,5269,5280,5298,5353,5354,5432,5680,5900,6000,6001,6002,6003,6004,6005,6006,6007,6008,6009,6010,6514,6566,7000,7001,7002,7003,7004,7005,7006,7007,7008,7009,7326,8025,8026,8140,8953,9418,10050,10051,16992,16993,16994,16995,20005 net.inet.tcp.sack=1 net.inet.tcp.mssdflt=512 net.inet.tcp.rstppslimit=100 net.inet.tcp.ackonpush=0 net.inet.tcp.ecn=0 net.inet.tcp.syncachelimit=10255 net.inet.tcp.synbucketlimit=105 net.inet.tcp.rfc3390=2 net.inet.tcp.reasslimit=32768 net.inet.tcp.sackholelimit=32768 net.inet.tcp.always_keepalive=0 net.inet.tcp.synuselimit=100000 net.inet.tcp.rootonly=2049 net.inet.tcp.synhashsize=293 net.inet.udp.checksum=1 net.inet.udp.baddynamic=7,9,13,18,19,22,37,39,49,53,67,68,69,70,80,88,105,107,109,110,111,123,129,135,137,138,139,143,161,162,163,164,177,178,179,191,194,199,201,202,204,206,210,213,220,372,389,427,444,445,464,468,500,512,513,514,517,518,520,525,533,546,547,548,554,587,623,631,636,646,664,706,749,750,751,993,995,1433,1434,1524,1525,1645,1646,1701,1723,1812,1813,1900,2049,2401,3031,3517,3689,3784,3785,4190,4444,4500,4559,4754,4755,4789,5002,5060,5298,5353,5354,5432,7000,7001,7002,7003,7004,7005,7006,7007,7008,7009,7784,8025,8067,9418,10050,10051,16992,16993,16994,16995,20005,26740 net.inet.udp.recvspace=41600 net.inet.udp.sendspace=9216 net.inet.udp.rootonly=2049 net.inet.gre.allow=0 net.inet.gre.wccp=0 net.inet.esp.enable=1 net.inet.esp.udpencap=1 net.inet.esp.udpencap_port=4500 net.inet.ah.enable=1 net.inet.mobileip.allow=0 net.inet.etherip.allow=0 net.inet.ipcomp.enable=0 net.inet.carp.allow=1 net.inet.carp.preempt=0 net.inet.carp.log=2 net.inet.divert.recvspace=65636 net.inet.divert.sendspace=65636 net.inet6.ip6.forwarding=0 net.inet6.ip6.redirect=1 net.inet6.ip6.hlim=64 net.inet6.ip6.mrtproto=0 net.inet6.ip6.maxfragpackets=200 net.inet6.ip6.log_interval=5 net.inet6.ip6.hdrnestlimit=10 net.inet6.ip6.dad_count=1 net.inet6.ip6.auto_flowlabel=1 net.inet6.ip6.defmcasthlim=1 net.inet6.ip6.use_deprecated=1 net.inet6.ip6.maxfrags=200 net.inet6.ip6.mforwarding=0 net.inet6.ip6.multipath=0 net.inet6.ip6.multicast_mtudisc=0 net.inet6.ip6.neighborgcthresh=2048 net.inet6.ip6.maxdynroutes=4096 net.inet6.ip6.dad_pending=0 net.inet6.ip6.mtudisctimeout=600 net.inet6.ip6.ifq.len=0 net.inet6.ip6.ifq.maxlen=2048 net.inet6.ip6.ifq.drops=0 net.inet6.ip6.soiikey=7fb7b12d4cfa145448ade20fef62b2e2 net.inet6.icmp6.redirtimeout=600 net.inet6.icmp6.nd6_delay=5 net.inet6.icmp6.nd6_umaxtries=3 net.inet6.icmp6.nd6_mmaxtries=3 net.inet6.icmp6.errppslimit=100 net.inet6.icmp6.nd6_maxnudhint=0 net.inet6.icmp6.mtudisc_hiwat=1280 net.inet6.icmp6.mtudisc_lowat=256 net.inet6.icmp6.nd6_debug=0 net.inet6.divert.recvspace=65636 net.inet6.divert.sendspace=65636 net.bpf.bufsize=32768 net.bpf.maxbufsize=2097152 net.mpls.ttl=255 net.mpls.maxloop_inkernel=16 net.mpls.mapttl_ip=1 net.mpls.mapttl_ip6=0 net.pipex.enable=0 net.pipex.inq.len=0 net.pipex.inq.maxlen=256 net.pipex.inq.drops=0 net.pipex.outq.len=0 net.pipex.outq.maxlen=256 net.pipex.outq.drops=0 hw.machine=amd64 hw.model=Intel(R) Xeon(R) CPU @ 2.50GHz hw.ncpu=4 hw.byteorder=1234 hw.pagesize=4096 hw.disknames=sd0:26b0025335694106 hw.diskcount=1 hw.cpuspeed=2500 hw.vendor=Google hw.product=Google Compute Engine hw.serialno=GoogleCloud-48FCB60BD20B13009CC205380F0B84CB hw.uuid=48fcb60b-d20b-1300-9cc2-05380f0b84cb hw.physmem=3848261632 hw.usermem=3848249344 hw.ncpufound=4 hw.allowpowerdown=1 hw.smt=1 hw.ncpuonline=4 machdep.console_device=tty00 machdep.bios.diskinfo.128=bootdev = 0xa0000204, cylinders = 1023, heads = 255, sectors = 63 machdep.bios.cksumlen=1 machdep.allowaperture=0 machdep.cpuvendor=GenuineIntel machdep.cpuid=0x306e4 machdep.cpufeature=0x1f9bfbff machdep.kbdreset=0 machdep.xcrypt=0 machdep.lidaction=1 machdep.forceukbd=0 machdep.tscfreq=2500608250 machdep.invarianttsc=1 ddb.radix=16 ddb.max_width=80 ddb.max_line=24 ddb.tab_stop_width=8 ddb.panic=1 ddb.console=0 ddb.log=1 ddb.trigger=0 vfs.mounts.ffs has 1 mounted instance vfs.ffs.max_softdeps=23704 vfs.ffs.sd_tickdelay=2 vfs.ffs.sd_worklist_push=0 vfs.ffs.sd_blk_limit_push=0 vfs.ffs.sd_ino_limit_push=0 vfs.ffs.sd_blk_limit_hit=0 vfs.ffs.sd_ino_limit_hit=0 vfs.ffs.sd_sync_limit_hit=0 vfs.ffs.sd_indir_blk_ptrs=0 vfs.ffs.sd_inode_bitmap=0 vfs.ffs.sd_direct_blk_ptrs=0 vfs.ffs.sd_dir_entry=0 vfs.ffs.dirhash_dirsize=2560 vfs.ffs.dirhash_maxmem=2097152 vfs.ffs.dirhash_mem=23943 vfs.nfs.iothreads=-1 vfs.fuse.fusefs_open_devices=0 vfs.fuse.fusefs_fbufs_in=0 vfs.fuse.fusefs_fbufs_wait=0 vfs.fuse.fusefs_pool_pages=0

mdempsky commented 5 years ago

@bradfitz Yeah, that seems very plausible to me. It looks like pvclock(4) has been committed upstream: http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pv/pvclock.c

So it might actually be worth trying a -current snapshot to see if Go builds faster.

bradfitz commented 5 years ago

@mdempsky, could you send a CL to x/build/env/openbsd-amd64/make.bash that pulls from snapshots or whatever?

I can then deploy a new image based on it.

juanfra684 commented 5 years ago

I ran the build on -current changing only the timecounter entry:

pvclock0
real       190.79
user       307.62
sys         88.26

acpihpet0
real       215.19
user       305.29
sys        150.30

mdempsky commented 5 years ago

@bradfitz Looks like @dmitshur already added a SNAPSHOT=false variable to make.bash that you just need to hack to SNAPSHOT=true.

mdempsky commented 5 years ago

@juanfra684 Thanks, that's helpful. And I infer that's inside a KVM guest (since I don't think pvclock is usable otherwise)?

It's disappointing though that it only improved real time by ~10%. It seems like there's still more slow downs to identify.

paulzhol commented 5 years ago

Here are my results on a Celeron 2955U (dual-core 1.4Ghz Haswell) running ubnutu 18.04.1 host with an openbsd 6.4 vm w KVM (my kern.timecounter.hardware=acpihpet0):

GENERIC.MP
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Celeron(R) 2955U @ 1.40GHz, 112.48 MHz, 06-45-01
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,VMX,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,XSAVE,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,FSGSBASE,ERMS,INVPCID,IBRS,IBPB,SSBD,ARAT,XSAVEOPT,MELTDOWN                   
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache                                                              
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 999MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Celeron(R) 2955U @ 1.40GHz, 176.06 MHz, 06-45-01
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,VMX,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,XSAVE,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,FSGSBASE,ERMS,INVPCID,IBRS,IBPB,SSBD,ARAT,XSAVEOPT,MELTDOWN                   
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache                                                              
cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: smt 0, core 0, package 1

Duration: 1.10mins, Total samples = 4.73s ( 7.17%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top30
Showing nodes accounting for 3.97s, 83.93% of 4.73s total
Dropped 224 nodes (cum <= 0.02s)
Showing top 30 nodes out of 187
      flat  flat%   sum%        cum   cum%
     0.80s 16.91% 16.91%      0.80s 16.91%  runtime.thrsleep
     0.73s 15.43% 32.35%      0.73s 15.43%  runtime.usleep
     0.53s 11.21% 43.55%      0.53s 11.21%  runtime.nanotime
     0.34s  7.19% 50.74%      0.34s  7.19%  runtime.kevent
     0.31s  6.55% 57.29%      0.31s  6.55%  runtime.memclrNoHeapPointers
     0.26s  5.50% 62.79%      0.29s  6.13%  syscall.Syscall
     0.18s  3.81% 66.60%      0.18s  3.81%  runtime.thrwakeup
     0.15s  3.17% 69.77%      0.18s  3.81%  runtime.scanobject
     0.09s  1.90% 71.67%      0.09s  1.90%  runtime.nextFreeFast
     0.07s  1.48% 73.15%      0.09s  1.90%  runtime.lock
     0.07s  1.48% 74.63%      0.64s 13.53%  runtime.mallocgc
     0.07s  1.48% 76.11%      0.07s  1.48%  runtime.memmove
     0.05s  1.06% 77.17%      0.89s 18.82%  runtime.sysmon
     0.04s  0.85% 78.01%      0.04s  0.85%  indexbytebody
     0.04s  0.85% 78.86%      0.05s  1.06%  runtime.scanblock
     0.03s  0.63% 79.49%      0.03s  0.63%  runtime.(*guintptr).cas
     0.03s  0.63% 80.13%      0.33s  6.98%  runtime.sweepone
     0.02s  0.42% 80.55%      0.08s  1.69%  go/parser.(*parser).next0
     0.02s  0.42% 80.97%      0.03s  0.63%  regexp.(*Regexp).tryBacktrack
     0.02s  0.42% 81.40%      0.09s  1.90%  runtime.exitsyscall
     0.02s  0.42% 81.82%      0.47s  9.94%  runtime.timerproc
     0.02s  0.42% 82.24%      0.03s  0.63%  syscall.Syscall6
     0.01s  0.21% 82.45%      0.05s  1.06%  bufio.(*Scanner).Scan
     0.01s  0.21% 82.66%      0.12s  2.54%  bytes.(*Buffer).ReadFrom
     0.01s  0.21% 82.88%      0.07s  1.48%  cmd/compile/internal/gc.TestShiftGeneric
     0.01s  0.21% 83.09%      0.07s  1.48%  debug/dwarf.(*buf).entry
     0.01s  0.21% 83.30%      0.36s  7.61%  go/parser.(*parser).parseFile
     0.01s  0.21% 83.51%      0.21s  4.44%  go/parser.(*parser).parseLiteralValue
     0.01s  0.21% 83.72%      0.05s  1.06%  go/parser.(*parser).parseParameterList
     0.01s  0.21% 83.93%      0.22s  4.65%  go/parser.(*parser).parseUnaryExpr

Then I've disabled (at least tried to disable Meltdown mitigations) specifically what is called KPTI in linux which makes syscalls more expensive because the process page table needs to be switched with one containing kernel mappings, and then switched back again.

By setting inserting jmp .Lcpu_secure and rebuilding the kernel: https://github.com/openbsd/src/blob/cd9c9a41d861820f0135e271b4bd1522d99ef3b7/sys/arch/amd64/amd64/locore0.S#L212

GENERIC.MP_NO_MELTDOWN
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Celeron(R) 2955U @ 1.40GHz, 51.83 MHz, 06-45-01
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,VMX,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,XSAVE,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,FSGSBASE,ERMS,INVPCID,IBRS,IBPB,SSBD,ARAT,XSAVEOPT
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 1000MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Celeron(R) 2955U @ 1.40GHz, 158.44 MHz, 06-45-01
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,VMX,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,XSAVE,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,FSGSBASE,ERMS,INVPCID,IBRS,IBPB,SSBD,ARAT,XSAVEOPT
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: smt 0, core 0, package 1

Duration: 1.01mins, Total samples = 3.07s ( 5.05%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top30
Showing nodes accounting for 2.41s, 78.50% of 3.07s total
Dropped 163 nodes (cum <= 0.02s)
Showing top 30 nodes out of 244
      flat  flat%   sum%        cum   cum%
     0.43s 14.01% 14.01%      0.43s 14.01%  runtime.thrsleep
     0.27s  8.79% 22.80%      0.27s  8.79%  runtime.kevent
     0.24s  7.82% 30.62%      0.24s  7.82%  runtime.usleep
     0.22s  7.17% 37.79%      0.25s  8.14%  syscall.Syscall
     0.18s  5.86% 43.65%      0.18s  5.86%  runtime.memclrNoHeapPointers
     0.17s  5.54% 49.19%      0.17s  5.54%  runtime.nanotime
     0.11s  3.58% 52.77%      0.53s 17.26%  runtime.mallocgc
     0.09s  2.93% 55.70%      0.10s  3.26%  runtime.scanobject
     0.07s  2.28% 57.98%      0.07s  2.28%  runtime.nextFreeFast
     0.06s  1.95% 59.93%      0.06s  1.95%  runtime.scanblock
     0.05s  1.63% 61.56%      0.06s  1.95%  go/scanner.(*Scanner).next
     0.05s  1.63% 63.19%      0.05s  1.63%  runtime.heapBitsSetType
     0.05s  1.63% 64.82%      0.07s  2.28%  runtime.lock
     0.05s  1.63% 66.45%      0.06s  1.95%  strings.Index
     0.04s  1.30% 67.75%      0.04s  1.30%  runtime.thrwakeup
     0.03s  0.98% 68.73%      0.19s  6.19%  cmd/compile/internal/gc.TestShiftGeneric
     0.03s  0.98% 69.71%      0.31s 10.10%  go/parser.(*parser).parseBinaryExpr
     0.03s  0.98% 70.68%      0.03s  0.98%  runtime.memmove
     0.02s  0.65% 71.34%      0.02s  0.65%  compress/flate.(*decompressor).huffSym
     0.02s  0.65% 71.99%      0.04s  1.30%  compress/flate.(*decompressor).huffmanBlock
     0.02s  0.65% 72.64%      0.16s  5.21%  go/parser.(*parser).next0
     0.02s  0.65% 73.29%      0.06s  1.95%  go/scanner.(*Scanner).scanNumber
     0.02s  0.65% 73.94%      0.02s  0.65%  memeqbody
     0.02s  0.65% 74.59%      0.12s  3.91%  reflect.Value.call
     0.02s  0.65% 75.24%      0.13s  4.23%  regexp.(*Regexp).backtrack
     0.02s  0.65% 75.90%      0.03s  0.98%  regexp.(*Regexp).tryBacktrack
     0.02s  0.65% 76.55%      0.02s  0.65%  runtime.(*gcBitsArena).tryAlloc
     0.02s  0.65% 77.20%      0.03s  0.98%  runtime.acquireSudog
     0.02s  0.65% 77.85%      0.02s  0.65%  runtime.casgstatus
     0.02s  0.65% 78.50%      0.03s  0.98%  runtime.gcWriteBarrier

Regarding kern.timecounter.hardware, I think it's safe to force the tsc one before rebuilding the image because the hardware is recent enought for the tsc to be invariant (I did this in the FreeBSD images as well).

update: My first "patch" to disable meltdown wasn't doing anything significant, changed to directly jump to the .Lcpu_secure branch and updated the results. Test ran 10s faster, nanotime call is also lower.

juanfra684 commented 5 years ago

@juanfra684 Thanks, that's helpful. And I infer that's inside a KVM guest (since I don't think pvclock is usable otherwise)?

Yes, qemu+kvm with an AMD SVM CPU.

mdempsky commented 5 years ago

@paulzhol Thanks for the test report. I'm not sure I understand what conclusion to draw from your results though.

Are you suggesting OpenBSD's meltdown mitigations might be the cause of the OpenBSD slowdown on GCE? Is there an easy way to inspect the value of cpu_meltdown? E.g., would it appear in dmesg and/or sysctl output?

paulzhol commented 5 years ago

@mdempsky sorry, my initial results were wrong, I wasn't disabling Meltdown mitigation completely (or rather was still using the split kernel-userspace page tables). I've updated with the proper patch and actual results.

in dmesg it is shown in the cpu flags: cpu0: FPU,VME ... ARAT,XSAVEOPT,MELTDOWN

You can see from my attached log, the GENERIC.MP_NO_MELTDOWN kernel doesn't report them, while the GENERIC.MP one (and @bradfitz's) does.

paulzhol commented 5 years ago

@mdempsky KPTI effects all syscall heavy workloads,not OpenBSD specifically. My assumption is that the OpenBSD developrs are more strict and maybe have enabled additional measures (L1TF mitigations on the host will effect VMs significantly, and I'm not sure if there's something that can be turned off in the guest).

It would be interesting to see if you can get a more significant speedup on GCE with tsc as timecounter and Meltdown disabled.

gopherbot commented 5 years ago

Change https://golang.org/cl/160317 mentions this issue: cmd/debugnewvm: add --just-make flag, more timing info in log

gopherbot commented 5 years ago

Change https://golang.org/cl/160318 mentions this issue: dashboard: add openbsd-amd64-current

bradfitz commented 5 years ago

Hooray, pvclock seemed to do the trick! (or something in -current)

Before: openbsd-amd64-64: make.bash in 5m9s

After: openbsd-amd64-current: make.bash in 2m23s (I see "pvclock0 at pvbus0" during boot).

bradfitz commented 5 years ago

Two takeaways here:

Go really likes to check the time
Getting the time was really slow on a VM in OpenBSD 6.4 on GCE (under KVM)

bradfitz commented 5 years ago

Hey @reyk, I thought you'd like to see this. Thanks for the pvclock driver--- it halves the time needed to build Go on GCE!

paulzhol commented 5 years ago

@bradfitz could you try adding kern.timecounter.hardware=tsc here for the regular 6.4 image?

golang / go

x/build: openbsd-amd64-64 trybots are too slow #29223