golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

x/build: add LUCI netbsd-arm builder #63698

Open bsiegert opened 1 year ago

bsiegert commented 1 year ago

Hostname: netbsd-arm-bsiegert

netbsd-arm-bsiegert.csr.txt

/cc @golang/release

bsiegert commented 1 year ago

Any updates? It's been a month.

dmitshur commented 1 year ago

Thanks for pinging; it looks like we missed this new-builder issue, sorry.

We'll pick this up at the start of next week since this week is short due to US holidays.

gopherbot commented 12 months ago

Change https://go.dev/cl/545536 mentions this issue: main.star: add netbsd-arm, netbsd-arm64, openbsd-riscv64 builders

dmitshur commented 11 months ago

Here's the resulting certificate: netbsd-arm-bsiegert-1701367164.cert.txt.

The builder definitions have been added in CL 545536 so your bot should be able to connect once you follow the rest of the steps on your end.

We have some more work to do to make the dependencies built for the netbsd/arm port and available in CIPD, which will be needed for the builds to complete successfully. We'll update this issue once that's done.

dmitshur commented 11 months ago

more work to do to make the dependencies built for the netbsd/arm port and available in CIPD

I mailed crrev.com/c/5086069 for this.

dmitshur commented 11 months ago

That CL is submitted and the dependencies are built.

If you give it a shot to connect with the builder, we can see what the next steps are for this.

bsiegert commented 11 months ago

Thanks Dmitri! The bot is now up and running at https://chromium-swarm.appspot.com/bot?id=netbsd-arm-bsiegert. It shows up as

bsiegert commented 11 months ago

Sorry, hit Submit too fast.

It shows up as

cipd_platform=netbsd-armv6l
cpu=evbarm | evbarm-32

dmitshur commented 10 months ago

I'm not seeing successful builds in https://ci.chromium.org/ui/p/golang/builders/ci/gotip-netbsd-arm?limit=200, and the builder isn't showing up as connected now. Can you take a look at what its current status is? I'll reopen this issue so we can track what's still left to do here.

gopherbot commented 10 months ago

Change https://go.dev/cl/558517 mentions this issue: main.star: fix cipd_platform value for GOHOSTARCH=arm

dmitshur commented 10 months ago

CL 558517 fixed the cipd_platform value in the builder definition, and triggered some work, e.g., https://chromium-swarm.appspot.com/task?id=675f6b5f66994610. It's failing with an internal failure:

swarming_bot_logs: 2024-01-26 16:28:59.383: Starting run_isolated script
swarming_bot_logs: 2024-01-26 16:28:59.537: Trimming caches. min_ts: 1704472139, free_disk: 35253248000, min_free_space: 62578626346
swarming_bot_logs: 2024-01-26 16:28:59.542: trimming cache with dir /home/swarming/.swarming/cas_cache
swarming_bot_logs: 2024-01-26 16:28:59.546: trimming cache with dir /home/swarming/.swarming/c
swarming_bot_logs: 2024-01-26 16:28:59.549: trim_caches: took 0 seconds
swarming_bot_logs: 2024-01-26 16:29:04.267: Installed CIPD client
10397 2024-01-26 16:29:05.190 E: internal failure: Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "/home/swarming/.swarming/swarming_bot.2.zip/client/run_isolated.py", line 858, in map_and_run
    with data.install_packages_fn(run_dir, cas_client_dir) as cipd_info:
  File "/usr/pkg/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/home/swarming/.swarming/swarming_bot.2.zip/client/run_isolated.py", line 1199, in install_client_and_packages
    package_pins = _install_packages(run_dir, cipd_cache_dir, client,
  File "/home/swarming/.swarming/swarming_bot.2.zip/client/run_isolated.py", line 1124, in _install_packages
    pins = client.ensure(
  File "/home/swarming/.swarming/swarming_bot.2.zip/client/cipd.py", line 245, in ensure
    result_json = json.load(jfile)
  File "/usr/pkg/lib/python3.10/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/pkg/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/pkg/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/pkg/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Expecting value: line 1 column 1 (char 0)

We need to figure out what's causing that and fix it to make progress here.

Edit: I think https://source.chromium.org/chromium/infra/infra/+/main:luci/client/cipd.py;l=245 is the relevant line. The check for exit_code happens slightly later, on line 260, so it's possible that something went wrong during the invocation of cipd ensure and we just don't see what it was in the log above. If you can find a way to reproduce it locally and share its output, that'd be helpful.
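To illustrate the ordering problem described above, here is a minimal sketch, not the actual cipd.py code. It assumes a hypothetical helper, run_and_load_json, that checks the exit code before parsing the JSON output file, so the tool's own error message is surfaced instead of a JSONDecodeError. The child command in demo only simulates a failing tool; it is not cipd.

```python
import json
import os
import subprocess
import sys
import tempfile


def run_and_load_json(cmd, json_path):
    """Run cmd, then parse json_path only if the command succeeded.

    Hypothetical helper for illustration; not part of luci/client/cipd.py.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # Report what the tool printed rather than failing later on empty JSON.
        raise RuntimeError(
            f"command failed with exit code {proc.returncode}: "
            f"{proc.stderr.strip()}"
        )
    with open(json_path) as f:
        return json.load(f)


def demo():
    """Simulate a tool that exits non-zero and leaves an empty output file."""
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "result.json")
        open(out, "w").close()  # empty file, like an unwritten -json-output
        child = "import sys; print('Recompile using GOARM=7.', file=sys.stderr); sys.exit(2)"
        try:
            run_and_load_json([sys.executable, "-c", child], out)
        except RuntimeError as err:
            return str(err)
    return None


if __name__ == "__main__":
    print(demo())
```

With this ordering, the failure message carries the tool's stderr, which is the information missing from the log above.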

dmitshur commented 9 months ago

In addition to trying to reproduce this by running cipd ensure manually on the builder, you might be able to check if there's more information in /home/swarming/.swarming/logs/task_runner.log (or run_isolated.log).

bsiegert commented 9 months ago

Thanks for the pointers, I will take a look and report back.

bsiegert commented 9 months ago

This from run_isolated.log looks related:

22683 2024-01-26 16:03:45.157 U: Installed CIPD client
22683 2024-01-26 16:03:45.159 I: Installing packages {'': [('infra/tools/luci/bbagent/${platform}', 'git_revision:1f801c4894a7ced859ae672642feeeb8960da330')]} into /home/swarming/.swarming/w/ir
22683 2024-01-26 16:03:45.283 D: Running ['/home/swarming/.swarming/cipd_cache/bin/cipd', 'ensure', '-root', '/home/swarming/.swarming/w/ir', '-ensure-file', '/tmp/cipd-ensure-file-y5cmbnt9.txt', '-verbose', '-json-output', '/tmp/cipd-ensure-result-0pjif55u.json', '-cache-dir', '/home/swarming/.swarming/cipd_cache/cache', '-service-url', 'https://chrome-infra-packages.appspot.com/']
22683 2024-01-26 16:03:45.769 D: cipd client: runtime: this system has multiple CPUs and must use
22683 2024-01-26 16:03:45.806 D: cipd client: atomic synchronization instructions. Recompile using GOARM=7.
22683 2024-01-26 16:03:45.906 E: internal failure: Expecting value: line 1 column 1 (char 0)

So the cipd binary needs to be recompiled with GOARM=7 set.

dmitshur commented 9 months ago

Downloading the Go binary from here and running go version -m on it prints:

    build   CGO_ENABLED=0
    build   GOARCH=arm
    build   GOOS=netbsd
    build   GOARM=6

So it is built with GOARM=6 now (even though the cross-compilation default for GOARM is 7 as of Go 1.21).
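For anyone who wants to script this check, a rough sketch follows. It parses the "build KEY=VALUE" lines from go version -m output into a dictionary. The function name parse_build_settings is made up for illustration, and the sample text is just the four lines quoted above, not the full output.

```python
def parse_build_settings(text):
    """Collect 'build KEY=VALUE' lines from `go version -m` output.

    Hypothetical helper; lines that do not start with 'build' are ignored.
    """
    settings = {}
    for line in text.splitlines():
        fields = line.strip().split(None, 1)
        if len(fields) == 2 and fields[0] == "build" and "=" in fields[1]:
            key, value = fields[1].split("=", 1)
            settings[key] = value
    return settings


# The build settings quoted in this comment.
SAMPLE = """\
    build   CGO_ENABLED=0
    build   GOARCH=arm
    build   GOOS=netbsd
    build   GOARM=6
"""

settings = parse_build_settings(SAMPLE)

if __name__ == "__main__":
    print(settings.get("GOARM"))
```

To use it on a real binary, the output of go version -m could be captured with subprocess and passed in as text.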

Searching finds entries like this, this, and this that all suggest making a change isn't quite straightforward, because the "v6l" suffix of the "netbsd-armv6l" CIPD platform dimension corresponds to GOARM=6.

Maybe it's possible to make it work with GOARM=6 anyway, ~through changes to the atomic operations, if it turns out there's not much to do? For example, I recently did crrev.com/c/5268803 which was enough to resolve the problem for linux/arm (with GOARM=6). But if it's much more invasive, that path might be harder~. Edit: I see this is coming from the Go runtime (i.e., here), and it seems the only way to work around it would be to not have multiple CPUs.

If the builder for this port cannot work with GOARM=6 binaries and really needs GOARM=7, we can see how involved that might be.

bsiegert commented 9 months ago

I wonder why the Chromium infra thinks the architecture is "armv6l". That's a different sub-architecture. On this machine, uname -p prints earmv7hf, not the older earmv6hf. We do not have an ARMv6 builder at the moment; these are kind of old and crufty in general.

FWIW, http://wiki.netbsd.org/ports/evbarm/ shows the different sub-architectures.

This comment says:

  For example, on ARMv7 machines we claim that we are in fact running ARMv6
  (which is subset of ARMv7), since we don't really care about v7 over v6
  difference and want to reduce the variability in supported architectures
  instead.

Which is clearly a wrong assumption.
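As a sketch of the distinction being made here, the snippet below derives the ARM version from a NetBSD uname -p string and builds a platform name from it, instead of collapsing ARMv7 into ARMv6. The function names and the "netbsd-armvNl" naming are assumptions for illustration only; this is not how the Chromium infra code actually computes cipd_platform.

```python
import re

# Matches little-endian NetBSD evbarm names such as earmv6hf or earmv7hf.
_EARM_RE = re.compile(r"earmv(\d)(hf)?")


def arm_version(uname_p):
    """Return the ARM architecture version from `uname -p`, or None."""
    m = _EARM_RE.fullmatch(uname_p)
    return int(m.group(1)) if m else None


def cipd_platform_for(uname_p):
    """Hypothetical mapping that keeps v6 and v7 as distinct platforms."""
    version = arm_version(uname_p)
    if version is None:
        return None
    return f"netbsd-armv{version}l"


if __name__ == "__main__":
    for name in ("earmv7hf", "earmv6hf"):
        print(name, cipd_platform_for(name))
```

Names outside this pattern, such as big-endian variants, are deliberately left unmapped and return None.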

gopherbot commented 3 weeks ago

Change https://go.dev/cl/623015 mentions this issue: main.star: use v7l suffix for netbsd-arm builder

dmitshur commented 3 weeks ago

https://ci.chromium.org/b/8732718088961275521 was the first successful build - congrats on reaching this milestone!

The build took over 2 hrs to complete. Given netbsd-arm64 takes around 25 min and still has its SLOW_HOSTS factor set to 2 (main.star#L343), netbsd-arm likely needs to also set something higher than 1. Perhaps 5 to mirror openbsd-arm, as a starting point? (CL 624075 does this.)
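A rough sketch of the idea, in Python rather than the Starlark used by main.star: it assumes the factor acts as a multiplier on a base timeout, which may not match exactly how main.star applies it. The netbsd-arm64 and openbsd-arm values are the ones mentioned above, the netbsd-arm value is the proposed one, and the base timeout and function name are hypothetical.

```python
# Factors taken from this comment; netbsd-arm is the proposed starting point.
SLOW_HOSTS = {
    "netbsd-arm64": 2,
    "openbsd-arm": 5,
    "netbsd-arm": 5,
}


def scaled_timeout(base_minutes, host):
    """Scale a base timeout by the host's factor (1 if the host is not listed).

    Hypothetical helper; assumes the factor is a plain multiplier.
    """
    return base_minutes * SLOW_HOSTS.get(host, 1)


if __name__ == "__main__":
    for host in ("linux-amd64", "netbsd-arm64", "netbsd-arm"):
        print(host, scaled_timeout(60, host))
```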

gopherbot commented 3 weeks ago

Change https://go.dev/cl/624075 mentions this issue: main.star: add netbsd-arm to SLOW_HOSTS

bsiegert commented 3 weeks ago

The builder is chugging along fine.

One issue though: on build.golang.org, it looks like the LUCI builds are not shown at all.

gopherbot commented 2 weeks ago

Change https://go.dev/cl/624875 mentions this issue: dashboard: set netbsd-arm{,64} builders to 0 expected

gopherbot commented 2 weeks ago

Change https://go.dev/cl/624995 mentions this issue: luci-config: remove known issue for netbsd-arm