golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

x/build: find new cloud provider for Solaris, Illumos builders? #15581

Open bradfitz opened 8 years ago

bradfitz commented 8 years ago

Now that we have SmartOS builders on Joyent in a custom image using the buildlet, let's use the Joyent API and dynamically create the containers as needed. This should be done by creating a new pool type in x/build/cmd/coordinator, similar to the GCE, reverse, and Kubernetes pool types.

This will both be cheaper (run zero when we need zero), but also let us scale from 0 to dozens as needed and let us do sharded builds and let SmartOS be a trybot. (currently we just run 2 containers all the time)
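The difference between the static setup and the proposed pool type can be sketched roughly like this. It is a minimal, hypothetical illustration: the `Provider` and `dynamicPool` names, their methods, and the `maxInstances` cap are assumptions for this sketch, not the real `BuildletPool` interface in x/build/cmd/coordinator. The pool creates an instance only when a build asks for one and destroys it when the build finishes, so it idles at zero and grows with demand up to a cap.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Provider is a hypothetical cloud API wrapper (create/delete instance).
type Provider interface {
	Create(hostType string) (id string, err error)
	Destroy(id string) error
}

// dynamicPool creates one instance per build request and destroys it
// afterwards, so nothing runs while there is no work.
type dynamicPool struct {
	prov         Provider
	maxInstances int // hypothetical quota cap

	mu sync.Mutex
	n  int // instances currently reserved or running
}

var errAtCapacity = errors.New("pool at capacity")

func newDynamicPool(p Provider, max int) *dynamicPool {
	return &dynamicPool{prov: p, maxInstances: max}
}

// Get reserves a slot, then creates a fresh instance for hostType.
func (p *dynamicPool) Get(hostType string) (string, error) {
	p.mu.Lock()
	if p.n >= p.maxInstances {
		p.mu.Unlock()
		return "", errAtCapacity
	}
	p.n++ // reserve before the (possibly slow) create call
	p.mu.Unlock()

	id, err := p.prov.Create(hostType)
	if err != nil {
		p.mu.Lock()
		p.n-- // release the reservation on failure
		p.mu.Unlock()
		return "", err
	}
	return id, nil
}

// Put destroys the instance once its build is done. The slot is released
// even if Destroy fails; a real pool would need to retry or track leaks.
func (p *dynamicPool) Put(id string) error {
	err := p.prov.Destroy(id)
	p.mu.Lock()
	p.n--
	p.mu.Unlock()
	return err
}

// Running reports how many instances are currently reserved or running.
func (p *dynamicPool) Running() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.n
}

// fakeProvider stands in for a real cloud API in this sketch.
type fakeProvider struct {
	mu   sync.Mutex
	next int
}

func (f *fakeProvider) Create(hostType string) (string, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.next++
	return fmt.Sprintf("%s-%d", hostType, f.next), nil
}

func (f *fakeProvider) Destroy(id string) error { return nil }

func main() {
	pool := newDynamicPool(&fakeProvider{}, 2)
	fmt.Println("running:", pool.Running())
	a, _ := pool.Get("host-illumos-amd64")
	b, _ := pool.Get("host-illumos-amd64")
	_, err := pool.Get("host-illumos-amd64")
	fmt.Println(a, b, err)
	pool.Put(a)
	pool.Put(b)
	fmt.Println("running:", pool.Running())
}
```

Reserving the slot before calling `Create` keeps concurrent requests from overshooting the cap while a slow cloud API call is in flight.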

I see lots of joyent stuff at https://godoc.org/?q=joyent

/cc @davecheney @4ad

bradfitz commented 8 years ago

Old bug: #9515

bradfitz commented 7 years ago

I was just about to file this dup bug, forgetting I'd already filed it, so I'll copy the text I was about to post:


Currently the Joyent builders (GOOS=solaris, but really GOOS=illumos once #20603 happens) are statically created and the instances sit there idle most of the time, long polling the build coordinator for work. And when there is a burst of work, we can't process a burst, because we only have N instances.

That is, they use the buildlet's "reverse" mode, where the buildlets connect to farmer.golang.org and register themselves, rather than being dynamically created.
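In reverse mode, capacity is simply whatever set of buildlets has dialed in. The hypothetical sketch below (the `reversePool` type and its methods are illustrative, not the coordinator's code) shows why a burst cannot be absorbed: once every registered buildlet is busy, further requests have to wait or fail, because the coordinator has no way to create more.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// reversePool holds buildlets that connected to the coordinator on their
// own; the coordinator cannot create additional ones.
type reversePool struct {
	mu   sync.Mutex
	idle []string        // registered buildlets with no current work
	busy map[string]bool // buildlets currently running a build
}

var errNoneIdle = errors.New("no idle reverse buildlet")

func newReversePool() *reversePool {
	return &reversePool{busy: make(map[string]bool)}
}

// Register is called when a buildlet dials in and registers itself.
func (p *reversePool) Register(name string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.idle = append(p.idle, name)
}

// Get hands out an idle buildlet, or fails if all N are busy.
func (p *reversePool) Get() (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.idle) == 0 {
		return "", errNoneIdle
	}
	name := p.idle[len(p.idle)-1]
	p.idle = p.idle[:len(p.idle)-1]
	p.busy[name] = true
	return name, nil
}

// Put returns a buildlet to the idle list after its build finishes.
func (p *reversePool) Put(name string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.busy[name] {
		delete(p.busy, name)
		p.idle = append(p.idle, name)
	}
}

func main() {
	p := newReversePool()
	p.Register("joyent-1") // hypothetical instance names
	p.Register("joyent-2")
	for i := 0; i < 3; i++ { // a burst of three builds against N=2
		name, err := p.Get()
		fmt.Println(name, err)
	}
}
```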

We currently have three implementations of the coordinator's BuildletPool interface: GCE, reverse, and Kubernetes.

It's kinda a waste that we're paying for N static Joyent instances just to run in reverse mode, since Joyent can already quickly spin up containers.

We should implement a JoyentBuildletPool implementation of BuildletPool and implement the Joyent API.

Of course, if we could run illumos or OmniOS on GCE that would be more ideal from a less-code-to-write angle, but I don't think they run there yet.

I do see references to EC2 AMIs for illumos and OmniOS, so maybe writing an EC2BuildletPool implementation of the BuildletPool interface is a better use of our time and could be used for other OSes that don't run on GCE's KVM.

In any case, the static reverse builder situation is not ideal.

/cc @adams-sarah @cybrcodr

4ad commented 7 years ago

I do see references to EC2 AMIs for illumos and OmniOS

The future of OmniOS is uncertain: https://lists.omniti.com/pipermail/omnios-discuss/2017-April/008699.html

bradfitz commented 5 years ago

I just ran OmniOS-CE (the community edition) at home (omniosce-r151026u.iso, 7th of May, 2018) on KVM/QEMU and it works fine and passes all.bash.

It supports running under virtio-net but not virtio-scsi (that driver exists somewhere for illumos, but it's not merged? Or not in omnios-ce?). It does, however, support virtio-blk. But GCE doesn't support virtio-blk.

So we can't run OmniOS directly on GCE.
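The conclusion above comes down to an intersection of disk interfaces: the guest needs a driver for at least one disk interface the hypervisor offers. The sketch below encodes only what this comment states (OmniOS CE has virtio-blk but no virtio-scsi; GCE has no virtio-blk) and assumes, as the thread implies, that GCE presents disks via virtio-scsi. The device lists are illustrative, not complete.

```go
package main

import "fmt"

// commonDisk returns the disk interfaces supported by both the guest OS
// and the hypervisor, in the hypervisor's order.
func commonDisk(guest, host []string) []string {
	have := make(map[string]bool, len(guest))
	for _, d := range guest {
		have[d] = true
	}
	var out []string
	for _, d := range host {
		if have[d] {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	omniosDisks := []string{"virtio-blk"} // per the comment: no virtio-scsi driver
	gceDisks := []string{"virtio-scsi"}   // assumption: GCE's disk interface
	if c := commonDisk(omniosDisks, gceDisks); len(c) == 0 {
		fmt.Println("no common disk interface: cannot boot directly on GCE")
	}
}
```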

But because GCE now supports nested virtualization, we could do something slightly gross or lovely:

I think that's our best bet for scalable Solaris trybots at this point. It's slightly tedious, but it stays within the GCP ecosystem we're already mostly using, where we have tons of quota, and the network is super fast, never leaving the building.

/cc @dmitshur

gopherbot commented 5 years ago

Change https://golang.org/cl/162959 mentions this issue: dashboard, buildlet: add a disabled builder with nested virt, for testing

gopherbot commented 5 years ago

Change https://golang.org/cl/163057 mentions this issue: buildlet: change image name for COS-with-vmx buildlet

gopherbot commented 5 years ago

Change https://golang.org/cl/163301 mentions this issue: env/linux-x86-vmx: add new Debian host that's like Container-Optimized OS + vmx

andybons commented 5 years ago

Joyent Public Cloud is closing down November 9, 2019.

gopherbot commented 4 years ago

Change https://golang.org/cl/200219 mentions this issue: dashboard, cmd/coordinator: remove Joyent builders

bcmills commented 4 years ago

We appear to no longer have any Illumos builders. Should we file an issue to remove/deprecate the port in 1.14?

bradfitz commented 4 years ago

@jclulow, does it run on GCE yet? (virtio-scsi was WIP last I heard?)

jclulow commented 4 years ago

The Virtio SCSI support is still a WIP, but I'm circling back around to look at it. The other critical issue we had with GCE was this bug in the GCE hypervisor itself -- but I received notification that it's been fixed in the last week, so I'm going to try it out!

In the interim, I've seen people asking for some kind of key for a builder on the mailing list. If I can provide a zone similar to the one that was provided by Joyent, is that something I can get configured as a stop gap for this week?

bradfitz commented 4 years ago

@jclulow, it's a key but also configuration on our side. See the CL I recently sent to golang/build removing illumos, and send one that adds it back, modified. Then I'll send you a key.

jclulow commented 4 years ago


Do you mean this one?

https://github.com/golang/build/commit/b61ecd0449282303da5f93eb29a6638ffb8a20e1

https://go-review.googlesource.com/c/build/+/200219

I'll have a look!

bradfitz commented 4 years ago

Yup.

gopherbot commented 4 years ago

Change https://golang.org/cl/201597 mentions this issue: dashboard: add interim illumos builder

jclulow commented 4 years ago

On a Linux machine, I ran:

GOOS=illumos GOARCH=amd64 BOOTSTRAP_FORMAT=mintgz ./bootstrap.bash

I've made this available inside the zone:

[root@gobuild1 ~]# /opt/go/bootstrap/bin/go version
go version devel +dad616375f Wed Oct 16 18:27:16 2019 +0000 illumos/amd64

I also built a stage0 binary from cmd/buildlet/stage0 in the build repo, and I've run that under SMF in the zone with this environment:

"HOME": "/home/gobuild",
"GOROOT_BOOTSTRAP": "/opt/go/bootstrap",
"USER": "gobuild",
"LOGNAME": "gobuild",
"PATH": "/usr/bin:/usr/sbin:/sbin:/opt/local/bin:/opt/local/sbin:/opt/go/bootstrap/bin",
"TMPDIR": "/var/tmp",
"LANG": "en_US.UTF-8",

I was able to use curl to get the buildlet to unpack a tar of the Go source and build it in the work directory. Once I add GO_BUILDER_ENV=host-illumos-amd64-jclulow to the environment, the buildlet then wants the key:

stage0: 2019/10/17 00:53:34 bootstrap binary running
stage0: 2019/10/17 00:53:34 waiting for network.
stage0: 2019/10/17 00:53:34 network up after 300ms
stage0: 2019/10/17 00:53:34 downloading https://storage.googleapis.com/go-builder-data/buildlet.illumos-amd64 to ./buildlet.exe ...
stage0: 2019/10/17 00:53:34 downloaded ./buildlet.exe (14194957 bytes)
stage0: 2019/10/17 00:53:34 downloaded buildlet in 100ms
2019/10/17 00:53:34 buildlet starting.
2019/10/17 00:53:34 failed to find key for host-illumos-amd64-jclulow: cannot read key file "/home/gobuild/.gobuildkey-host-illumos-amd64-jclulow": open /home/gobuild/.gobuildkey-host-illumos-amd64-jclulow: no such file or directory
stage0: 2019/10/17 00:53:34 Error running buildlet: exit status 1
...

So I think this is all good to go, with the addition to the dashboard in the CL? I didn't put in a health check entry because it seems like that's just for infrastructure that's currently managed by the Go team.

Please let me know what to do next!

gopherbot commented 4 years ago

Change https://golang.org/cl/201740 mentions this issue: doc/go1.14.html: add some TODOs about various ports

nwilkens commented 4 years ago

I'd be happy to sponsor build hosts at https://mnx.io to offset the JPC EOL issues. Feel free to discuss needs here, or directly via email nick @ mnx io.

rorth commented 4 years ago

I just noticed the Solaris comment here, which is mostly wrong: while Shawn Walker has left Oracle, the builder never ran at an Oracle site, but on a system maintained by me.

Oracle Solaris certainly still is maintained (e.g. I'm running current betas), and I've sort of taken over maintaining the builder.

My primary interest is to get early warning when upstream golang changes break the Solaris support, and as the GCC Solaris maintainer I also have an interest in keeping gccgo working. Ian has access to several Solaris systems at our site to investigate issues if necessary.

bradfitz commented 4 years ago

@rorth, great! I'll update our notes.

Can you provide any more info about the machine/VM specs and its OS version?

gopherbot commented 4 years ago

Change https://golang.org/cl/205600 mentions this issue: dashboard: update Solaris owner

rorth commented 4 years ago

Sure: right now, the builder is running in a Solaris kernel zone (effectively a VM) hosted on a Sun Fire X4440. It's running Solaris 11.4 (usually updated to the latest monthly patch collection/SRU), with the intent of always upgrading to the latest Solaris release (i.e. no betas). The kernel zone has been assigned 24 cores of AMD Opteron 8435 CPUs, 32 GB RAM, and 128 GB disk space.

dmitshur commented 4 years ago

@rorth One of the outstanding TODOs in the Go 1.14 release notes is:

TODO: announce something about the Go Solaris port? Solaris itself is unmaintained? The builder is still running at Oracle, but the employee who set it up left the company and we have no way to maintain it.

From https://build.golang.org/, I see the solaris-amd64-oraclerel builder is passing for main Go repository (on tip, release-branch.go1.13 and release-branch.go1.12) and golang.org/x repos (also on tip, release-branch.go1.13 and release-branch.go1.12).

Based on your https://github.com/golang/go/issues/15581#issuecomment-550368581 above, it sounds to me that we can resolve that TODO by not saying anything about Solaris in the Go 1.14 release notes. Does that sound right to you, or do you think we should say something about Solaris itself not being maintained (I'm not very familiar with its state)? Would you mind sending a CL to doc/go1.14.html to address that TODO? Thank you.

Edit: I've sent CL 217738.

/cc @toothrot @cagedmantis @golang/osp-team

dmitshur commented 4 years ago

@jclulow Thanks for adding the interim illumos-amd64 builder in CL 201597. An outstanding TODO in the Go 1.14 release notes is:

TODO: is Illumos up with a builder and passing? https://golang.org/issue/15581.

I've checked, and it is up and passing on Go tip and release-branch.go1.13 (with one failure that appears to be flaky due to being out of memory). It's also passing on all golang.org/x repos on tip and release-branch.go1.13.

Would you like to send a CL to update the release notes, resolving that TODO?

Edit: I've sent CL 217737.

/cc @cagedmantis @toothrot @golang/osp-team

gopherbot commented 4 years ago

Change https://golang.org/cl/217737 mentions this issue: doc/go1.14: remove TODO about Illumos port

gopherbot commented 4 years ago

Change https://golang.org/cl/217738 mentions this issue: doc/go1.14: remove TODO about Solaris port

dmitshur commented 4 years ago

@rorth I've sent CL 217738 that implements what I described in https://github.com/golang/go/issues/15581#issuecomment-581534916. Please take a look if you can. Also, is there an email we can use to reach you (either directly, or for Gerrit code reviews)? Thank you.

dmitshur commented 4 years ago

@jclulow I've sent CL 217737 for https://github.com/golang/go/issues/15581#issuecomment-581690659 and added you as a reviewer.

rorth commented 4 years ago

Seems right to me: Go is in decent shape thanks mostly to @iant's work and my testing (both builder and gcc builds) and there's nothing special users need to know.

As I'd mentioned, Solaris is both supported with regular updates (SRUs) coming every month and developed (I'm running the biweekly builds of Solaris 11.5 Beta on several of my systems including my desktops), so there's nothing to say here.

Rainer

--

Rainer Orth, Center for Biotechnology, Bielefeld University

rorth commented 4 years ago

LGTM. You can reach me at either ro@gcc.gnu.org or ro@CeBiTec.Uni-Bielefeld.DE.

Rainer

nshalman commented 4 years ago

Is this still unresolved? I'm a happy customer of mnx.io so if their offer still stands and hasn't been taken up, it should be. I'm happy to provide assistance as well.

rorth commented 4 years ago

I can only speak for the Solaris side, of course, but that builder is up and running at my site; no need for a new provider to host it AFAICS.

Toasterson commented 4 years ago

The Illumos builder has been running for quite a while thanks to @jclulow.

We still wanted GCE support for better runners, but IIRC that's a question of writing the missing driver (virtio-scsi). The idea was to keep this issue open until that is done, but people are happy with the current situation, so it may be closed.