golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

build: add GOOS=illumos #20603

Closed sean- closed 5 years ago

sean- commented 7 years ago

Disambiguating solaris vs illumos

SunOS/Solaris has a storied and complicated history. In [2010], Illumos forked from OpenSolaris and has continued its life in the open; it has now spent the majority of its life as Illumos (not Solaris).

The solaris build tag is "mostly" compatible with Illumos and Illumos-based distributions (e.g. SmartOS, Nexenta, OpenIndiana, Delphix, etc.); however, Illumos has diverged significantly from Solaris. To detect and support Illumos-native functionality, I propose that:

  1. illumos be added as a new GOOS build tag;
  2. the illumos build tag be distinct from solaris.

We considered extending the life of the solaris build target to include the illumos target for the period of one release but decided against this because it would taint community code with:

// +build !solaris
// +build illumos

that would need to be cleaned up at the end of the transition period. Backwards compatibility for the sake of backwards compatibility isn't something we're interested in maintaining.
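To make the semantics of that transitional header concrete: separate `// +build` lines are ANDed together, so such a file compiles only when the illumos tag is set and the solaris tag is not. The sketch below evaluates the two quoted lines against a few hand-written tag sets. It assumes Go 1.16+ for the standard `go/build/constraint` package; `buildsWith` is a hypothetical helper, and the tag sets are illustrative rather than produced by the toolchain.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// buildsWith reports whether a file carrying all of the given
// "// +build" lines would be compiled when exactly the tags in set
// are satisfied. Separate +build lines are ANDed together.
func buildsWith(lines []string, set map[string]bool) (bool, error) {
	for _, line := range lines {
		expr, err := constraint.Parse(line)
		if err != nil {
			return false, err
		}
		if !expr.Eval(func(tag string) bool { return set[tag] }) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// The transitional header quoted above.
	header := []string{"// +build !solaris", "// +build illumos"}

	for _, tags := range []map[string]bool{
		{"solaris": true},                  // Oracle Solaris
		{"illumos": true},                  // illumos with a distinct tag
		{"solaris": true, "illumos": true}, // illumos, if it also set the solaris tag
	} {
		ok, err := buildsWith(header, tags)
		if err != nil {
			panic(err)
		}
		fmt.Println(tags, ok)
	}
}
```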


Semi-related: it would be nice if there were a way to specify and target distributions at build time. cgo on Linux with Alpine (musl) vs. glibc comes to mind as another area that would benefit from a distribution-specific build target.

bradfitz commented 7 years ago

/cc @4ad @binarycrusader @jtsylve for opinions too.

ianlancetaylor commented 7 years ago

Can you outline the differences relevant to the Go standard library between solaris and illumos?

binarycrusader commented 7 years ago

Illumos didn't fork in 2007, which would be especially hard since OpenSolaris didn't have its first release until 2008. I suspect the submitter meant 2010.

However, many of the other points made are true: for reasons beyond the control of engineering, the OpenSolaris program was discontinued in 2010. There are roughly seven years of divergence at this point, and Solaris has changed significantly in that time, particularly in interfaces that were added or updated after the initial fork.

Both Solaris and OpenSolaris-based derivatives (such as Illumos) added or updated existing interfaces after the fork; these interfaces, even though they have the same names, may not be compatible: they sometimes use different constant values in system headers, accept a different number of parameters, or accept values the other does not.

With that said, any interface that existed before the fork will generally be compatible because Solaris has strong guarantees for backwards compatibility at the binary level (although this is not strictly guaranteed at the source level). The only catch would be new parameter values that functions may not have supported in an older release.

For the vast majority of the Go standard library, there are no appreciable differences -- they can continue to be treated as equivalent platforms. This is part of why I have not yet approached the Go development team about trying to resolve this. However, there are some important differences, most of which affect packages such as golang.org/x/sys/unix or anything that uses native system interfaces.

I've mainly been waiting for the right time to approach the golang-dev mailing list about how we should account for the differences since they are starting to matter. For example, if you run the mkall.sh script in syscall, you'll find a number of differences: https://gist.github.com/binarycrusader/1c57088d65c8f4071853af5efa37271e

Looking over those differences, you'll see they match what I said earlier: some of the error constants are different, Solaris has many that Illumos does not, and Illumos has a few options Solaris intentionally doesn't support (such as MAP_32BIT for mmap()). The next release of Solaris also supports the XPG7 standard, which as far as I know Illumos doesn't support yet, so there are some new interfaces there as well.

More important are constants such as TCP_KEEPCNT that both Illumos and Solaris support but define with different values.

The remaining differences are small, such as different text for particular errnos, or the Bits field of FdSet being uint64 instead of int64 on Solaris (a bug fix), and don't typically matter in practice.
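One way to see why the word type can matter, even if it rarely does in practice: a descriptor stored in the top bit of a 64-bit word (descriptor 63 in the first word) produces a negative number when that word is read as int64. The sketch below is a minimal bit-set over unsigned words. `fdSet` is a hypothetical type loosely modeled on a select(2) descriptor set; it is not the actual layout of `syscall.FdSet` on either system.

```go
package main

import "fmt"

// fdBitsPerWord is the number of descriptors tracked per word.
const fdBitsPerWord = 64

// fdSet is a hypothetical stand-in for a select(2) descriptor set
// that stores its bits in unsigned 64-bit words.
type fdSet struct {
	Bits [16]uint64 // room for 1024 descriptors
}

// Set marks fd as present in the set.
func (s *fdSet) Set(fd int) {
	s.Bits[fd/fdBitsPerWord] |= 1 << (uint(fd) % fdBitsPerWord)
}

// Clear removes fd from the set.
func (s *fdSet) Clear(fd int) {
	s.Bits[fd/fdBitsPerWord] &^= 1 << (uint(fd) % fdBitsPerWord)
}

// IsSet reports whether fd is present in the set.
func (s *fdSet) IsSet(fd int) bool {
	return s.Bits[fd/fdBitsPerWord]&(1<<(uint(fd)%fdBitsPerWord)) != 0
}

func main() {
	var s fdSet
	s.Set(63) // top bit of the first word
	fmt.Println(s.Bits[0], s.IsSet(63))

	// Reinterpreting the same word as signed makes it negative,
	// a surprise that an unsigned word type avoids.
	fmt.Println(int64(s.Bits[0]))
}
```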

Now as for the pull request that was referenced, yes, there are big differences there. Solaris does actually have all of those statistics available, but using a completely different set of interfaces. Although both Solaris and OpenSolaris-based derivatives have the kstat general interface, the set of stats available is different. Additionally, the next release of Solaris has a completely new statistics subsystem, so being able to differentiate between Solaris and OpenSolaris-based derivatives is likely important.

There are other considerations as well when adding new interfaces to packages such as golang.org/x/sys/unix: by a quick and hacky estimate, Solaris libc has easily hundreds of additional private/public interfaces not present in OpenSolaris-based derivatives, and OpenSolaris-based derivatives have a few that Solaris does not. Generally speaking, those differences won't affect the Go standard library; the primary points of contention are networking-related interfaces and some memory-related interfaces (such as mmap). There are other differences as well in the system linker (/usr/bin/ld) and what it supports, etc., but I don't think those are particularly relevant (yet).

In short, I've only recently become concerned that this is a problem worth solving somehow, and the amount of divergence that affects Go itself is fairly small.

For the record, Go is important to Solaris, and we've tried hard to remain compatible with OpenSolaris-based derivatives. I've spent roughly two years now working on porting Go to Solaris, and with Aram on a port of Go to sparcv9 that should be ready to integrate in the Go 1.10 timeframe. As such, we're very interested in working out an acceptable solution with all parties involved.

sean- commented 7 years ago

@ianlancetaylor There are a number of new syscalls that only exist in Illumos. A quick list of changes includes:

[/me deletes most of his reply, thank you @binarycrusader for that excellent reply - my history with Go on Illumos is only a handful of months. I used 2007 as the date based on the head of a series of reflog entries and my memory of the acquisition, but 2010 is the correct date. ]

Internally we were concerned about maintaining portability, but we know these systems are going to continue to diverge, and given our use of Go, it's clear this will become problematic as additional syscalls show up in x/sys/unix. I'm going to hand-wave past any divergence that is already present, which I suspect is being swept under the carpet by the fact that we rebuild Go internally and there isn't much crossover between binaries built on Illumos and run on Solaris (though I have heard several anecdotal reports of binaries not working, which I haven't verified myself). I'm sure some of the referenced kstats work, though others don't (yet? ever?).

I think there is a lot of history and shared work that could be leveraged going forward, too, but fundamentally the two OSes have diverged to the point that there is a need for a discrete build tag in order to make appropriate decisions at compile time.

binarycrusader commented 7 years ago
  • getrandom(2)

Solaris 11.3 added this: https://docs.oracle.com/cd/E86824_01/html/E54765/getrandom-2.html

As for the other functions, they are not yet available in Solaris (intentionally or because they haven't been implemented yet). But yes, those are good examples.

4ad commented 7 years ago

Having a way to distinguish illumos and Oracle Solaris targets based on build tags seems fine. I don't think that having incompatible solaris and illumos build tags is a good idea. It breaks a lot of existing code for no benefit.

Let's add two new build tags, solaris11x and illumos, and keep the solaris build tag around. By default, the solaris build tag will match both illumos and Oracle Solaris targets, which is what you want in 99% of cases anyway, and it doesn't break any existing code.

When one wants specific support for a particular Solaris variant, either one of solaris11x or illumos build tags is to be used.

(Don't get bogged down in the specific names I chose for this example; the names don't matter, and we can use other names like oraclesolaris and illumos, or whatever.)

sean- commented 7 years ago

At the end of the day, something needs to change: it is disingenuous to users if we keep the status quo and expect a GOOS="solaris" binary to run on either Oracle Solaris or Illumos.

If Oracle Solaris is moving to its own build tag (i.e. oraclesolaris), then sharing the solaris build tag with illumos seems fine, but I don't want to obligate the Oracle community to this.

If Oracle Solaris is not going to use a new build tag, then I would object to having the solaris build tag be inclusive of illumos.

That said, we're decidedly :+1: to having Oracle Solaris get its own, new, and dedicated build tag to resolve any ambiguity going forward.

binarycrusader commented 7 years ago

I dislike bike-shedding, but for the record, I would discourage any build tag containing something that implies a "version" of some sort such as "11x", etc. While that may seem tempting, the public version is used for marketing purposes and isn't reliable as a way to discern differences in technical interfaces.

In particular, Solaris often "backports" functionality to the previous release while the next release is in development. As an example, Solaris 11.3 sometimes receives new functionality from what Solaris calls "SRUs" (the monthly updates that contain security fixes and other improvements) that are from the current in-development release.

As such, on Solaris, feature-test-based builds are ideal. Unfortunately, Go's architecture (as far as I can tell) assumes runtime feature tests instead of build-time feature tests (unlike Rust's build.rs), which makes this difficult.
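For contrast with build-time feature tests, the runtime approach usually means probing once and falling back. The sketch below shows that pattern under stated assumptions: `getrandomFunc` is a hypothetical hook standing in for a real getrandom(2) wrapper (it is stubbed to return ENOSYS, as a system lacking the call would, and the sketch deliberately does not call the real syscall), and `crypto/rand` is the fallback.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
	"sync"
	"syscall"
)

// getrandomFunc is a hypothetical hook standing in for a raw getrandom(2)
// wrapper. Here it always reports ENOSYS, as a system without the call would.
var getrandomFunc = func(_ []byte) (int, error) { return 0, syscall.ENOSYS }

var (
	probeOnce    sync.Once
	hasGetrandom bool
)

// fillRandom probes the hook once at run time and falls back to
// crypto/rand when the probe reports ENOSYS.
func fillRandom(p []byte) error {
	probeOnce.Do(func() {
		_, err := getrandomFunc(make([]byte, 1))
		hasGetrandom = !errors.Is(err, syscall.ENOSYS)
	})
	if hasGetrandom {
		n, err := getrandomFunc(p)
		if err == nil && n != len(p) {
			err = errors.New("short read from getrandom")
		}
		return err
	}
	// crypto/rand.Read fills p completely or returns an error.
	_, err := rand.Read(p)
	return err
}

func main() {
	buf := make([]byte, 16)
	if err := fillRandom(buf); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("getrandom used: %v, bytes: %x\n", hasGetrandom, buf)
}
```

The probe result is cached with `sync.Once` so the check costs one call per process; the trade-off is that interfaces whose constants or signatures differ between systems still cannot be handled this way and need build-time selection.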

In an ideal world, I feel like the original build tag for solaris probably should have been "sunos", with the Sun/Oracle specific variant being "solaris" and "opensolaris" for the community derivatives. This makes sense too because "Solaris 11.3" is technically the marketing name and marketing version for the current release of SunOS 5.11.

However, since we don't live in an ideal world, I'd suggest "sunos" and "illumos" as the two new tags going forward; they're version-agnostic and reasonably accurate.

jtsylve commented 7 years ago

I'd suggest "sunos" and "illumos" as the two new tags going forward; they're version-agnostic and reasonably accurate.

This seems like a reasonable suggestion to me, for all of the reasons that @binarycrusader mentioned in his last comment.

jen20 commented 7 years ago

I am very much opposed to a system which tries to maintain compatibility between Oracle Solaris and Illumos distributions in a manner which would differ from every other GOOS. In my opinion we should keep the existing solaris build tag as referring to Oracle Solaris, and introduce illumos.

This reflects the way that the various BSDs operate, and therefore imposes no additional cognitive overhead on people who don't care about operating systems other than Linux and Darwin (which is, let's face it, most people!)

The short-term pain which will be experienced by the Illumos community to go through and add build tags to things outside the standard library is not to be minimised - but on the other hand nor is the cognitive overhead of a three-tier GOOS system on the rest of the world. I think we in the Illumos community should just deal with this short-term problem.

sean- commented 7 years ago

Given there appears to be no debate surrounding the illumos build tag, can we move forward with that?

@binarycrusader , if there is a desire to have Oracle Solaris be tagged with its own build tag, can that be submitted and addressed independently?

@bradfitz , short of a PR, what would you like the next steps to be?

binarycrusader commented 7 years ago

@sean- I'm not requesting a separate build tag for Solaris, I had only suggested it as a way to ease the transition for OpenSolaris-based derivatives. My current employer has sponsored much of the work done on Solaris at this point for Go, and I've pushed hard to maintain compatibility with OpenSolaris-based derivatives.

As such, I leave the decision on build tags up to the Go maintainers.

bradfitz commented 7 years ago

As such, I leave the decision on build tags up to the Go maintainers.

We (the Go maintainers) are not active Solaris or Illumos users. Ideally we'd prefer that the Solaris and Illumos communities agree on a solution that solves the problems at hand. Maybe that's a new GOOS value (a full fork, e.g. GOOS=freebsd vs. GOOS=dragonfly). Maybe that's only a build tag for now.

Is it a goal (or non-goal) for binaries built for Solaris to run on Illumos, or vice versa?

binarycrusader commented 7 years ago

Solaris only guarantees binary compatibility for binaries built on an older version of the operating system so that they can run on a newer version. Because the standard libraries (libc) and the linkers are now significantly different, it's highly unlikely to work. With that in mind, I would assert that it is not a goal for binaries built for Solaris to run on OpenSolaris-based derivatives such as Illumos, or vice versa.

Both share a common set of system interfaces (many decades worth), but we're also nearing one decade of divergence. It's unclear to me what the intent is with Go's build tags vs. GOOS, so I can't say which should be the answer.

jen20 commented 7 years ago

@bradfitz I would treat it as a non-goal for binaries built with (say) GOOS=solaris to work on Illumos and vice versa, though if they do it would be a happy accident.

IMO things would be best served moving forwards by treating Solaris and Illumos in the same manner as FreeBSD vs DragonflyBSD. Assuming @binarycrusader et al have no problem with that, it seems to be the path of least confusion, though admittedly with some short term pain.

sean- commented 7 years ago

@bradfitz Are you good with that agreement? It sounds like we have a consensus that everyone is happy with.

bradfitz commented 7 years ago

We'd like to do Go 1.9beta1 next week, and I don't think we can really do this before Go 1.10.

If we do GOOS=illumos, we'd need a GOOS=solaris builder first (#15072, which has been open for some time).

jen20 commented 7 years ago

@bradfitz I'm not sure on how a Solaris builder can be supplied beyond VirtualBox (perhaps @binarycrusader can help there), but I'm sure we at Joyent can run any necessary Illumos builders without issue. (cc @bcantrill).

bradfitz commented 7 years ago

We currently run Illumos builders on Joyent already. (but would love help improving our setup, if somebody has some time) It's the Solaris ones where we lack coverage.

binarycrusader commented 7 years ago

Solaris amd64 currently only supports Xen-based virtualization, Solaris-based virtualization (kernel zones), full virtualization (VirtualBox/VMWare), or bare metal provisioning. It does not have the virtio/virtnet drivers required by GCE (?).

bradfitz commented 7 years ago

We have 10 VMWare nodes. But I'd rather discuss on #15072.

rsc commented 7 years ago

Based on discussion above, switching to GOOS=illumos for Go 1.10 is fine. There's no need to have Solaris builders before then, although of course if we get further into the cycle with no Solaris builders we might reconsider GOOS=solaris entirely (golang.org/wiki/PortingPolicy). But that's just exposing a current problem (no Solaris builders, only Illumos ones), not introducing a new problem.

-rsc for @golang/proposal-review

binarycrusader commented 7 years ago

There's an Oracle Solaris builder in place now and there will be more soon, so this change can be made when appropriate.

ikozhukhov commented 7 years ago

How do you want to specify how the build targets 'solaris' and 'illumos' should differ? I work on DilOS (based on illumos), and I'm interested in the details. I'm trying to move to more of a Debian userland, but it isn't clear to me how you want to split the 'solaris' and 'illumos' targets. Who will be responsible for feature requests? We have golang-1.8.x, and I'm interested in future updates. Also, I have Intel and SPARC platforms, and I have problems with Go on SPARC: I tried to prepare gcc6-sparc-cross build tools to produce Go builds. I'm able to produce binaries, but only with some small updates. It is not a Go issue (probably something for the GCC team), but I'm still interested in a Go port to DilOS SPARC. I can provide build zones on Intel and SPARC if you are interested in some builds.

bradfitz commented 6 years ago

Moving to Go 1.11, as this apparently didn't happen while I was away on leave.

affixalex commented 6 years ago

I'm generally hesitant to chime in on this sort of thing for fear of bikeshedding, but I didn't see this mentioned here.

I think it would be reasonable to target the Solaris 10 brand. The syscall ABI isn't guaranteed to be backwards compatible in the master branch of Illumos (or Solaris, to my knowledge). The libc is generally considered the compatibility boundary.

https://github.com/joyent/illumos-joyent/blob/master/usr/src/uts/common/brand/solaris10/s10_brand.c

The Solaris 10 brand, however, does have some implicit guarantees about the kernel ABI. These may be spelled out explicitly somewhere, I'm not sure.

Off the top of my head, I think this approach would allow binaries to run on both Solaris and Illumos without any loss of functionality.

(As a parenthetical footnote, I think it'd be nice if Illumos had a distinct OSABI and attendant host triples etc in compiler toolchains along with a stable kernel ABI, but the general consensus is that the whole issue is a spectacular troll.)

4ad commented 6 years ago

We're not doing Solaris 10, it lacks APIs we use. Even if we were doing Solaris 10, we'd want to use different APIs on Solaris 11, so Solaris 10 can never be a base for everything.

4ad commented 6 years ago

As for the syscall ABI compatibility, that is a non-concern since Go only uses libc on Solaris variants.

Smithx10 commented 5 years ago

What still has to be done to add this?

bradfitz commented 5 years ago

@Smithx10, it needs ownership. So I don't think this is going to happen anymore. Most of the people who were once behind this have dropped their support.

There's not enough ownership for one Solaris port at the moment, much less two.

Smithx10 commented 5 years ago

@bradfitz What is required by someone in the community to take ownership of the Illumos tag? I don't really have a social life and might be able to take on a hobby.

bcantrill commented 5 years ago

To @Smithx10's point, I think more generally we can find folks in the community to take ownership of this. How do people sign up to help?

bcmills commented 5 years ago

@bcantrill @Smithx10 See https://golang.org/wiki/PortingPolicy.

Smithx10 commented 5 years ago

@bcmills Thank you! I think for now I'll take @jclulow's advice: clone the Go source, look at all the places the "solaris" string comes up, add "illumos" versions of those, and get the tests to pass at least as well as they do now. Once that's done, I can work on the builder. @bradfitz If the first iteration doesn't have any changes, is that OK? I think at this point just having the groundwork in place would be a big win. After that, we can start addressing differences. Does that make sense?

Toasterson commented 5 years ago

@Smithx10 @bcantrill It was tough enough to get third parties to support building on solaris by including the appropriate versions of upstream libraries with cross-platform versions of flock and mmap that work under illumos. If we switch everything from "solaris" to "illumos", would that mean the upstreaming work of the last three months becomes useless?

Smithx10 commented 5 years ago

@Toasterson I don't think you have to switch anything. Does adding another tag for illumos force anyone to do anything? Am I missing something? Sorry, I'm new to this particular tag issue and just want to maintain the Nomad agent on Illumos and SmartOS. Please feel free to educate me.

Toasterson commented 5 years ago

@Smithx10 The main problem will be vendoring. gitea and the mmap library it uses come to mind. We also have termios, where we would have to redo the tags and make sure all vendored copies are updated. Vendoring has become such a commonly used tool that a change like this will take a huge amount of time and effort to push through, even more so when the people who develop the app mainly care only about Linux.

I just did all of that work over the last three months (not full-time), and I am asking whether I have to do it all again.

What is the problem with the Nomad agent, if I may ask?

jclulow commented 5 years ago

@Toasterson Can you point me at the flock()/mmap() work you're referring to so that I can take a closer look?

I did an illumos port of wireguard-go recently. I was unable to find a nice way to make calls to ioctl(), getmsg(), or putmsg() -- but I was able to create a private set of wrappers using the same pattern used to wrap libc functions in the x/sys/unix package and it seems to work well so far. Perhaps notably, it did not require any use of Cgo.

Smithx10 commented 5 years ago

@Toasterson I just ran into trying to get this merged and realized it's blocked on this issue.

bradfitz commented 5 years ago

We could make GOOS=illumos imply the solaris build tag like we did with GOOS=android implying linux. That would mean most +build solaris-constrained code would still work and people wanting to differentiate could use +build illumos or +build solaris,!illumos.
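A small model of that proposal, assuming GOOS=illumos satisfies both the illumos and solaris tags in the same way GOOS=android satisfies linux. `tagsFor` and `matches` are hypothetical helpers that hand-build the tag set; they are not how cmd/go computes it. The sketch needs Go 1.16+ for the standard `go/build/constraint` package.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// tagsFor returns the tags assumed to be satisfied for a target GOOS.
// Assumption (the proposal above): GOOS=illumos also satisfies solaris.
func tagsFor(goos string) map[string]bool {
	tags := map[string]bool{goos: true}
	if goos == "illumos" {
		tags["solaris"] = true
	}
	return tags
}

// matches reports whether a single constraint line holds for goos.
func matches(line, goos string) bool {
	expr, err := constraint.Parse(line)
	if err != nil {
		panic(err)
	}
	tags := tagsFor(goos)
	return expr.Eval(func(tag string) bool { return tags[tag] })
}

func main() {
	for _, line := range []string{
		"// +build solaris",          // existing code: both systems
		"// +build illumos",          // illumos only
		"// +build solaris,!illumos", // Oracle Solaris only
	} {
		fmt.Printf("%-28s solaris=%v illumos=%v\n",
			line, matches(line, "solaris"), matches(line, "illumos"))
	}
}
```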

But we still need people owning the builders for both Solaris & illumos. The Solaris builder owner left the company and it's currently just still ~working by accident, but it's running an old version of the build system. And the one that I kinda run is ill-maintained on Joyent (SmartOS) and if that started being the illumos builder then we'd no longer have a maintained solaris builder.

I was able to run OmniOS CE on qemu-kvm on GCE (in https://github.com/golang/go/issues/15581#issuecomment-435431402) and that's the path (or similar) I'd like to take going forward to run the Solaris (or rather, illumos) builders on GCE where we have ~infinite free quota and fast networking with the rest of our build system.

But that Docker image was so big that I got distracted and started writing https://github.com/google/crfs for better start-up times, which is in progress. Once that's done I'd still need help preparing a script to automate the creation of OmniOS or SmartOS qemu images that run our x/build/cmd/buildlet/stage0 on boot.

jclulow commented 5 years ago

@bradfitz For what it's worth, I had a shot at booting SmartOS under GCE the other day. There were some (uninteresting) panics that I need to work out, but I'm able to drive our kernel debugger (kmdb) through the serial port redirection so I imagine it shouldn't take much to get that sorted out. I embarked on the exercise so that I could then work on an in-OS virtio-scsi driver so that we'd work well under GCE.

If I'm able to get OpenIndiana/OmniOS/SmartOS images to the point where they work directly under the GCE hypervisor, is that our best path forward with respect to builders for the illumos tag?

I'm not really sure what to do about the solaris tag, to be honest; Solaris and illumos have been diverging for many years now, and any resemblance at this point is increasingly coincidental.

Toasterson commented 5 years ago

@jclulow Certainly: mmap was edsrzf/mmap-go#15, where I had to push the library to use x/sys/unix. gitea will also still need a push to update, as go-gitea/gitea#4799 is still open, and that is the main vendoring issue.

flock was at sevlyar/go-daemon#47

I know these two, plus their dependents, will break if we switch to illumos as the tag instead of solaris. Others, like src-d/go-billy#65, will still work because the tag there is posix and not solaris directly.

github.com/pkg/term/termios will also break

We have been using the solaris tag ourselves for a while now, out in the open. Switching will break things for us.

As for ioctl, do you mean something like https://github.com/pkg/term/blob/aa71e9d9e942418fbb97d80895dcea70efed297c/termios/ioctl_solaris.go#L6 ? That does require cgo, but it is cross-compilable because libc can be linked somehow. There is some magic here :)

Toasterson commented 5 years ago

@Smithx10 That PR looks odd, if you ask me. I don't see why the kstat changes needed to be in files named _illumos. Solaris also has kstat, especially as the code calls the kstat binary rather than using any syscall interface.

Toasterson commented 5 years ago

@bradfitz Yes, that way of tagging would be the way forward. As for GCE, I have recently begun researching what it would take to build an image analogous to the official FreeBSD image on the Marketplace, for CircleCI. Would that be what you are looking for? Also, in OpenIndiana we have a Vagrant box. I don't know if a good image conversion tool exists, but we use Packer to build that box, so automation is available.

Smithx10 commented 5 years ago

@Toasterson Good idea. I'll take a look and see if I can refactor that to just work under the Solaris tag for now.

4ad commented 5 years ago

I don't have access to Oracle Solaris anymore, but rumors that I don't (or don't want to) maintain the illumos port are greatly exaggerated.

The problem is that running a builder used to be trivial, but now it requires this complex machinery in the cloud. Frankly, the act of running a builder vastly dwarfs in effort and complexity what is required for actually maintaining the port. That's not a great place to be in, IMO.

Doing this change is trivial, but I don't want to do it just yet because I don't want to kill Oracle Solaris support just yet.

I'll be contacting some people to determine the best way to move forward.

bradfitz commented 5 years ago

The problem is that running a builder used to be trivial, but now it requires this complex machinery in the cloud. Frankly, the act of running a builder vastly dwarfs in effort and complexity what is required for actually maintaining the port. That's not a great place to be in, IMO.

That's a little exaggerated. You can still run a builder on a single physical or virtual machine if you want, almost the same as before (the binary name is now buildlet instead of builder). We prefer to cloud-ify them so we have elastic capacity, but that's not a requirement. Our Mac builders aren't cloud-ified, for instance.

bradfitz commented 5 years ago

@jclulow, running directly on GCE isn't a requirement. I'm not asking you or anybody to write a virtio-scsi driver. If you can run on QEMU (which is happy to do virtio-block unlike GCE or IDE/SCSI/etc) and can prepare a VM image for us, we can supply lots of CPU quota. Otherwise you can run it on your own hardware elsewhere.

Toasterson commented 5 years ago

@bradfitz Would a Vagrant image for QEMU/KVM work, or what image would you need and how would one prepare it? We already have many different VM images in different formats.

bradfitz commented 5 years ago

@Toasterson, I don't care which specific tools are used. The only requirement is that it's some automation that runs on Linux that outputs a VM image. We've used bash, Go, expect, Powershell (run remotely on Windows VM), Python (Anita), etc.

But again, the bigger concern here is that if we do this split, then who runs the Oracle Solaris builders?