ianlancetaylor commented 2 years ago

Background

This proposal started out as a GitHub discussion at https://github.com/golang/go/discussions/53060.

Go supports a number of different GOOS/GOARCH targets. We've defined a policy for adding new ports, described at https://go.dev/wiki/PortingPolicy.

Ports are divided into first class ports and secondary ports. The current first class ports are:

darwin/amd64
darwin/arm64
linux/386
linux/amd64
linux/arm
linux/arm64
windows/386
windows/amd64

The current secondary ports are:

aix/ppc64
dragonfly/amd64
freebsd/386
freebsd/amd64
freebsd/arm
freebsd/arm64
illumos/amd64
linux/ppc64
linux/ppc64le
linux/mips
linux/mipsle
linux/mips64
linux/mips64le
linux/riscv64
linux/s390x
android/386
android/amd64
android/arm
android/arm64
ios/arm64
ios/amd64
js/wasm
netbsd/386
netbsd/amd64
netbsd/arm
netbsd/arm64
openbsd/386
openbsd/amd64
openbsd/arm
openbsd/arm64
openbsd/mips64
plan9/386
plan9/amd64
plan9/arm
solaris/amd64
windows/arm
windows/arm64

The core Go team maintains the first class ports. It is less clear how the secondary ports are handled.

The existing porting policy says:

- Every secondary port must have a maintainer who will make required updates in a timely manner.
- Every secondary port must have a builder, and a person who maintains that builder.
- If the builder for a port is failing for more than two weeks, a new maintainer is needed.
- If a builder fails for more than four weeks or is failing at the time of a release freeze, and a new maintainer cannot be found, the port will be removed from the tree.

However, in practice we do not follow those rules.

It has not been clear who the secondary port maintainers are, and there was no mechanism for adding or removing maintainers; this was recently improved by creating the @golang/port-maintainers GitHub team and subteams.
When a change requires an update to the port, it is the person making the change who takes responsibility for updating the port, not the port maintainer.
Maintainer responsibilities are in general unclear.
Several secondary port builders are maintained by the Go team.
We do not remove ports from the tree if they are failing for four weeks (we do remove ports if the OS is no longer supported, such as darwin/386).

The effect is that the work required to maintain secondary ports falls on people who are not familiar with those ports. This was not the goal of the porting policy, and it tends to slow down overall development of the core Go systems and discourages the adoption of new secondary ports.

We propose both loosening and tightening the current porting policy to address these concerns.

Proposal

The Go team will stop publishing binaries for secondary ports.
- The Go team currently publishes binaries for the following secondary ports, which we would stop doing if we adopt this part of the proposal:
- freebsd/386
- freebsd/amd64
- linux/ppc64le
- linux/s390x
- windows/arm64
- We encourage port maintainers to publish them instead; we can link to download pages from https://go.dev/dl as appropriate.
- We will investigate building binary releases by cross-compiling to secondary ports from a first-class port. This may be feasible by doing pure Go builds, while leaving any cgo builds to be on the user's machine using the user's C compiler.
- Note: when and if reliable testing hardware is available, the Go team is likely to promote windows/arm64 to be a first class port.
Require at least two maintainers for each secondary port.
- If maintainers resign or become unreachable and can't be replaced, we mark the port as broken; see below.
- This means that for some existing secondary ports we will have to find additional maintainers. As I write this I think that additional maintainers are needed for android, dragonfly, illumos, ios, js/wasm, openbsd, and solaris.
In general developers will not be required to make their code work on every secondary port.
- They are encouraged to do so if it is straightforward.
- If not, they are encouraged to notify port maintainers about pending requirements as soon as feasible, and to work with the port maintainers to get the ports working again.
- That said, ultimately the port maintainers are responsible for keeping their ports working.
Notify maintainers when a port breaks.
- Open an issue, CC'ing maintainers
- Perhaps also send an e-mail to golang-dev.
- We can do this preemptively if we know that some change is going to break a secondary port.
- Ideally this is an automated system, but for now it may be manual.
- For example, perhaps the system that creates https://build.golang.org/ can send e-mails about all failures to a newly created list. Maintainers can filter that list for the ports that they care about.
- Currently maintainers can track #52653 to see problems.
Clarify maintainer responsibilities on the PortingPolicy wiki page.
- A GOOS maintainer is responsible for any _GOOS files, any files with just that build tag, and blocks of code guarded by if runtime.GOOS == "mygoos".
- Similarly for a GOARCH maintainer.
- If technically possible, GOOS/GOARCH maintainers who are not already approvers should be given the right to +2 changes to files for which they are responsible.
- These CLs will still require a +1 review for style and sanity but not for detailed correctness; we will defer to the maintainers for that.
- Note that with two maintainers one of them can +2 changes by the other.
We will introduce a new concept: the broken port.
- The existing policy says that problems must be fixed within 4 weeks, which is not realistic; people are busy.
- If a port stops working, including the case where a builder stops working, we can decide to mark the port as broken.
- Or in some cases we can roll back the change that broke it; this is a judgement call.
- In general, a port can be considered broken if its builder has failed multiple times in a development cycle with a failure mode that does not occur for first class ports, and that failure mode is not believed to have been fixed or suppressed by a change in either a Go repository or the builder's configuration, and maintainers are not actively working on a solution.
- Any approver can declare that a port that meets these criteria is broken.
- The list of broken ports will be maintained in cmd/dist and will appear in the release notes if broken at release time.
- Attempting to build a broken port will fail unless make.bash or go tool dist is invoked with a new option -force.
- If a port is broken in release 1.N, then the core Go team can choose to remove the port from release 1.N+1.
- This is not obligatory and will depend on whether anybody is willing and able to maintain the port going forward.
- We currently have a list of incomplete ports in cmd/dist/build.go; these should probably be treated as broken ports.
- Currently the only entry on that list is linux/sparc64.
- The goal here is not to get ports out of the tree; if people are actively working on the port they should have as much latitude as possible to fix it.
The default set of trybots will change to only cover first class ports.
- The https://build.golang.org/ page will continue to show all ports.
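The broken-port rule above (a build of a broken port fails unless `-force` is given) can be sketched as a simple lookup. This is an illustration only, not the actual cmd/dist code: the names `brokenPorts` and `checkPort` are hypothetical, and linux/sparc64 is used as the example entry only because the proposal names it as the sole entry on the existing incomplete-ports list.

```go
package main

import (
	"errors"
	"fmt"
)

// brokenPorts is a hypothetical list of ports marked broken.
var brokenPorts = map[string]bool{
	"linux/sparc64": true,
}

var errBrokenPort = errors.New("port is marked broken")

// checkPort returns an error for a broken port unless force is set,
// mirroring the proposed -force option.
func checkPort(goos, goarch string, force bool) error {
	port := goos + "/" + goarch
	if brokenPorts[port] && !force {
		return fmt.Errorf("%s: %w (use -force to build anyway)", port, errBrokenPort)
	}
	return nil
}

func main() {
	for _, force := range []bool{false, true} {
		err := checkPort("linux", "sparc64", force)
		fmt.Printf("force=%v err=%v\n", force, err)
	}
}
```

Keeping the list as data rather than as scattered conditionals would also make it straightforward to generate the release-notes entry the proposal asks for.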

Discussion

This is not intended to be a big change to the current process. However, it is intended to be a change. It is intended to take some of the porting load off of the core Go team, while making it easier for port maintainers to make changes. It is intended to make it easier to add new ports to the tree.

In the long run it would be good to support out of tree ports. However, that requires a bunch of technical work, and there is no design for it.

erifan commented 2 years ago

In general developers will not be required to make their code work on every secondary port.

Does this mean that we can only consider the first class ports when adding new features in the future, such as regabi?

And will we move a lot of secondary port related code out of the master branch, so that our code can work without regard to these ports?

rsc commented 2 years ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

ianlancetaylor commented 2 years ago

@erifan

Does this mean that we can only consider the first class ports when adding new features in the future, such as regabi?

I think it depends on what you mean by "consider." I think an implementation for anything that covers all GOARCH values but differs for each one must at least consider whether and how it can be implemented for each GOARCH. We shouldn't add features that can never work for some GOARCH. And the initial implementation should not break any other GOARCH. But it's OK to implement something like regabi one GOARCH at a time, as indeed is being done for regabi.

And will we move a lot of secondary port related code out of the master branch, so that our code can work without regard to these ports?

I am not proposing that at all. That would be what I call an "out of tree port." I think it would be great to support that, but that is very hard, and it is not this proposal.

erifan commented 2 years ago

Thanks @ianlancetaylor , I see. This sentence "This is not intended to be a big change to the current process. However, it is intended to be a change" is very accurate.

beoran commented 2 years ago

@ianlancetaylor

The core problem with porting Go to other platforms, and with making out of tree ports, is that the Go runtime, standard library, and compiler are not very modular when it comes to operating system and architecture support. Right now, packages mix files for all platforms together in the same package, separated only by build tags. I think this should be refactored and split up into packages per operating system and/or architecture.

One should be able to write platform support for the Go compiler, runtime, and standard library just by writing a module which the compiler, etc. could then import.

Of course this would involve a serious refactoring of Go, but I think it will be for the better, and encourage third party ports.

ianlancetaylor commented 2 years ago

@beoran Thanks. I agree with all of that. But this is not that issue. The changes proposed here should not have to wait for the changes that you suggest. The changes you suggest will take a long time to design and implement, and nobody is working on them.

beoran commented 2 years ago

@ianlancetaylor Well I wanted to bring it up but I agree it is a separate issue in the end.

aclements commented 2 years ago

@beoran , we are making baby steps toward better modularization in the runtime by moving the per-OS/arch syscall layer into runtime/internal/syscall and reorganizing it to have significantly less per-arch code. We're part way through migrating the Linux arches (we've migrated the raw syscall functions for all Linux arches, but not yet the Go interface). It's a small step, but should help some.

rsc commented 2 years ago

I think the discussion helped a lot with addressing most of people's concerns, but I'm still a bit surprised about how little discussion is happening here. Does anyone have any remaining objections that we should discuss?

qmuntal commented 2 years ago

Note: when and if reliable testing hardware is available, the Go team is likely to promote windows/arm64 to be a first class port.

Microsoft dev here. We would like to help promote windows/arm64 to be a first-class port. If it is just a problem of having reliable hardware, we can discuss providing reliable windows/arm64 hosts running in our own infrastructure. Would that be enough? If not, what else would be missing?

ianlancetaylor commented 2 years ago

CC @golang/port-maintainers

ianlancetaylor commented 2 years ago

CC @golang/release See comment by @qmuntal above regarding reliable windows/arm64 hardware. That should probably be discussed on a separate issue. Thanks.

paulzhol commented 2 years ago

..but I'm still a bit surprised about how little discussion is happening here.

I'm not affiliated with any company (just a FreeBSD user). Trying not to offend anyone here, but there is not much in the way of open items up for discussion. The core Google Go team doesn't wish to maintain the secondary ports and would like to have it done by other parties. Unfortunately, to me the conditions above look like part of a contract or SLA you'd sign with another company providing development services instead of volunteer members of the open source community.

Additionally, maintaining the official (because they are also used to build the Go release binaries) environments for freebsd/386, freebsd/amd64 outside of Google is extremely frustrating. It entails waiting for a member of the release team to run a bunch of bash scripts under a Linux VM to create and deploy new GCE images every time a new OS version is released. This can sometimes take more than a year due to their limited availability. As for freebsd/arm, I don't plan to invest in new hardware after already upgrading twice. Maybe I'll be able to get a spare rpi3b to pass the build - but I think there's a huge difference between embedded platforms with limited RAM and "full" builders which is not addressed by the proposal.

beoran commented 2 years ago

@aclements That's a great first step, and I am all in favor of that. @paulzhol It sounds like maintaining a secondary port is pretty labor intensive and/or inconvenient. Maybe that too is something that should be improved?

ianlancetaylor commented 2 years ago

@paulzhol

The core Google Go team doesn't wish to maintain the secondary ports and would like to have it done by other parties. Unfortunately, to me the conditions above look like part of a contract or SLA you'd sign with another company providing development services instead of volunteer members of the open source community.

You're not wrong. But I think there is another perspective. For a project like Go that has millions of users and that aims to provide a very high level of stability and reliability, it's not fair to our users to say "here is a port that may work, we don't know." We want to say either "this port works to the best of our knowledge and ability" or "you are on your own, good luck." To millions of Go users, we are in fact "another company providing development services." And to treat those users well, we have to carry that attitude through all core Go development. So, yes, we depend on volunteer members of the open source community, but we have to be clear about what they, and the core Go team, can and can't promise to Go's users. Again, you're not wrong, but that doesn't mean that nothing about the current porting policy should change.

I do think that there are things up for discussion, like: will this wind up hurting Go's users and the Go ecosystem? Obviously, I don't think it will, or I wouldn't have proposed it, but I could certainly be making a mistake.

Additionally, maintaining the official (because they are also used to build the Go release binaries) environments for freebsd/386, freebsd/amd64 outside of Google is extremely frustrating.

I'm sorry to hear that. I don't know much about what is required, or what would have to change, but I really hope that we can make things better somehow. Separately, I don't mean to be facile but I'm not sure that this proposal affects that one way or the other.

I think there's a huge difference between embedded platforms with limited RAM and "full" builders which is not addressed by the proposal.

Can you say more about that? Thanks.

beoran commented 2 years ago

@ianlancetaylor If this proposal makes it harder for ports to be available then this will damage the community.

Maybe I am stating the obvious, but would it be possible for Google to invest a bit more in this project and hire a few more porting engineers and provide some more hardware for them? That seems like a better solution.

While this is an organizational issue, the underlying problem also seems to be technical. It seems to me that investing some work in improving the tools for ports would already help to alleviate the work needed to maintain a port. So it seems worth while to also discuss this here.

ianlancetaylor commented 2 years ago

Maybe I am stating the obvious, but would it be possible for Google to invest a bit more in this project and hire a few more porting engineers and provide some more hardware for them? That seems like a better solution.

Unfortunately I can't see that happening.

While this is an organizational issue, the underlying problem also seems to be technical. It seems to me that investing some work in improving the tools for ports would already help to alleviate the work needed to maintain a port. So it seems worth while to also discuss this here.

I think it's worth understanding how we can make it easier to maintain a port.

I'm not sure it makes a difference one way or another to this proposal. What do you think could or should change in the proposal?

beoran commented 2 years ago

@ianlancetaylor With regards to this proposal:

Seeing the current difficulty in maintaining a secondary port, I think the main Go project should provide as many resources as possible.
The main Go project should be a bit more lenient on these secondary ports. While it is true that ideally, all ports should be of equal quality, in practice this doesn't seem to be possible right now. The people who need such secondary ports are likely to accept that they have to do more testing themselves. The value of the secondary ports is in them existing, and allowing users on such systems to keep using Go.

ianlancetaylor commented 2 years ago

Seeing the current difficulty in maintaining a secondary port, I think the main Go project should provide as many resources as possible.

I agree that that sounds good, but I don't know what it really means in practice. For example, I think we would all like it if Google would hire a full time person to do nothing but work on ports. But nothing like that is going to happen. On the other hand, Google is already prepared to donate significant Google Cloud Platform resources to ports, but of course that doesn't really help for ports to GOARCH values that GCP does not support.

One of the things I'm trying to do with this proposal is get more specific. What can we really promise beyond "we'll do our best?" What can people count on us to do beyond "do as much as possible?" What is possible?

The main Go project should be a bit more lenient on these secondary ports.

I'm already trying to do that in this proposal. The proposal is explicitly more lenient than the current porting policy, which says, for example, that a port will be removed if it is broken for four weeks.

tuxillo commented 2 years ago

The main Go project should be a bit more lenient on these secondary ports. While it is true that ideally, all ports should be of equal quality, in practice this doesn't seem to be possible right now. The people who need such secondary ports are likely to accept that they have to do more testing themselves. The value of the secondary ports is in them existing, and allowing users on such systems to keep using Go.

Fully agree here. As a user of one of the systems which would fall into the secondary port category, I would greatly benefit from having a "best effort" approach for it rather than a "save yourself" approach, because some projects just don't have the required resources to keep up with such an effort, which will (most likely) be a big one.

On the other hand, if there are no new hires on the horizon to support the current burden, the burnout within the Go team will increase, and that surely helps no one either. Hence, some changes in the commitment towards those so-called secondary ports will be required. So, it seems things aren't going to get any easier on this front either ...

dmgk commented 2 years ago

I'm generally OK with this proposal, one thing that is somewhat bothering me is

The default set of trybots will change to only cover first class ports.

Does this mean that there will be no usual pre-commit checks performed on secondary port builders? That could easily lead to a situation where an innocuous-looking commit breaks a secondary port and maintainers will have to fix it after the fact. I'm not sure this will be manageable by volunteers.

ianlancetaylor commented 2 years ago

Does this mean that there will be no usual pre-commit checks performed on secondary port builders? That could easily lead to a situation where an innocuous-looking commit breaks a secondary port and maintainers will have to fix it after the fact. I'm not sure this will be manageable by volunteers.

That is a fair point. There is some discussion of introducing a submit queue, which would address this problem: submitting a CL would not submit it directly to the repo, but would instead run it through more comprehensive tests and then submit if those tests pass.

Perhaps we should drop or delay that part of the proposal until the submit queue is created.

mengzhuo commented 2 years ago

I'm OK with this proposal. Just a nit: some ports, such as arm, ppc, loong, and windows, are maintained by companies or full-time developers. Could we show this information on go.dev/dl, and mark ports such as freebsd, plan9, and riscv as "community supported"?

paulzhol commented 2 years ago

@beoran

It sounds like maintaining a secondary port is pretty labor intensive and /or inconvenient. Maybe that too is something that should be improved?

Yes, I mainly find myself spinning up and maintaining multiple setups to test my code and verify cgo -godefs output on all the supported FreeBSD GOARCHes before submitting changes. It sometimes can take longer than the actual development. Recently I've learned of TRY=freebsd-386,freebsd-amd64,freebsd-arm64,freebsd-arm. That, together with the mentioned submit queue, would help, but it is still not a substitute for actual CI machines. Maybe allowing full gomote access could address that.

@ianlancetaylor

I'm sorry to hear that. I don't know much about what is required, or what would have to change, but I really hope that we can make things better somehow. Separately, I don't mean to be facile but I'm not sure that this proposal affects that one way or the other.

The proposal starts off with dropping official releases for freebsd/386, freebsd/amd64. I reasoned that means the port maintainers will have access to some GCE project/credit so we could provide more frequent builder image releases based on the upstream project's GCE images instead of the qemu-inside-Linux being done now and maintained by the release team. I may have misunderstood.

I think there's a huge difference between embedded platforms with limited RAM and "full" builders which is not addressed by the proposal.

Can you say more about that? Thanks.

These are much more constrained environments. To limit sdcard wear, the root filesystem is usually mounted read-only (at least for my builder). There's much less available RAM (in total and per-core) which is shared between the running Go toolchain and the page cache for both build artifacts and go list walks. On my builder I use spinning rust over iSCSI as scratch workspace and swap - this adds additional load on the networking stack causing network tests to be flaky. Additionally the slow IO coupled with the generally much slower CPU cores introduces behaviors which are dependent on the kernel scheduler and its preemption of user threads in the middle of profiling tests (making those flaky as well). There is a way to address it I think, by splitting off the bootstrap and compilation and running just the actual tests on the builder. I think it is/was done on the iOS builder? But maybe it should be the norm for secondary builders on embedded hardware (tier 3 if you will). Preferably without the requirement of each port maintainer to have to roll their own version of it.

beoran commented 2 years ago

@paulzhol Thanks for providing concrete pain points and suggestions. I think it is important they are brought up here.

@ianlancetaylor It kind of worries me that Google does not seem to want to give more resources to the Go team. I hope it is not a bad sign of things to come. Dropping official support for several platforms is not going to help improve the future of the Go language, and that is what I fear this proposal may lead to.

As we can see from some replies above, the burden of porting Go is also technical, and I would even say it is due to technical debt. I have stated this before on occasion, but I feel that there should be a release of Go, say, next year, with no new features at all, which apart from bug fixes focuses on fixing long-standing issues, ease of porting, and lessening this technical debt. After such a release, with the Go compiler and runtime in optimal shape, would be a better time to start devolving the secondary ports and advance this proposal.

bcmills commented 2 years ago

Perhaps we should drop or delay that part of the proposal until the submit queue is created.

Even after a submit queue, we would still have to resolve the question of which ports block submission through the queue — and “which ports block progress if broken” is exactly the distinction between first-class and secondary ports (both today and under this proposal).

Moreover, many of the ports most frequently broken post-submit today are the ones that run as reverse builders, which presumably could not be included in a submit queue anyway: if the builder can't be scaled up to the number of commits in flight, then including it in a submit queue would risk blocking the whole queue in the event of builder stalls or outages.

bcmills commented 2 years ago

I could imagine, as an alternative, splitting the TryBots for each CL into two groups: one group of “voting” TryBots (the first class ports plus anything explicitly promoted/added via a TRY= comment), and a second group of “informational” TryBots (the subset of secondary ports that have scalable, sufficiently non-flaky builders).

The “informational” TryBots wouldn't contribute a -1 vote and wouldn't block auto-submit, but could at least provide advance notice of any unexpected breakage in otherwise-stable ports.
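The two-group split described above can be sketched as a classification over builders. This is a hypothetical model, not the build system's real data structures: the `Builder` fields, the `classify` and `blocksSubmit` functions, and the use of builder names as `TRY=` keys are all assumptions made for illustration.

```go
package main

import "fmt"

// Group says how a TryBot result is treated for a CL.
type Group int

const (
	NotRun Group = iota
	Voting
	Informational
)

func (g Group) String() string {
	return [...]string{"not run", "voting", "informational"}[g]
}

// Builder is a hypothetical description of one builder.
type Builder struct {
	Name       string // e.g. "freebsd-386"
	FirstClass bool   // builder for a first class port
	Scalable   bool   // can keep up with the number of CLs in flight
	Flaky      bool   // known to fail intermittently
}

// classify puts first class ports and anything requested via TRY= in the
// voting group, and stable, scalable secondary builders in the
// informational group. Everything else does not run as a TryBot.
func classify(b Builder, tryRequested map[string]bool) Group {
	switch {
	case b.FirstClass || tryRequested[b.Name]:
		return Voting
	case b.Scalable && !b.Flaky:
		return Informational
	default:
		return NotRun
	}
}

// blocksSubmit reports whether a result prevents auto-submit:
// only failures in the voting group do.
func blocksSubmit(g Group, failed bool) bool {
	return g == Voting && failed
}

func main() {
	try := map[string]bool{"freebsd-arm": true}
	builders := []Builder{
		{Name: "linux-amd64", FirstClass: true, Scalable: true},
		{Name: "freebsd-amd64", Scalable: true},
		{Name: "freebsd-arm"},
		{Name: "openbsd-mips64"},
	}
	for _, b := range builders {
		fmt.Printf("%s: %v\n", b.Name, classify(b, try))
	}
}
```

Under this model an informational failure is visible on the CL but never produces a blocking vote, which is the advance-notice behavior the comment asks for.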

4a6f656c commented 2 years ago

The core Google Go team doesn't wish to maintain the secondary ports and would like to have it done by other parties. Unfortunately, to me the conditions above look like part of a contract or SLA you'd sign with another company providing development services instead of volunteer members of the open source community.

You're not wrong. But I think there is another perspective. For a project like Go that has millions of users and that aims to provide a very high level of stability and reliability, it's not fair to our users to say "here is a port that may work, we don't know." We want to say either "this port works to the best of our knowledge and ability" or "you are on your own, good luck." To millions of Go users, we are in fact "another company providing development services." And to treat those users well, we have to carry that attitude through all core Go development. So, yes, we depend on volunteer members of the open source community, but we have to be clear about what they, and the core Go team, can and can't promise to Go's users. Again, you're not wrong, but that doesn't mean that nothing about the current porting policy should change.

This seems to be working on the premise that there are only two levels/options - while stability and reliability is essential for first class ports, and being able to say that "this port works to the best of our knowledge and ability" is not unreasonable for second class ports, there could be considered to be a third class where "we know the code compiles and passes most of the tests, most of the time" is still potentially beneficial to some groups of users. From the Go team's perspective, I would expect them to fully disown any issues with the third class and even some/most of the issues with the second class...

I do think that there are things up for discussion, like: will this wind up hurting Go's users and the Go ecosystem? Obviously, I don't think it will, or I wouldn't have proposed it, but I could certainly be making a mistake.

I think this really depends on how you define the Go's users and the Go ecosystem - if that is defined as only being users of the first class ports, then the answer is "no". But if you define it as being able to run applications that are written in Go on as many platforms as possible (and therefore having the widest possible reach/user basis), then I think the answer is almost certainly "yes".

Many of the second class ports (take all of the *BSDs combined, as an example), allow for a user base that is not Linux/macOS/Windows to build and run a large number of applications (grafana, telegraf, rclone and gitea as some arbitrary examples), that would not be possible otherwise.

4a6f656c commented 2 years ago

Can you say more about that? Thanks.

These are much more constrained environments. To limit sdcard wear, the root filesystem is usually mounted read-only (at least for my builder). There's much less available RAM (in total and per-core) which is shared between the running Go toolchain and the page cache for both build artifacts and go list walks. On my builder I use spinning rust over iSCSI as scratch workspace and swap - this adds additional load on the networking stack causing network tests to be flaky. Additionally the slow IO coupled with the generally much slower CPU cores introduces behaviors which are dependent on the kernel scheduler and its preemption of user threads in the middle of profiling tests (making those flaky as well). There is a way to address it I think, by splitting off the bootstrap and compilation and running just the actual tests on the builder. I think it is/was done on the iOS builder? But maybe it should be the norm for secondary builders on embedded hardware (tier 3 if you will). Preferably without the requirement of each port maintainer to have to roll their own version of it.

A number of the builders I run also fall into this category - as an example, the openbsd/mips64 builder has 1GB of RAM and runs on a USB stick. I've taken to running the builder with GOGC=20 - without this the machine ends up in swap during compilation, which slows down what is already a slow process. While building on low resource systems is unlikely to be high on the list of priorities for the Go team (if on the list at all), it is one of the many challenges faced to keep these ports running.
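For readers unfamiliar with the knob: GOGC sets how much the heap may grow over the live heap before the next collection, with a default of 100. The sketch below is a deliberate simplification (the real pacer also accounts for stacks and other GC roots), and `heapGoal` is a hypothetical helper, not a runtime API. `debug.SetGCPercent` is the in-process equivalent of setting the environment variable.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// heapGoal approximates the heap size at which the next collection starts:
// the live heap plus gogc percent of it. This ignores GC roots and other
// details of the real pacer.
func heapGoal(liveBytes uint64, gogc int) uint64 {
	return liveBytes + liveBytes*uint64(gogc)/100
}

func main() {
	const live = 100 << 20 // assume 100 MiB of live heap for illustration

	for _, gogc := range []int{100, 20} {
		fmt.Printf("GOGC=%d: collect at roughly %d MiB\n", gogc, heapGoal(live, gogc)>>20)
	}

	// Same effect as running the process with GOGC=20 in its environment.
	prev := debug.SetGCPercent(20)
	fmt.Println("previous GC percent:", prev)
}
```

The trade is more frequent collections, and so more CPU time, in exchange for a lower peak heap, which is the right direction on a 1GB machine that would otherwise swap.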

(@paulzhol re sdcard wear, I'm not sure I'd avoid running on a local microsd - three of my builders run with R/W microsd and while I do have occasional burnouts, they're far enough between to not be a real issue... based on experience, there are particular brands/types of cards to use - feel free to reach out if you want further details)

paulzhol commented 2 years ago

@4a6f656c

A number of the builders I run also fall into this category - as an example, the openbsd/mips64 builder has 1GB of RAM and runs on a USB stick. I've taken to running the builder with GOGC=20 - without this the machine ends up in swap during compilation, which slows down what is already a slow process. While building on low resource systems is unlikely to be high on the list of priorities for the Go team (if on the list at all), it is one of the many challenges faced to keep these ports running.

Yes indeed! And as @bcmills mentioned, these are not usable in a submit queue, since they have multi-hour build times.

I also think we're both in agreement that there should be more "support" levels between a port which is self hosting, passes all.bash and can be used as a full Go development environment and the bare minimum which is the ability to produce a working running executable on the target GOOS/GOARCH by cross compiling.

Would you not find it beneficial to have a first-class port host dispatch cross compiled tests with internal linking to run a few smoke tests even on the limited slow reverse builders as canaries to detect if a commit breaks that minimum viable set?
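The cross-compiled smoke test idea can be sketched with the standard toolchain: `go test -c` builds a test binary without running it, and setting GOOS and GOARCH in the environment makes it a cross-compile. The helper name `crossTestCmd`, the output file name, and the choice of the `strings` package are assumptions for illustration; copying the binary to the slow builder and running it there is left out.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// crossTestCmd returns a command that cross-compiles the test binary for
// pkg on the host, targeting goos/goarch. CGO_ENABLED=0 keeps the build
// pure Go, and -linkmode=internal asks for the internal linker explicitly.
func crossTestCmd(goos, goarch, pkg, out string) *exec.Cmd {
	cmd := exec.Command("go", "test", "-c", "-ldflags=-linkmode=internal", "-o", out, pkg)
	cmd.Env = append(os.Environ(),
		"GOOS="+goos,
		"GOARCH="+goarch,
		"CGO_ENABLED=0",
	)
	return cmd
}

func main() {
	cmd := crossTestCmd("freebsd", "arm", "strings", "strings.test")
	// Print the command instead of running it; call cmd.Run() on a host
	// with a Go toolchain to produce the binary.
	fmt.Println(cmd.Args)
}
```

The resulting binary only needs to be executed on the target, so the constrained builder is spared the bootstrap and compile steps entirely.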

4a6f656c commented 2 years ago

Some general comments/observations (rather than trying to address each of the points in the proposal):

Historically, one of the challenges has been the visibility of issues, especially intermittent failures - a number of the secondary ports run on slow/limited resource systems and builds take a reasonable amount of time. There are times (particularly the first month after unfreezing and the couple of weeks after freeze) that the rate of churn is so high that these builders have no hope of keeping up and any issues simply disappear off the build pages. The recent work that @bcmills has been doing to triage issues and raise visibility has been extremely helpful on this front.
The majority of the Go project related work I do (including the ports and builders I maintain) is done on my own time and expense. This means that there can be weeks or even months where I'm unable to keep a close eye on build status or spend time attempting to reproduce and root cause intermittent failures. The proposed changes appear to acknowledge this situation and attempt to take it into account.
While no one wants to block or slow down Go core development, taking the approach that secondary ports can be willfully broken ("they are encouraged to do so" rather than "need to make a reasonable effort") has the potential to create a lot of additional work for maintainers of non-first class ports. Furthermore, once a port is broken it then will hide further failures - I think finding the right balance here is delicate and advance communication is going to be key.
On the communication front, if the approach of breaking a port when adding new functionality (or changing the way Go works) is taken, there really needs to be documentation or communication regarding what needs to be implemented by the port maintainer. As an example, a huge amount of the time and effort spent in bringing up the linux/riscv64 port from Go 1.8 to Go 1.14 was reverse engineering and understanding the breaking changes, before implementing the missing code.
I suspect that there are ways in which better collaboration and communication can benefit both the developers making changes to Go core and the secondary port maintainers - over the years I can only recall a couple of instances where I've been asked for input or feedback relating to a specific port (although this obviously comes back to some of the "who is responsible" issues, as well as the time permitting aspect mentioned earlier).
It seems worth identifying the difference between breaking a GOOS and breaking a GOARCH - breaking the openbsd/arm64 port because a new syscall is required by the runtime is likely to be a fairly quick and easy fix, however breaking the linux/riscv64 port because the compiler or linker changed and no longer works for riscv64 is likely to be a much bigger lift. Does the "they are encouraged to do so" apply equally to both GOOS and GOARCH?
The definition of "broken port" still seems somewhat vague - one end of the spectrum is obvious ("it fails to build" and/or "the same mandatory test fails on every run"), however intermittent failures and unreliability are much harder to define (especially given that some of this can be related to the operating system, the hardware or the build environment, rather than Go itself). While in an ideal world all of the tests pass on all ports all of the time, in reality a Go port can still be useful to users even if a non-critical test is flaky.

ianlancetaylor commented 2 years ago

@beoran

It kind of worries me that Google does not seem to want to give more resources to the Go team. I hope it is not a bad sign of things to come. Dropping official support for several platforms is not going to help improve the future of Go language, and that is what I fear this proposal may lead to.

But this proposal is not dropping official support for any platform. It proposes that we may stop producing binaries for several platforms, but the support level of those platforms is not changing. And the proposal suggests that we should instead start producing binaries for all platforms, but that too won't change the support level.

This isn't just a pedantic point. The goal of this proposal is to better reflect the reality of how the ports work today. If people thought that the presence of the linux/s390x binaries on the download page meant that linux/s390x was better supported than, say, linux/mips64, then those people were being misled.

As far as Google's support for Go goes, Google is a business. Google will invest where the users are. According to the Go survey the vast majority of Go deployments are on Linux, Windows, and macOS. Even Wasm, which is barely supported at all, is at the same level as the BSD/Solaris/AIX platforms, although Unix platforms are much better supported than Wasm. It's a hard sell to Google that the company should invest in support in areas where there aren't many potential users. (And I don't think that better Go support for any platform will be the tipping point that causes a Go user to adopt that platform.) I'm not saying this to argue that Go should drop support for BSD, etc. I personally think that we should support all platforms. I'm saying this to explain why Google is not going to invest in this area. It doesn't mean that Google doesn't support Go.

ianlancetaylor commented 2 years ago

@4a6f656c

This seems to be working on the premise that there are only two levels/options - while stability and reliability is essential for first class ports, and being able to say that "this port works to the best of our knowledge and ability" is not unreasonable for second class ports, there could be considered to be a third class where "we know the code compiles and passes most of the tests, most of the time" is still potentially beneficial to some groups of users. From the Go team's perspective, I would expect them to fully disown any issues with the third class and even some/most of the issues with the second class...

To me that kind of sounds like first class ports, secondary ports, and broken ports.

Do you have any suggestions for what should change in the proposal text to support what you are looking for? Thanks.

4a6f656c commented 2 years ago

Would you not find it beneficial to have a first-class port host dispatch cross compiled tests with internal linking to run a few smoke tests even on the limited slow reverse builders as canaries to detect if a commit breaks that minimum viable set?

@paulzhol definitely - I believe this is currently the case for OpenBSD, where the various GOOS/GOARCH combinations are at least cross compiled on a fast builder on GCP (I think this is also the case for other platforms). I suspect this has prevented a lot of breakage over the years, where there is a quick/simple fix. It would certainly be a concern to see that disappear.

4a6f656c commented 2 years ago

@4a6f656c

This seems to be working on the premise that there are only two levels/options - while stability and reliability is essential for first class ports, and being able to say that "this port works to the best of our knowledge and ability" is not unreasonable for second class ports, there could be considered to be a third class where "we know the code compiles and passes most of the tests, most of the time" is still potentially beneficial to some groups of users. From the Go team's perspective, I would expect them to fully disown any issues with the third class and even some/most of the issues with the second class...

To me that kind of sounds like first class ports, secondary ports, and broken ports.

Do you have any suggestions for what should change in the proposal text to support what you are looking for? Thanks.

My observation is simply that a tertiary port class could exist (with a rather low bar - it must compile and pass some subset of tests), separate from broken ports.

ianlancetaylor commented 2 years ago

@paulzhol @4a6f656c My understanding is that we already do support building the tests on one platform and testing them on another. Certainly that's how we run Android tests, for example. Can you open an issue with an example of a builder that doesn't work that way today, but should? Perhaps we can use that to make this functionality easier to use or better documented.

Perhaps it would be helpful to have a "guide for builder owners", though it's quite likely that the best person to write that guide would be somebody not on the Go team.

paulzhol commented 2 years ago

@paulzhol @4a6f656c My understanding is that we already do support building the tests on one platform and testing them on another. Certainly that's how we run Android tests, for example. Can you open an issue with an example of a builder that doesn't work that way today, but should?

My builder (freebsd-arm-paulzhol) doesn't do that; it bootstraps fully from the C-based go1.4: https://build.golang.org/log/829f78d911467cce45f98b6a6332a4a3787182d4

Building Go cmd/dist using /usr/home/paulzhol/go1.4. (go1.4-bootstrap-20170531 freebsd/arm)
Building Go toolchain1 using /usr/home/paulzhol/go1.4.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
Building packages and commands for freebsd/arm.
---
Installed Go for freebsd/arm in /tmp/workdir-host-freebsd-arm-paulzhol/go
Installed commands in /tmp/workdir-host-freebsd-arm-paulzhol/go/bin

##### Test execution environment.
# GOARCH: arm
# CPU: 
# GOOS: freebsd
# OS Version: FreeBSD 13.0-RELEASE-p3 FreeBSD 13.0-RELEASE-p3 #1 releng/13.0-b368bb75b-dirty: Sat Jul  3 15:38:07 IDT 2021     root@nexus:/usr/obj/embedded/obj/usr/src/arm.armv7/sys/VIRT-YVL  arm

Nor does openbsd-arm-jsing: https://build.golang.org/log/c60ca94818edb876da915a76425435bedbbffb78

Building Go cmd/dist using /usr/local/go. (go1.18.1 openbsd/arm)
Building Go toolchain1 using /usr/local/go.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
Building packages and commands for openbsd/arm.
---
Installed Go for openbsd/arm in /home/gopher/build/go
Installed commands in /home/gopher/build/go/bin

##### Test execution environment.
# GOARCH: arm
# CPU: 
# GOOS: openbsd
# OS Version: OpenBSD 7.0 OpenBSD 7.0 (GENERIC) #80407: Sun Oct  3 04:05:16 MDT 2021     deraadt@armv7.openbsd.org:/usr/src/sys/arch/armv7/compile/GENERIC  armv7

I'm not sure about netbsd-arm, but plan9-arm builds by itself: https://build.golang.org/log/e36769b1db682b10099b3577387a1546abec7bc4

Building Go cmd/dist using /sys/lib/go1.17
Building Go toolchain1 using /sys/lib/go1.17.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
Building packages and commands for plan9/arm.
---
Installed Go for plan9/arm in /boot/workdir/go
Installed commands in /boot/workdir/go/bin
*** You need to bind /boot/workdir/go/bin before /bin.

##### Test execution environment.
# GOARCH: arm
# CPU: 
# GOOS: plan9
# OS Version: 2000

4a6f656c commented 2 years ago

Would you not find it beneficial to have a first-class port host dispatch cross compiled tests with internal linking to run a few smoke tests even on the limited slow reverse builders as canaries to detect if a commit breaks that minimum viable set?

@paulzhol definitely - I believe this is currently the case for OpenBSD, where the various GOOS/GOARCH combinations are at least cross compiled on a fast builder on GCP (I think this is also the case for other platforms). I suspect this has prevented a lot of breakage over the years, where there is a quick/simple fix. It would certainly be a concern to see that disappear.

@paulzhol sorry, I misread your question - cross compilation occurs to ensure that the code compiles, but no cross compiled tests are dispatched and run on the reverse builders. I agree that smoke tests that are cross compiled on a fast builder and deployed/invoked on the reverse host would certainly be a way of speeding up the "does it work" part for TRYBOTs without requiring the time for full builds to complete.
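A rough sketch of the "cross compile on a fast builder" half of that idea, using only `go test -c` with GOOS/GOARCH set. The helper name crossTestCmd and the choice of CGO_ENABLED=0 (to avoid needing a cross C toolchain and to keep linking internal) are assumptions for illustration, not how the build infrastructure works today. Shipping the resulting binary to the reverse builder and running it there is left out.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// crossTestCmd (hypothetical) returns the command that cross-compiles the
// tests of pkg for goos/goarch into a standalone test binary at out,
// without running them.
func crossTestCmd(goos, goarch, pkg, out string) *exec.Cmd {
	cmd := exec.Command("go", "test", "-c", "-o", out, pkg)
	cmd.Env = append(os.Environ(),
		"GOOS="+goos,
		"GOARCH="+goarch,
		"CGO_ENABLED=0", // assumption: no cgo, so no cross C compiler is needed
	)
	return cmd
}

func main() {
	cmd := crossTestCmd("openbsd", "arm", "strings", "strings.test")
	fmt.Println(cmd.Args)
	// cmd.Run() would write strings.test on the fast host; copying it to the
	// slow target and executing it there is outside this sketch.
}
```

The appeal of this split is that the slow host only pays for executing a prebuilt test binary, not for bootstrapping the toolchain.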

bcmills commented 2 years ago

@4a6f656c

While in an ideal world all of the tests pass on all ports all of the time, in reality a Go port can still be useful to users even if a non-critical test is flaky.

That is true, but someone still needs to do the work to determine whether the test is truly “non-critical”, and to separate the non-critical test failures from regressions in more important tests. When the failure mode is platform-independent that work usually falls to the Go team (particularly the owner of the affected package), but if the failure mode is specific to a port then it is arguably the port maintainer's responsibility.

I added a page with much more detail about triaging and addressing test failures at https://go.dev/wiki/TestFailures; note that the suggested actions under “Addressing a test failure” include “deprioritize … by skipping the failure on affected platforms”, for precisely these kinds of non-critical bugs. However, someone still needs to do the work to add the skip — if no one does at least that much, it creates more work in the triage process to filter out the flakes.
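Adding such a skip is usually only a few lines. A hypothetical sketch of the platform check follows; the platform entry and the issue reference are placeholders, not real data, and in an actual test the check would end in t.Skipf rather than a printed message.

```go
package main

import (
	"fmt"
	"runtime"
)

// flakyOn maps GOOS/GOARCH pairs to the reason a hypothetical test is
// skipped there. The entry below is a placeholder, not a real issue.
var flakyOn = map[string]string{
	"openbsd/mips64": "flaky on this port; see tracking issue (placeholder)",
}

// skipReason reports whether the test should be skipped on goos/goarch,
// and why.
func skipReason(goos, goarch string) (string, bool) {
	reason, ok := flakyOn[goos+"/"+goarch]
	return reason, ok
}

func main() {
	if reason, ok := skipReason(runtime.GOOS, runtime.GOARCH); ok {
		// In a real test: t.Skipf("skipping: %s", reason)
		fmt.Println("skip:", reason)
		return
	}
	fmt.Println("run test")
}
```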

―

Trickier are the cases where it is not a specific test that is flaky but rather the port overall — say, a small-but-nonzero chance of any given Go program deadlocking or experiencing memory corruption. We have had several of those in the past and have a few ongoing now, and (unlike with flakes in a specific test) there isn't a good way to just ignore them — they can be tedious to filter out during triage (especially given our existing tooling) and can often mask other regressions.

(For example, a failure manifesting as a deadlock can mask other deadlocks, and a failure manifesting as OOMs can mask other increases in memory consumption.)

beoran commented 2 years ago

@ianlancetaylor I understand that the point of this proposal is to make the process match reality. It's just that I don't like the current reality, so I am arguing we should also improve that as much as possible in addition to this proposal.

Thanks for explaining Google's point of view on this, but it seems like a very unfortunate example of "corporate thinking". In the FLOSS community there is a lot of value in the "long tail": https://www.investopedia.com/terms/l/long-tail.asp

I like to use Go for games and GUI applications. Go is not very popular for that right now, but I believe it could be. If Go were to drop, for example, mobile platforms because of this corporate thinking, then it would become less attractive as a general-purpose programming language, and would risk becoming a "server only" language like, say, PHP. I don't think anyone here would like that to happen.

laboger commented 2 years ago

Is the plan to stop providing golang binaries for Go 1.19 and beyond? What about the binaries that already exist, can they stay?

ianlancetaylor commented 2 years ago

Is the plan to stop providing golang binaries for Go 1.19 and beyond?

I'm hopeful that we can provide binaries using cross-compilation.

What about the binaries that already exist, can they stay?

I suppose so.

ianlancetaylor commented 2 years ago

To be clear I don't know what we will do for binaries for 1.19, assuming this proposal is accepted. Nobody is yet working on the cross-compilation approach.

laboger commented 2 years ago

Is the plan to stop providing golang binaries for Go 1.19 and beyond?

I'm hopeful that we can provide binaries using cross-compilation.

What about the binaries that already exist, can they stay?

I suppose so.

The biggest impact is going to be on all the scripts and Dockerfiles that currently use the golang.org/dl location to obtain toolchain binaries, which will have to change. If cross-compiled non-cgo binaries will be provided instead I expect they will be named differently since they aren't the same as before. Otherwise users will get a non-cgo toolchain when they've always gotten cgo and some builds will fail if they expect cgo.

Is the main concern to avoid doing builds on ppc64le machines? If so, is it possible to build a CGO_ENABLED cross toolchain using a cross gcc? I haven't tried that but it seems like it should be possible.

As far as the other points of the proposal, we are fine with fixing issues that come up on ppc64le/ppc64 due to other changes but hopefully there will be timely communication when bigger changes are going in that are likely to affect other targets.

To be clear I don't know what we will do for binaries for 1.19, assuming this proposal is accepted. Nobody is yet working on the cross-compilation approach.

OK

ianlancetaylor commented 2 years ago

If cross-compiled non-cgo binaries will be provided instead I expect they will be named differently since they aren't the same as before. Otherwise users will get a non-cgo toolchain when they've always gotten cgo and some builds will fail if they expect cgo.

If we do this, we'll provide toolchains that use cgo by default but that rely on the host C compiler to complete the cgo build. I think the only significant difference would be that cmd/go will use Go DNS lookups rather than cgo lookups.
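The resolver difference matters only to programs that depend on the system resolver's behavior. As a rough illustration (this is not how cmd/go itself is configured), a program can opt in to the pure Go resolver explicitly through the PreferGo field of net.Resolver, or at run time with GODEBUG=netdns=go. The helper name newGoResolver is made up for this sketch.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newGoResolver (hypothetical helper) returns a resolver that prefers Go's
// built-in DNS client over the C library resolver.
func newGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	addrs, err := newGoResolver().LookupHost(ctx, "localhost")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(addrs)
}
```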

bsiegert commented 2 years ago

@paulzhol All the NetBSD builders build their binaries on the host of the given architecture. If there are benefits to running differently, e.g. building the binaries on a faster host and only running the tests on the target hosts, I am open to changes.

paulzhol commented 2 years ago

@bsiegert Yes I think there are several: faster build times (no need to bootstrap on slowwww and memory constrained embedded ports like arm, mips, ppc etc). Plus the possibility of participating in the build queues before changes are merged.

But in the context of this proposal I think that there should be a third or fourth type of port (first class, secondary, broken and embedded), where the embedded type is required only to pass a subset of tests, not a full-blown run of ./all.bash, so as not to be marked as broken. We would then transition the current embedded ports (first class and secondary) to this mode.

laboger commented 2 years ago

If we do this, we'll provide toolchains that use cgo by default but that rely on the host C compiler to complete the cgo build. I think the only significant difference would be that cmd/go will use Go DNS lookups rather than cgo lookups.

@ianlancetaylor Can you explain what you mean by "rely on the host C compiler to complete the cgo build"? The cgo build needs to be done with a compiler that can target ppc64le, so that could be a ppc64le cross compiler on the x86 host (build machine), or are you suggesting something else?

heschi commented 2 years ago

The idea would be to build a toolchain that didn't use cgo itself, but had cgo enabled by default. That wouldn't require a cross-compiler to build, so shipping the binaries would be easy. Once installed, the toolchain would build the cgo-enabled stdlib using the host's C compiler when it first built a binary.

laboger commented 2 years ago

The idea would be to build a toolchain that didn't use cgo itself, but had cgo enabled by default. That wouldn't require a cross-compiler to build, so shipping the binaries would be easy. Once installed, the toolchain would build the cgo-enabled stdlib using the host's C compiler when it first built a binary.

I don't see any documentation that mentions this behavior. Currently the value of CGO_ENABLED affects both the build of the toolchain and the setting of cgo within the toolchain when it is used to build something. Is there a way to make this work now, or is this a proposed change? I think this would be a great solution if you could continue to provide the toolchain binaries and they can be downloaded and used as they are now without any changes to user scripts or Dockerfiles.

Also in my experiments to verify how this works I found that the path to the interpreter/loader is not set correctly. It uses the loader from the x86 build machine not the target. Setting GO_LDSO when building the cross toolchain seems to fix it. I will open an issue on that.

golang / go

clarify Go support policy for secondary ports #53383
