rsc closed this issue 1 year ago
My Mac Pro (x86) running 13.0.1 has no trouble, with either Go 1.19.3 or tip.
Updating my laptop to 13.0 did not help. What did work (100 PASS in a row) was adding another osinit-time call, this one to xpc_atfork_child. That seems to warm up whatever fast paths it will need in the child to avoid deadlocks. As I write this I've been offered 13.0.1, so I will try that update just to double-check that this is really needed.
Change https://go.dev/cl/451735 mentions this issue: runtime: work around Apple libc bugs to make exec stop hanging
13.0.1 didn't help. Same problem on my M1 MacBook Pro with 12.0.1. go.dev/cl/451735 fixes them both.
We have since discovered that what is unique about my laptop compared to others is that I was running my tests from under a program written and linked against CoreFoundation, which put a magic __CF_USER_TEXT_ENCODING=0x552D:0x0:0x0 variable in my environment. (The 0x552D is my uid.) The os/exec test is linked against CoreFoundation too. If I clear that variable or mangle it to have the wrong uid, causing it to be recomputed, then different things happen at startup and seem to bump things around a bit so that these two fork hangs are far less likely.
We'll keep the fix, it's just even more mysterious now.
I filed the related issue https://github.com/golang/go/issues/33565. I just checked, and I also have this __CF_USER_TEXT_ENCODING environment variable set in my development environment (plan9port Acme).
This does not tell us anything new but it corroborates what Russ wrote above.
@gopherbot please backport
Backport issue(s) opened: #56836 (for 1.18), #56837 (for 1.19).
Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.
Change https://go.dev/cl/459175 mentions this issue: runtime: revert Apple libc atfork workaround
Change https://go.dev/cl/459176 mentions this issue: runtime: call __fork instead of fork on darwin
Change https://go.dev/cl/459178 mentions this issue: runtime: call __fork instead of fork on darwin
Change https://go.dev/cl/459179 mentions this issue: [release-branch.go1.18] runtime: call __fork instead of fork on darwin
Change https://go.dev/cl/460476 mentions this issue: runtime: Apple libc atfork workaround take 3
There is source code available for the Objective-C runtime on opensource.apple.com which redirects to https://github.com/apple-oss-distributions/objc4
On my x86 Mac laptop using macOS 12.6.1, all.bash often hangs in the os/exec test. In particular, this never finishes:
The chance of a hang in any given iteration is something like 50%. It's possible this is related to #33565, but I'm opening a separate bug just in case, and to focus the discussion on the fact that our own os/exec tests don't pass.
When I attached to the hung process in lldb, I was originally seeing backtraces like:
This specific hang seems to match https://github.com/dart-lang/sdk/issues/29539, and inspection of the Apple libc code shows that the problem is a race with an os_alloc_once that is in progress in the parent when the address space is split, making the same call die in the child. I changed the Go runtime to do an early call to notify_is_valid_token(0) in osinit. That call is a no-op except that it guarantees the os_alloc_once has been done already, so it cannot race with any future forks.
With that fix, I get a different hang:
This one seems to match what @jacobvosmaer posted in https://github.com/golang/go/issues/33565#issuecomment-522674590.
I can't find the libobjc source code so I'm not sure what a workaround for xpc_atfork_child might be.
It must be that C programs on macOS do not use fork. I looked into posix_spawn but it looks like we don't have any other ports that use that.
We need to figure something out for Go 1.20 though.