Closed sio closed 11 months ago
@sio Kudos on getting to the bottom of this! Would be great to see this change merged. @creack, are there any concerns with it?
My main concern would be that some project have been using the old behavior for 10+ years. It is probably best to make this part of the v2 alongside windows support which has other breaking changes.
I'm not sure how deadlocks and data races could be considered a feature someone would rely upon, but then there is always https://xkcd.com/1172/
Jokes aside, I don't mind bumping major version if you think that's required. In my opinion, no existing workflow should be broken by this change because it enables canceling Read() which was not possible before, but it does not cancel anything automatically unless instructed by user. So default behavior stays the same, we just have one less failure mode.
It has been a long long time since, but back when I used it within Docker, I recall relying on the blocking behavior for something.
I may be wrong, but just to be safe and as we plan a v2 anyway, might as well keep the existing behavior for v1 and this PR will be the first part of v2.
That being said, if you are confident I am wrong and chaning the blocking flag of the fd will have no impact, I can push it as part of the last patch of v1.
After more testing, I think you are right. I'll merge this as part of v1.
Thank you for the big contribution!
Thanks for the comments!
Any change has a potential to be a breaking change for someone but I have a few arguments to merge this nonetheless :). FWIW, I am not the author of the PR, I am a user of this library.
Until go1.8 (2017) the IO subsystem used blocking IO for normal files. So calling File.Fd()
didn't affect subsequent IO:
https://cs.opensource.google/go/go/+/refs/tags/go1.8:src/os/file_unix.go;l=43-48
Go1.9 changed file IO use the poller: https://cs.opensource.google/go/go/+/c05b06a12d005f50e4776095a60d6bd9c2c91fac
Now File.Fd()
changed the FD to blocking mode:
https://cs.opensource.google/go/go/+/refs/tags/go1.9:src/os/file_unix.go;l=66-68;bpv=1;bpt=0
This affects subsequent IO operations since the file is never deregistered from the poll subsystem correctly (maybe a bug in Go stdlib?).
So Go1.9 caused this regression.
Update. My eloquence was not needed. Thanks folks for the resolution! Hope to see a minor release soon :)
@sio, this PR seems to have broken ioctls on Linux. Still digging but about to revert the PR.
Simplest example would be the Getsize function always returning 0 and Setsize having no effects.
Edit: Might have been something wrong on my end, can't reproduce the issue. Will not revert.
Several users (me included) were surprised by blocking i/o on ptmx file descriptor.
88
114
162
ptmx.Read()
would hang indefinitely and could not have been interrupted neither by(*os.File).SetDeadline
nor by(*os.File).Close
. This could lead to goroutine leaks (ifRead()
was called in goroutine) or to whole application hanging indefinitely with no way to gracefully unblock.In addition to unwanted freezes, using
(*os.File).Fd()
could lead to data races:156
This PR removes all calls to
(*os.File).Fd()
which would implicitly set file descriptor into blocking mode (go doc, go source). Most such calls were related toioctl()
function, so I added a wrapper that takes(*os.File)
instead of raw fd and uses(*os.File).SyscallConn()
under the hood. The only other place whereFd()
was used is Solaris-specific - I've added aSyscallConn()
workaround there too.There still remain a few places where raw fds are passed around in
pty_darwin.go
/pty_freebsd.go
/pty_solaris.go
, but all of them are limited to pty opening phase and are unlikely be a source of data races (in my understanding).I've also added tests to detect blocked i/o (based on example provided by @gcrtnst in #162). As to be expected, these tests fail on current master branch and succeed after applying this PR.
Caveats:
(*os.File).SyscalConn
fails we fall back to old racy/blocking behaviordarwin
intentionally uses blocking i/o for ptmx (#52, #53, golang/go#22099). I did not touch that, so the tests for unblocking are not executed there. Still, this PR is beneficial to Darwin users because of removal of data races.(*os.File).SyscallConn
was introduced in go1.12. Older platforms will continue using original racy/blocking behavior.