creack / pty

PTY interface for Go
https://pkg.go.dev/github.com/creack/pty?tab=doc
MIT License
1.68k stars 234 forks source link

Avoid calls to (*os.File).Fd() and operations on raw file descriptor ints #167

Closed sio closed 11 months ago

sio commented 1 year ago

Several users (me included) were surprised by blocking i/o on ptmx file descriptor.

ptmx.Read() would hang indefinitely and could not have been interrupted neither by (*os.File).SetDeadline nor by (*os.File).Close. This could lead to goroutine leaks (if Read() was called in goroutine) or to whole application hanging indefinitely with no way to gracefully unblock.

In addition to unwanted freezes, using (*os.File).Fd() could lead to data races:

This PR removes all calls to (*os.File).Fd() which would implicitly set file descriptor into blocking mode (go doc, go source). Most such calls were related to ioctl() function, so I added a wrapper that takes (*os.File) instead of raw fd and uses (*os.File).SyscallConn() under the hood. The only other place where Fd() was used is Solaris-specific - I've added a SyscallConn() workaround there too.

There still remain a few places where raw fds are passed around in pty_darwin.go/pty_freebsd.go/pty_solaris.go, but all of them are limited to pty opening phase and are unlikely be a source of data races (in my understanding).

I've also added tests to detect blocked i/o (based on example provided by @gcrtnst in #162). As to be expected, these tests fail on current master branch and succeed after applying this PR.

Caveats:

nichtverstehen commented 1 year ago

@sio Kudos on getting to the bottom of this! Would be great to see this change merged. @creack, are there any concerns with it?

creack commented 11 months ago

My main concern would be that some project have been using the old behavior for 10+ years. It is probably best to make this part of the v2 alongside windows support which has other breaking changes.

sio commented 11 months ago

I'm not sure how deadlocks and data races could be considered a feature someone would rely upon, but then there is always https://xkcd.com/1172/

Jokes aside, I don't mind bumping major version if you think that's required. In my opinion, no existing workflow should be broken by this change because it enables canceling Read() which was not possible before, but it does not cancel anything automatically unless instructed by user. So default behavior stays the same, we just have one less failure mode.

creack commented 11 months ago

It has been a long long time since, but back when I used it within Docker, I recall relying on the blocking behavior for something.

I may be wrong, but just to be safe and as we plan a v2 anyway, might as well keep the existing behavior for v1 and this PR will be the first part of v2.

That being said, if you are confident I am wrong and chaning the blocking flag of the fd will have no impact, I can push it as part of the last patch of v1.

creack commented 11 months ago

After more testing, I think you are right. I'll merge this as part of v1.

creack commented 11 months ago

Thank you for the big contribution!

nichtverstehen commented 11 months ago

Thanks for the comments!

Any change has a potential to be a breaking change for someone but I have a few arguments to merge this nonetheless :). FWIW, I am not the author of the PR, I am a user of this library.

  1. The issue is quite a subtle one, so many users might not realize that they are affected. In fact, we as users didn't realize for a long time (we were just silently leaking processes).
  2. A singnificant number of open issues has this as the root cause.
  3. The behavior actually has changed since 10 years ago!

Until go1.8 (2017) the IO subsystem used blocking IO for normal files. So calling File.Fd() didn't affect subsequent IO: https://cs.opensource.google/go/go/+/refs/tags/go1.8:src/os/file_unix.go;l=43-48

Go1.9 changed file IO use the poller: https://cs.opensource.google/go/go/+/c05b06a12d005f50e4776095a60d6bd9c2c91fac

Now File.Fd() changed the FD to blocking mode: https://cs.opensource.google/go/go/+/refs/tags/go1.9:src/os/file_unix.go;l=66-68;bpv=1;bpt=0

This affects subsequent IO operations since the file is never deregistered from the poll subsystem correctly (maybe a bug in Go stdlib?).

So Go1.9 caused this regression.

Update. My eloquence was not needed. Thanks folks for the resolution! Hope to see a minor release soon :)

creack commented 11 months ago

@sio, this PR seems to have broken ioctls on Linux. Still digging but about to revert the PR.

Simplest example would be the Getsize function always returning 0 and Setsize having no effects.

Edit: Might have been something wrong on my end, can't reproduce the issue. Will not revert.