att / ast

AST - AT&T Software Technology
Eclipse Public License 1.0
561 stars 152 forks source link

Use real pipes instead of sockets #462

Open siteshwar opened 6 years ago

siteshwar commented 6 years ago

Ksh uses sockets instead of real pipes to implement pipes in shell. This has caused issues on multiple occasions. For e.g.

$ cat /etc/passwd | head -1 /dev/stdin
head: cannot open '/dev/stdin' for reading: No such device or address

I tried removing this code. It fixes above issue, but caused some of the io tests to fail. Switching to real pipes may cause other regressions too. We should do a deeper analysis of how this is going to affect scripts.

EDIT: Updated description.

DavidMorano commented 6 years ago

Sorry for double posting. (my mistake).

Of course some of you remember why KSH had to use sockets on some platforms: KSH needs to occasionally read only up to a new-line character and so it needs to PEEK into the input byte stream. Although older SysV type UNIXi allow for PEEKs on pipes, many newer systems (Linux?) do not allow for that (PEEK's on pipes). So KSH has to resort to sockets to get the PEEK capability on platforms that do not support PEEK for pipes. What KSH does now-a-days, exactly, on each platform, I do not know. I would like to think that it only resorts to sockets for shell "pipes" when needed, but I do not know if this is the case any longer (it might be using sockets on all platforms now, for all I know).

kernigh commented 6 years ago
$ (print -n foo; sleep 1; print -n bar) |read -n6 temp; echo $temp

The io test expects foo, not foobar. I believe that foo is correct, because the ksh(1) manual says,

       read [ -ACSprsv ] [ -d delim] [ -n n] [ -N n] [ -m method] [ -t
       timeout] [ -u unit] [ vname?prompt ] [ vname ... ]
              ... The -n option causes at most n bytes to
              read rather a full line but will return when reading from a slow
              device as soon as any characters have been read.  The -N option
              causes exactly n to be read unless an end-of-file has been
              encountered or the read times out because of the -t option....

This means that read -n in ksh has the same semantics as io.BufferedReader.read1 in Python, or IO#readpartial in Ruby.

$ slow() { print -n foo; sleep 1; print -n bar; }
$ slow | (read -n6 x; print $x)
foo
$ slow | python3 -c 'import sys; print(sys.stdin.buffer.read1(6))'
b'foo'
$ slow | ruby25 -e 'p STDIN.readpartial(6)'
"foo"

For contrast, read -N in ksh has the same semantics as read in Python or Ruby.

$ slow | (read -N6 x; print $x)
foobar
$ slow | python3 -c 'import sys; print(sys.stdin.buffer.read(6))'
b'foobar'
$ slow | ruby25 -e 'p STDIN.read(6)'
"foobar"
zakukai commented 6 years ago

You can blame Linux for /dev/stdin not working right. Someone thought there was no need to make /dev/stdin, /dev/fd, etc work for sockets. The way the feature works on Linux seems a bit of a mess anyway: if the file descriptor in question is anything but a pipe, the file is reopened rather than dup'ed.

Other than compatibility with Linux's broken /dev/fd implementation I don't think there's any compelling reason not to use sockets. There's nothing inherently virtuous about "real" pipes, and sockets provide a bunch of APIs that aren't available for pipes. As mentioned in the discussion for the other issue, one of these is the ability to peek at the socket buffer, which ksh takes advantage of.

siteshwar commented 6 years ago

@kernigh You are right. I misinterpreted the expected test output. Also, the behavior does not change with locales but with using -N option instead of -n.

siteshwar commented 6 years ago

As @DavidMorano mentioned, ksh peeks into file descriptors before reading data. It seems it is being used to determine slow devices. When read -n is used, ksh tries to peek into input fd and if fd does not have any data, ksh does not invoke read. Pipes on Linux do not support peeking, so switching to real pipes is causing some test cases to fail.

Now the question is shall we switch to real pipes for saner behavior ? This may affect performance and features like returning early while reading from slow devices.

siteshwar commented 6 years ago

@zakukai This issue was reported earlier to kernel people, but they do not consider it a bug. You can read related message here.

zakukai commented 6 years ago

Indeed. This is why I said they felt no need to make it work. /dev/stdin, etc. are used primarily as a shell programming convention but it's an awkward one in that it depends on OS support to make it work. The situation is also kind of messed up in that it's seen as a security issue, but the main use case is is just a convenience notation for dup() to allow a process to open a file that it already has open. That kind of makes the security concerns irrelevant - or at least it would if the implementation took this into account.

I don't think it's worth switching to pipes just to support that. Though on the flip side I understand that from a user's point of view it looks like it's ksh that's broken, and not Linux.

siteshwar commented 6 years ago

fwiw cat /etc/passwd | head -1 /dev/stdin works as expected in the last beta release because head is enabled as builtin by default.

kernigh commented 6 years ago

I think it's a bug that read -n doesn't work with a real pipe, for example if I use dash to make the pipe. Notice the difference in the output below: read -n6 reads "foobar" from dash's pipe but "foo" from ksh's socket.

$ PS1='dash$ ' dash
dash$ slow() { echo -n foo; sleep 1; echo -n bar; }
dash$ slow | python3 -c 'import sys; print(sys.stdin.buffer.read1(6))'
b'foo'
dash$ slow | src/cmd/ksh93/ksh -c 'read -n6 temp; echo $temp'
foobar
$ PS1='ksh$ ' ENV=/./ src/cmd/ksh93/ksh
ksh$ slow() { echo -n foo; sleep 1; echo -n bar; }
ksh$ slow | python3 -c 'import sys; print(sys.stdin.buffer.read1(6))'
b'foo'
ksh$ slow | src/cmd/ksh93/ksh -c 'read -n6 temp; echo $temp'
foo
DavidMorano commented 6 years ago

@kernigh

I think it's a bug that read -n doesn't work with a real pipe, for example if I use dash to make the pipe. Notice the difference in the output below: read -n6 reads "foobar" from dash's pipe but "foo" from ksh's socket.

Yes, I think it is a bug (of a sort) also! But the situation is that some OSes, mostly those that did not follow the flavor of System V Release 3 (and following w/ SysV Release 4), -- like Linux! for example -- do not allow for "peeking" on pipes. As far as I remember, peeking in general and peeking on pipes was started out of research (from Research UNIX Eighth Edition) and into the commercial space w/ the introduction of STREAMS to UNIX starting w/ SysV R3. Although Linux added STREAMS some time ago (but now really gone altogether), they never brought over peeking on pipes. Yes, many of us have suffered (greatly I might add) due to this deficiency in Linux!

kernigh commented 6 years ago

I'm running OpenBSD and have the same problem -- it has MSG_PEEK for sockets, but doesn't have I_PEEK for pipes.

krader1961 commented 6 years ago

I think it's a bug that read -n doesn't work with a real pipe...

I agree with @kernigh. Consider a ksh process that reads from a pipe (named or anonymous) setup by the process that spawned ksh on the read side of the pipe. Since that has to work correctly (and if it doesn't it needs to be fixed) there is no good reason for ksh to use sockets for the "pipes" it creates when building a pipeline. Yes, sockets have other pros compared to pipes. They can usually buffer more data. On some platforms they're faster. But I am skeptical either argument justifies the added code complexity from having to support both pipes and sockets.

saper commented 6 years ago

I think read -n6 behavior displayed here is correct? It turns out to be a slow device therefore it gives what is there, as documented in the manual. One can be more precise about "slowness" by giving -N and -t timeout instead.

krader1961 commented 5 years ago

I see issue #1147 I opened a day ago is essentially a duplicate of this issue. The issue I opened noted that preprocessor symbol _pipe_socketpair has to be true otherwise the io.sh unit test fails. In other words, it is impossible to build ksh that behaves correctly on a system without unix domain sockets. That is not acceptable. Too, as I noted above ksh behaves incorrectly if it is on the right-hand side of a regular pipe such as might happen if a ksh script is spawned in a pipeline by a non-ksh process.

krader1961 commented 5 years ago

I've been working to make all unit tests pass on Cygwin (we're now down to 7 failures from 35 just a few weeks ago). But the object returned by socketpair() on Cygwin does not behave as ksh expects. So to solve a bunch of more basic failures I had to configure the Cygwin build to use pipe() instead of socketpair(). That still leaves reading specific numbers of chars from pipes being broken among other problems. If I were setting priorities I'd make fixing the code to work correctly with pipe() objects a high priority. Possibly also just eliminating the socketpair() support since it is not clear the performance optimization it provides justifies the increased complexity.