fish-shell / fish-shell

The user-friendly command line shell.
https://fishshell.com

cat: stdin: Resource temporarily unavailable #176

Closed tyilo closed 10 years ago

tyilo commented 12 years ago

After running number-script I get this error:

$ number-script -i
> 12345
9
> [ctrl-D]
$ cat > test
cat: stdin: Resource temporarily unavailable

This doesn't go away until I restart fish. I have never seen this in bash, where I did exactly the same thing.

ridiculousfish commented 12 years ago

I get stuck here:

number-script -i

module.js:485
  process.dlopen(filename, module.exports);
          ^
Error: dlopen(/usr/local/lib/node_modules/number-script/node_modules/bignum/build/Release/bignum.node, 1): no suitable image found.  Did find:
    /usr/local/lib/node_modules/number-script/node_modules/bignum/build/Release/bignum.node: mach-o, but wrong architecture
    at Object.Module._extensions..node (module.js:485:11)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Module.require (module.js:362:17)
    at new require (module.js:378:17)
    at Object.<anonymous> (/usr/local/lib/node_modules/number-script/node_modules/bignum/index.js:6:14)
    at Module._compile (module.js:449:26)
    at Object.Module._extensions..js (module.js:467:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
metamatt commented 11 years ago

The same thing happens for me after invoking and then exiting node (no arguments, so it goes to the interactive REPL, and I immediately exit with ctrl-d). This is 100% reproducible for me with both node 0.10.1 on Mac OS X 10.8.3 and node 0.8.16 on Ubuntu 12.04.

I'm using a version of fish I compiled from source obtained in mid February.

This seems exactly as described here except

To summarize:

Here's a simple repro trace:

magi@duality ~/s/o/fish-shell> cat > foo
^Cfish: Job 1, 'cat > foo' terminated by signal SIGINT (Quit request from job control (^C))
magi@duality ~/s/o/fish-shell> node
> ⏎
magi@duality ~/s/o/fish-shell> cat > foo
cat: stdin: Resource temporarily unavailable
magi@duality ~/s/o/fish-shell> bash
bash-3.2$ exit
magi@duality ~/s/o/fish-shell> cat > foo
^Cfish: Job 1, 'cat > foo' terminated by signal SIGINT (Quit request from job control (^C))
metamatt commented 11 years ago

At the risk of stating the obvious, this breaks stdin for more than just cat. The case I've tripped over that I actually care about is git add --interactive, which acts like it sees an immediate EOF, so in the state where cat says stdin: Resource temporarily unavailable, git does this:

magi@duality ~/s/o/fish-shell> git add -i
           staged     unstaged path

*** Commands ***
  1: [s]tatus     2: [u]pdate     3: [r]evert     4: [a]dd untracked
  5: [p]atch      6: [d]iff   7: [q]uit   8: [h]elp
What now> 
Bye.

Here's another one:

magi@ubuntu ~/src> cat > foo
magi@ubuntu ~/src> chmod -w foo
magi@ubuntu ~/src> rm foo
rm: remove write-protected regular empty file `foo'? n
magi@ubuntu ~/src> node
> [ctrl-d, enter to exit]
magi@ubuntu ~/src> rm foo
rm: remove write-protected regular empty file `foo'? rm: error closing file

I don't know the details behind how all programs in the world read from stdin, but I'm trying to establish examples of various different cases and how they interact with fish:

So I haven't found too many interesting examples: node and various shells (tcsh, bash, zsh) run interactively are the only ones with side effects (node in the negative direction, other shells in the positive direction); non-shell interactive programs with a more sophisticated (presumably raw-input or libreadline) use of stdin are always fine even after node, and programs with a simpler (presumably stdio) use of stdin can be broken by whatever node does. I suppose I should go look at how node's interactive tty support works.

One notable thing here is that fish itself does not fix the damage caused by node the way other shells do: if I get the current fish's stdin in a bad state by running and exiting node, then run another (inner) fish, stdin is still broken there, and after exiting it, the outer fish's stdin is still broken.

metamatt commented 11 years ago

Whatever weirdness node is causing seems to happen only at process start. That is, invoking node and then exiting it puts stdin in a bad state. But invoking node, then suspending it with ctrl-z (which leaves stdin in the bad state), then invoking bash and exiting it (which leaves stdin in the good state), then resuming node with fg, then suspending node again with ctrl-z: that leaves stdin still in the good state. So node breaks stdin when it starts, but not when it is suspended/resumed via SIGTSTP/SIGCONT. This is notable because of https://github.com/joyent/node/issues/3295; the fix for that essentially involves calling tty._setRawMode(false); tty._setRawMode(true); so those setRawMode calls don't seem to be harmful.

The part of node's startup code that seems to matter is that which runs if the -i flag was passed or if tty.isatty(0) says stdin is a TTY (this is JS code in src/node.js); in that case it does Module.requireRepl().start(opts); where opts might have a terminal = false property if the environment variable NODE_NO_READLINE was set. But setting that doesn't seem to change the behavior wrt breaking stdin.

OK, so the fanciness in node's startup code for interactive mode seems to be limited to instantiating the repl library, and repl is built on readline, and readline is built on tty, and I don't really see any funniness in there other than tty has this raw-mode support and readline uses it with the wrinkle described above.

My thinking-out-loud while trying to understand the node codebase is probably not all that interesting to the fish project, so I'll cease; I was just hoping to find something obviously fishy (sorry, couldn't resist) in the tty handling code, and with a cursory look I haven't been able to.

metamatt commented 11 years ago

Well, one more thing here. It's not necessary to invoke node interactively (so it runs the interactive REPL) or use its repl library at all. Just calling readline.createInterface() and then closing the interface is sufficient to trigger the bad behavior. And in fact it's not necessary to use the readline library (and underneath, the tty library) at all. Just calling process.stdin.resume() (which is something that readline.createInterface() does) is sufficient to trigger the buggy behavior.

So here's a really minimal repro case:

magi@duality ~/src> cat > foo
magi@duality ~/src> node -e 'process.stdin.resume(); process.stdin.pause()'
magi@duality ~/src> cat > foo
cat: stdin: Resource temporarily unavailable
metamatt commented 11 years ago

strace on node -e 'process.stdin.resume(); process.stdin.pause()' shows it doing the following to fd 0:

fcntl(0, F_SETFD, FD_CLOEXEC)           = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, FIONBIO, [1])                  = 0

strace on node -e 'console.log("Hi")' shows it doing the same fcntl, but not the two ioctls.

(To see who's calling these, we have to descend out of node and into libuv. FIONBIO isn't too surprising, and is called by uv__nonblock(). I don't see anything asking for TCGETS, though. But that seems like it ought to be harmless anyway.)

Could this be as simple as setting stdin as nonblocking? Why, yes it could:

magi@ubuntu ~/src> cat > fishy.c
#include <sys/ioctl.h>

int main(void)
{
   int set = 1;
   ioctl(0, FIONBIO, &set);
   return 0;
}
magi@ubuntu ~/src> gcc fishy.c
magi@ubuntu ~/src> ./a.out
magi@ubuntu ~/src> cat > foo
cat: -: Resource temporarily unavailable
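The message cat prints is the standard error text for EAGAIN, which read() reports when a descriptor is nonblocking and no data is ready. A minimal sketch of that mechanism follows; it uses a pipe rather than the terminal so it can run unattended, and the helper names set_nonblocking and errno_of_read are made up for this illustration (they do not come from fish, cat, or libuv).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: put fd into nonblocking mode using fcntl
 * rather than ioctl(FIONBIO). Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: try to read one byte from fd and return the
 * errno of that attempt, or 0 if the read did not fail. */
int errno_of_read(int fd)
{
    char byte;
    errno = 0;
    ssize_t n = read(fd, &byte, 1);
    return n == -1 ? errno : 0;
}
```

With an empty pipe whose write end is still open, errno_of_read on the nonblocking read end should report EAGAIN, the same condition cat runs into on the terminal.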
metamatt commented 11 years ago

OK, that's really all this is:

Here's a simple test program

#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <fcntl.h>
#include <sys/ioctl.h>

int main(int argc, char **argv)
{
   /* Report whether fd 0 currently has O_NONBLOCK set. */
   int flags = fcntl(0, F_GETFL, 0);
   printf("stdin was formerly %sblocking\n", (flags & O_NONBLOCK) ? "non" : "");

   /* Argument 1 requests nonblocking mode; 0 or no argument requests blocking. */
   int nb = argc < 2 ? 0 : atoi(argv[1]);
   ioctl(0, FIONBIO, &nb);

   flags = fcntl(0, F_GETFL, 0);
   printf("stdin is now %sblocking\n", (flags & O_NONBLOCK) ? "non" : "");
   return 0;
}

And here's the result of using it under fish:

magi@duality /V/m/s/d/experiments> ./a.out 1
stdin was formerly blocking
stdin is now nonblocking
magi@duality /V/m/s/d/experiments> cat > foo
cat: stdin: Resource temporarily unavailable
magi@duality /V/m/s/d/experiments> ./a.out 0
stdin was formerly nonblocking
stdin is now blocking
magi@duality /V/m/s/d/experiments> cat > foo
^Cfish: Job 1, 'cat > foo' terminated by signal SIGINT (Quit request from job control (^C))

(If we use this to set fd 0 to nonblocking mode, cat becomes confused, and the next invocation of the test app shows that fd 0 is still nonblocking, then sets it back to blocking mode, and cat is happy.)

Here's what happens under bash:

bash-3.2$ ./a.out 1
stdin was formerly blocking
stdin is now nonblocking
bash-3.2$ ./a.out
stdin was formerly blocking
stdin is now blocking

(I didn't bother running cat, which is always happy. The important thing here is that when a.out exits and bash returns to the foreground, it sets fd 0 back to blocking mode, because the 2nd invocation of the test app finds it so even though it had left it as nonblocking at the end of the first invocation.)

Another interesting case with bash is

bash-3.2$ ./a.out
stdin was formerly blocking
stdin is now blocking
bash-3.2$ node
> [ctrl-z]
[1]+  Stopped                 node
bash-3.2$ ./a.out
stdin was formerly blocking
stdin is now blocking

(This shows that bash is setting stdin to blocking mode when it backgrounds node via job control, because we know node left the fd nonblocking, but here a.out finds it blocking.)

I don't know what the precise condition is for "fish regains control of stdin" (comes back to the foreground?) but it seems it should be sanitizing the standard file descriptors at this point, in (at least) this regard.
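One way to picture the "sanitize when the shell regains control" idea is the sketch below. It is not fish's code: run_job_then_sanitize, sanitize_fd, and misbehaving_child are hypothetical names, and the child is a stand-in for a program such as node that leaves the descriptor nonblocking. The assumption is that the shell clears O_NONBLOCK right after waiting for the job.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a job that leaves fd nonblocking on exit. */
void misbehaving_child(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags != -1)
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Clear O_NONBLOCK on fd.
 * Returns 1 if the flag had been set, 0 if it was already clear, -1 on error. */
int sanitize_fd(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    if (!(flags & O_NONBLOCK))
        return 0;
    return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1 ? -1 : 1;
}

/* Run child_fn(fd) in a forked process, wait for it to finish, then
 * sanitize fd in the parent. Returns what sanitize_fd returns, or -1
 * if fork or waitpid fails. */
int run_job_then_sanitize(int fd, void (*child_fn)(int))
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        child_fn(fd);
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return sanitize_fd(fd);
}
```

A return value of 1 means the child's change was visible in the parent after the child exited, which is the situation described in this thread.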

ridiculousfish commented 11 years ago

Nice diagnosis. Sounds like this can be fixed.

Edit: In retrospect "nice diagnosis" was a vast understatement. This is brilliant.

metamatt commented 11 years ago

I've been playing around with a patch already. My big question is what the old-school shells (e.g. bash) do; maybe I should go look at the bash source. My initial intuition was that dup'ing the fd should isolate against this sort of thing, but then I decided that's not true; fork should already be doing as much in that regard as an explicit dup would, and it's not enough, because the ioctl goes through the fd and all the way to the device (so if you have multiple fds to the same device and issue an ioctl through one of them, you're affecting the device itself, and this will be visible on all the fds). But I'm not quite sure if that's true, or if there's a difference between ioctl(0, FIONBIO) and fcntl(0, F_SETFL) with O_NONBLOCK.
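On the dup question: under POSIX, O_NONBLOCK is a file status flag stored in the open file description, and descriptors created by dup or inherited across fork share that description. That is enough to explain the leak without the setting having to reach the device itself. The sketch below checks the dup case on whatever descriptor it is given; the function name nonblock_leaks_through_dup is invented for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if O_NONBLOCK set through a dup of fd becomes visible
 * through fd itself, 0 if it does not, -1 on error.
 * The original flags of fd are put back before returning. */
int nonblock_leaks_through_dup(int fd)
{
    assert(fd >= 0);
    int original = fcntl(fd, F_GETFL, 0);
    if (original == -1)
        return -1;

    /* Start from a known blocking state. */
    if (fcntl(fd, F_SETFL, original & ~O_NONBLOCK) == -1)
        return -1;

    int copy = dup(fd);
    if (copy == -1)
        return -1;

    /* Set the flag through the duplicate only. */
    int copy_flags = fcntl(copy, F_GETFL, 0);
    int ok = copy_flags != -1 &&
             fcntl(copy, F_SETFL, copy_flags | O_NONBLOCK) != -1;

    /* Observe the flags through the original descriptor. */
    int seen = fcntl(fd, F_GETFL, 0);

    close(copy);
    fcntl(fd, F_SETFL, original);   /* restore what we found */

    if (!ok || seen == -1)
        return -1;
    return (seen & O_NONBLOCK) ? 1 : 0;
}
```

A descriptor obtained from a separate open() of the same terminal would have its own open file description, so this sharing would not be expected to apply to it.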

So anyway. If there's a way to achieve some isolation such that child processes don't actually manipulate the parent (shell) state, that seems preferable, but I expect that's already done as much as possible, and not working for this case. In which case the question becomes whether to save/restore exactly this flag, or the entire set of flags visible to fcntl(F_[SG]ETFL), or whether there's some even more thorough sanitization appropriate and necessary, which is where I become curious what other shells have already figured out.
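The "save/restore the entire set of flags" option could look roughly like the sketch below. The names save_status_flags and restore_status_flags are hypothetical, and this is only one possible shape for the idea, not something taken from any shell. Note that F_SETFL only changes the flags the system allows to be changed (such as O_NONBLOCK and O_APPEND); access-mode bits in the saved value are ignored.

```c
#include <assert.h>
#include <fcntl.h>

/* Capture the file status flags of fd before handing it to a job.
 * Returns the flags, or -1 on error. */
int save_status_flags(int fd)
{
    assert(fd >= 0);
    return fcntl(fd, F_GETFL, 0);
}

/* Put back flags captured earlier by save_status_flags.
 * Returns 0 on success, -1 on error or if nothing valid was saved. */
int restore_status_flags(int fd, int saved)
{
    if (saved == -1)
        return -1;
    return fcntl(fd, F_SETFL, saved) == -1 ? -1 : 0;
}
```

Restoring everything has a cost: a job that is later resumed with fg may have relied on the flags it set, which is the concern raised about node in the comments that follow.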

ridiculousfish commented 11 years ago

It looks like none of bash, zsh, and tcsh actually restore the non-blocking status of stdin when resuming a job. I'm surprised that doesn't cause problems in node, etc.

My initial thought was to just always set stdin to blocking before fish reads from it, and perhaps before spawning another job.

metamatt commented 11 years ago

So the status quo for other shells is to clobber stdin back to blocking (and do they reset other flags or device behavior too?) when they get control, and just leave it that way?

In that https://github.com/joyent/node/issues/3295 issue I linked above, Node has explicit code to restore the terminal settings on SIGCONT. I'd have to check whether that includes O_NONBLOCK; offhand it seems completely separate (though you'd care about it for the same reasons).

ridiculousfish commented 11 years ago

In 437b4397b9cf273922ce7b414bf6626845f15ad0, I made stdin blocking when read() returns EWOULDBLOCK, and also before fork and fg. This makes fish match other shells in behavior.
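The read-side part of that approach can be sketched as follows. This is an illustration of the general technique, not the code from the commit: read_blocking is a hypothetical name, and the decision to give up when EAGAIN arrives without O_NONBLOCK being set is an assumption made here to avoid spinning.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Read from fd. If the read fails only because some earlier job left
 * fd nonblocking, switch fd back to blocking mode and try again.
 * Returns the byte count from read(), or -1 on any other error. */
ssize_t read_blocking(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;              /* a genuine error */

        int flags = fcntl(fd, F_GETFL, 0);
        if (flags == -1 || !(flags & O_NONBLOCK))
            return -1;              /* not caused by O_NONBLOCK: do not spin */
        if (fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1)
            return -1;
        /* fd is blocking again; loop and repeat the read */
    }
}
```

After the first EAGAIN the descriptor is blocking again, so the retried read waits for input instead of failing immediately.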

Through experiments, I found that bash only tweaks stdin - it doesn't touch stdout or stderr. In practice these fds seem to be affected anyways, presumably because they all point at the same underlying device. Presumably you could construct a scenario where stdout and stdin are different, and stdout would be made non-blocking; then I believe bash would leave stdout as non-blocking too and node would be affected.

This whole interaction seems not well thought out! Leaving this open to think about it harder; but hopefully the above commit fixes the issue with node.

ridiculousfish commented 11 years ago

Yikes, this commit broke the tests.

ridiculousfish commented 11 years ago

Tests fixed in 42497d99325d5f27a90cc82e4e445c1252f87e7c

metamatt commented 11 years ago

Works for me. Thanks. This will be a nice one to see the last of.

justinmk commented 9 years ago

@metamatt @ridiculousfish I really enjoyed this thread. And it likely saved us days of work on https://github.com/neovim/neovim/issues/2377 (which uses libuv).

What are your thoughts on why bash does not set stdin to blocking between && chains? E.g.:

$ ./a.out 1 && ./a.out 1
stdin was formerly blocking
stdin is now nonblocking
stdin was formerly nonblocking
stdin is now nonblocking

Is this expected?

ridiculousfish commented 9 years ago

Hah, no idea. I can dig a little though since it's intriguing.