crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.42k stars 1.62k forks source link

Wrong output when piping to less #2713

Closed asterite closed 6 years ago

asterite commented 8 years ago

Extracted from Google Groups

A consistent way to reproduce it:

# foo.cr
string = <<-EOF
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789
EOF
puts string

Then:

$ crystal build foo.cr
$ ./foo | less

The issue seems to be related to using fcntl with stdin/stdout/stderr in a shell. I found this but I can't seem to understand what's going on in this case.

miketheman commented 8 years ago

triage

This is an interesting one. I am definitely able to repro off current master 9cf6e06 .

Redirecting stderr to stdout produces the expected result:

$ ./foo 2>&1 | less

(also ./foo |& less - |& was added to Bash 4 as an abbreviation for 2>&1 )

So is there some part of the output that is getting sent to stderr instead of stdout?

miketheman commented 8 years ago

I dug into this further, and using tee, and other character-counters, I can determine that the amount of data transmitted before the "breakage" is 1012 characters on 12 lines = 1024, which is a very suspicious number.

Using the unbuffer command from the expect package, I can get the full output, since this uses an intermediate pty. As mentioned above,

I haven't been able to find the reasoning for 1024 being the buffer limit on OSX, other than some references being based on fcntl kernel default settings. http://lxr.free-electrons.com/source/include/uapi/asm-generic/fcntl.h#L188 http://lxr.free-electrons.com/ident?i=F_SETPIPE_SZ http://man7.org/linux/man-pages/man2/fcntl.2.html

I got lost in digging through how it all ties together...

miketheman commented 8 years ago

Another reference: http://www.pixelbeat.org/programming/stdio_buffering/

chaniks commented 8 years ago

@asterite I am sorry. I was kinda unable to reproduce this problem and thought they were just related. (Still not sure if I understand it right.)

Does this make any change?

In src/io/file_descriptor.cr:

# add in IO::FileDescriptor
def self.blocking(fd)
  fcntl(fd, LibC::F_GETFL) & LibC::O_NONBLOCK == 0
end

In src/kernel.cr:

STDIN  = IO::FileDescriptor.new(0, blocking: IO::FileDescriptor.blocking(0))
STDOUT = (IO::FileDescriptor.new(1, blocking: IO::FileDescriptor.blocking(1))).tap { |f| f.flush_on_newline = true }
STDERR = (IO::FileDescriptor.new(2, blocking: IO::FileDescriptor.blocking(2))).tap { |f| f.flush_on_newline = true }
## comment out following lines
#STDIN  = IO::FileDescriptor.new(0, blocking: LibC.isatty(0) == 0)
#STDOUT = (IO::FileDescriptor.new(1, blocking: LibC.isatty(1) == 0)).tap { |f| f.flush_on_newline = true }
#STDERR = (IO::FileDescriptor.new(2, blocking: LibC.isatty(2) == 0)).tap { |f| f.flush_on_newline = true }

Ahh, now I see what is bugging you guys. I'll dig a little deeper.

asterite commented 8 years ago

The main issue is that we want to have non-blocking IO for stdin/stdout/stderr. For example this should work well:

spawn do
  loop do
    puts 1
    sleep 1
  end
end

spawn do
  loop do
    while line = gets
      puts line
    end
  end
end

sleep

So gets blocks the fiber and the print loop fiber can continue to run. Right now we are using fcntl on stdin/stdout/stderr, but on consoles they seem to be linked to each other, so we set some flags on one descriptor and those flags are also applied to other descriptors, and it causes unwanted behaviour.

asterite commented 8 years ago

Some more links with others running into this problem:

https://twistedmatrix.com/trac/ticket/3442 http://cr.yp.to/unix/nonblock.html

It seems that O_NONBLOCK is a property of file descriptions, not of file descriptors. This means that it's incorrect for any code to ever set it on a file descriptor that it did not just freshly create with open(), socket(), etc.

So setting O_NONBLOCK on stdin/stdout/stderr is wrong.

drosehn commented 7 years ago

It seems that all crystal programs still modify the behavior of STDIO. Even if I compile an empty crystal program (one which is nothing but comments), if I type in mail after running the program, mail acts as if it immediately received an EOF. If I add the line:

STDIN.blocking = true

as mentioned in #3674, then the problem does not happen. @asterite mentioned "So setting O_NONBLOCK on stdin/stdout/stderr is wrong". Does that mean that crystal sets blocking on all three of the standard descriptors?

ysbaddaden commented 7 years ago

Yes. It shouldn't, otherwise accessible standard descriptors would block the event loop.

We must rework IO:: FileDescriptor to handle cases where O_NONBLOCK isn't set on the FD, and rely on timers and signals to avoid long pauses. It's not trivial to fix.

drosehn commented 7 years ago

A little add-on trivia: If you're trying to write a single source-file which works in both Ruby and crystal without any changes, and if this issue causes some grief for you, here's a slight change to the command. This should work-around the issue in crystal without causing any errors in ruby:

  STDIN.blocking = true if STDIN.class != IO

(or at least it worked for me, using Crystal 0.21.1 and macOS)

faustinoaq commented 7 years ago

Some progress on this issue?

Currently I'm using some hacks shown in above comments.

faustinoaq commented 7 years ago

@asterite Do you think we can fix this before next release ? :sweat_smile:

monito

Are some other issues blocking this in particular ?

asterite commented 7 years ago

@faustinoaq To implement this we need to implement STDIN, STDOUT and STDERR as blocking, but when they block, fire a Thread and wait there. So that kinda involves parallelism. And I'm not in charge of that, so I don't think this will be fixed soon.

bew commented 7 years ago

Isn't it possible to reset blocking state of the fd when the program exits?

RX14 commented 7 years ago

Why not just make writing to stdout blocking? It's a simple fix, which has some other benefits. Such as using puts/p/pp doesn't change the execution of the program. If for some reason using blocking stdout is a major performance problem, they can get the current behaviour with 1 line of code. I don't see too many cases where it'd be a problem through.

asterite commented 7 years ago

@RX14 What about this?

spawn do
  loop do
    puts 1
    sleep 1
  end
end

spawn do
  loop do
    while line = gets
      puts line
    end
  end
end

sleep
RX14 commented 7 years ago

What about if it was a busy loop instead of puts? What if stdout and stdin are extremely fast C applications which always empty/fill the buffer faster than we can fill/empty it? What if they're files and never block? Until we have preemptive scheduling we can't solve all these problems, so I suggest we don't try to.

martinos commented 7 years ago

I think this bug prevents me from building or running a crystal program in vim

When I launch the following command !crystal run --no-color % and the current file has compilation errors, the vim display gets polluted and I have to restart vim in order to continue programming.

I have this issue on OSX vim with no configuration file loaded. I cannot reproduce it on Ubuntu.

bew commented 7 years ago

Which version of crystal did you tried? Can you try the current master if you didn't?

martinos commented 7 years ago

I am on

Crystal 0.23.1 (2017-10-12) LLVM 4.0.1

I haven't tested it on the latest version, I will try to do it tonight.

drosehn commented 7 years ago

I think Crystal 0.23.1 tries to avoid this problem by turning blocking back on if it had been on when the crystal-program started (see #5017). In this case, the "crystal-program" is the crystal compiler itself, since your own program was never actually run (due to those compilation errors).

The downside is that if blocking gets turned off by some mistake before the crystal-program runs, then crystal-program will leave it turned off. You might want to compile a crystal program which does nothing but the statement which will turn blocking back on:

STDIN.blocking = true if STDIN.class != IO

Although now that I think about it, I wonder if that will still work after #5017 ?

martinos commented 7 years ago

I tried it with master (007899ac1eebe45043c8350ade85bc62e1783517) and I still have the issue.

drosehn commented 7 years ago

Hmm. I just ran the compiler (Crystal 0.23.1 (2017-10-18) LLVM 4.0.1 on macOS 10.10.5 (Yosemite)), and blocking was turned off. In fact, the only reason that I ran the compiler was to see crystal --version!

But at least in version 0.23.1, running a program which has just the one statement to always set blocking=true does seem to fix blocking on the shell session.

RX14 commented 7 years ago

The PR to fix this is only in master. You need to build a crystal compiler using make crystal then use /path/to/crystal/repo/bin/crystal as your crystal compiler to test this.

martinos commented 6 years ago

@rx14, this is what I did and I still got the issue. https://asciinema.org/a/Ga2Wv2Ans9sm5Juci9fawZRov

faustinoaq commented 6 years ago

Hi I tried 0.24.0, now programs created with crystal don't break the stdin but crystal command itself. I mean:

[crystal-0.24.0-2]$ ./bin/crystal build 'puts "Hello World!!!"'
Hello World!!!
[crystal-0.24.0-2]$ read FOO
works!!!
[crystal-0.24.0-2]$ echo $FOO
works!!!
[crystal-0.24.0-2]$ ./bin/crystal -v
Crystal 0.23.0+313 [ea4187c57] (2017-10-27)

LLVM: 3.9.1
Default target: x86_64-unknown-linux-gnu
[crystal-0.24.0-2]$ read FOO
bash: read: read error: 0: Resource temporarily unavailable
[crystal-0.24.0-2]$

So executable generated by crystal are keeping blocking state, but not crystal command itself:

$ crystal --version
Crystal 0.23.0+313 [ea4187c57] (2017-10-27)

LLVM: 3.9.1
Default target: x86_64-unknown-linux-gnu
$ read
bash: read: read error: 0: Resource temporarily unavailable
$ nano
Too many errors from stdin
$ crystal --help
Usage: crystal [command] [switches] [program file] [--] [arguments]

Command:
    init                     generate a new project
    build                    build an executable
    deps                     install project dependencies
    docs                     generate documentation
    env                      print Crystal environment information
    eval                     eval code from args or standard input
    play                     starts crystal playground server
    run (default)            build and run program
    spec                     build and run specs (in spec directory)
    tool                     run a tool
    help, --help, -h         show this help
    version, --version, -v   show version
$ read
bash: read: read error: 0: Resource temporarily unavailable
$ nano
Too many errors from stdin

As workaround I need to execute crystal eval '' to restore blocking state:

$ ./bin/crystal --version
Crystal 0.23.0+313 [ea4187c57] (2017-10-27)

LLVM: 3.9.1
Default target: x86_64-unknown-linux-gnu
$ read FOO
bash: read: read error: 0: Resource temporarily unavailable
$ ./bin/crystal eval ''
$ read FOO
works again!!!
$ echo $FOO
works again!!!
faustinoaq commented 6 years ago

Interesting on 0.23.1 happens totally opposed issue. I mean crystal command works, but binaries generated by crystal break blocking state:

$ crystal -v
Crystal 0.23.1 (2017-07-17) LLVM 4.0.1
$ read FOO
works!!!
$ echo $FOO
works!!!
$ nano hi.cr
$ cat hi.cr
puts "Hello world!!!"
$ crystal build hi.cr
$ read FOO
works!!!
$ echo $FOO
works!!!
$ ./hi
Hello world!!!
$ read
bash: read: read error: 0: Resource temporarily unavailable

Latest version moved the issue from generated binaries to crystal command itself 😅

faustinoaq commented 6 years ago

Another interesting comment 😄

On zsh this issue never happened:

$ zsh
pc% crystal -v
Crystal 0.23.1 (2017-07-17) LLVM 4.0.1
pc% read FOO
always works in zsh!!!
pc% echo $FOO
always works in zsh!!!
pc% crystal build hi.cr
pc% ./hi 
Hello world!!!
pc% read FOO
always works in zsh!!!
pc% echo $FOO
always works in zsh!!!

Even in latest version:

pc% ./bin/crystal -v
Crystal 0.23.0+313 [ea4187c57] (2017-10-27)

LLVM: 3.9.1
Default target: x86_64-unknown-linux-gnu
pc% read FOO
always works in zsh!!!
pc% echo $FOO
always works in zsh!!!
pc% ./bin/crystal build ~/hi.cr -o hi
pc% ./hi 
Hello world!!!
pc% read FOO
always works in zsh!!!
pc% echo $FOO
always works in zsh!!!
sam0x17 commented 6 years ago

I am also experiencing this on Ubuntu 17.10. e.g.:

$ crystal --version
Crystal 0.24.1 (2017-12-22)

LLVM: 4.0.0
Default target: x86_64-unknown-linux-gnu
$ nano
Too many errors from stdin$
drosehn commented 6 years ago

Hmm. If I do crystal --version on MacOS, the blocking issue does pop up. This does still happen now that I have version 0.24.1 installed. The blocking issue also comes up when I do crystal help. When that happens, the following command worked as a quick way to restore STDIN.

crystal eval 'STDIN.blocking = true'

I tried running a crystal program of my own after compiling that program with version 0.24.1, and blocking was still on (which is TheGoodState) after the program finished. This was true even though that test program did NOT include the fix-it statement of

STDIN.blocking = true if STDIN.class != IO

I also did a crystal run test-prog.cr, and blocking was still on (TheGoodState) after crystal compiled&ran my test program. So the blocking issue seems to be fixed whenever a user's program runs, but not for the compiler itself.

sam0x17 commented 6 years ago

On Ubuntu 17.10, it exhibits this behavior for both user programs and the crystal compiler itself, regardless of whether you use 0.24 or 0.23. Switching to Fish instead of Bash is my current workaround

RX14 commented 6 years ago

@sam0x17 0.24.1 or 0.24.0?

RX14 commented 6 years ago

I think the solution is still clear to me: do what every other application does and never attempt to set STDIN, STDERR or STDOUT as non-blocking. We have lost and continue to loose users because of this, and it's clear the compromise solutions are brittle.

sam0x17 commented 6 years ago

@RX14 0.24.1, and on ubuntu 17.10 I don't even get different behavior between any of the last 3 versions of crystal. Something about 17.10 makes bash much more brittle to this issue it seems.

sam0x17 commented 6 years ago

Side note: have to thank this issue for finally driving me to try something other than bash. Why did I wait so long!?!?!?

sam0x17 commented 6 years ago

Was really hoping this was going to be fixed in the 0.24.2 release but alas no:

blockvue$ crystal --version
Crystal 0.24.2 [4f9ed8d03] (2018-03-08)

LLVM: 4.0.0
Default target: x86_64-unknown-linux-gnu
blockvue$ nano
Too many errors from stdinblockvue$
asterite commented 6 years ago

The fix to this is really hard, it needs parallelism first.

RX14 commented 6 years ago

No it's really not, we just make it always block. I think you're vastly overestimating the number of programs who's behaviour will change. Lets trial the simple solution for 0.25.0 and see if it's acceptable. If not, we can always revert.

martinos commented 6 years ago

My two cents here, the first time that my vim output got screwed up because of this issue, I've lost many hours trying to figure it out.

Because of this bug I cannot compile my any program in vim since every time i get an error i have to restart vim. Personally it turned me off from Crystal even if I think that Crystal is an awesome language.

I am wondering why we don't sync the files at_exit.

drosehn commented 6 years ago

It seems that running the compiler itself will set blocking off, and not reset it on exit? Not the program which the compiler produces, but a simple crystal --version?

drosehn commented 6 years ago

How hard would it be to have all crystal programs (including the compiler!) do something like this on exit (even on error-exits):

do_reset = true
if ENV.has_key?("CRYSTAL_RESET_BLOCKING")
    do_reset = false   if ENV["CRYSTAL_RESET_BLOCKING"] =~ /^(0|no|false)$/i
end
STDIN.blocking = true    if do_reset
bew commented 6 years ago

This is because blocking state is not restored when exit is called. I remember fixing this in #5413, I thought this was merged already.. I can extract it to a new PR for faster merge if needed.

martinos commented 6 years ago

@drosehn, here is my crystal version:

$  crystal -v
Crystal 0.24.2 (2018-03-10)

LLVM: 5.0.1
Default target: x86_64-apple-macosx
asterite commented 6 years ago

I suggest to do what @RX14 says. Better to have something working but not concurrent than something concurrent but broken... At least for the moment.

drosehn commented 6 years ago

I haven't checked Crystal 0.24.2, but I think there is a partial fix in Crystal 0.24.1. But I think there are situations missed by that fix, and that the fix doesn't seem to apply when the compiler itself is running. So a session will have blocking=true, but after the user runs crystal --version then blocking in the session will be turned off.

I suspect the fix also has the potential to trigger other confusion. What if you have one crystal program which (by chance) happens to system()-execute some other crystal program, but after that is done the first program goes on to do additional processing. If the fix is to always turn on blocking at exit, then the second program will cause blocking to be turned on while the first program is still running. Or at least, I wonder what happens in that case. I haven't had the time to investigate it.

If the reset of blocking can be controlled by an environment variable, then there's at least some way to avoid that problem. But of course, all of this is getting messier and messier. A complete and reliable fix will be a lot more work!

bew commented 6 years ago

@sam0x17 @martinos could you try #5802 (you'll need to compile the compiler, and run bin/crystal --version) see if the issue is still there?

martinos commented 6 years ago

@bew , It works !

sam0x17 commented 6 years ago

@bew it works:

bin$ ./crystal --version
Using compiled compiler at `.build/crystal'
Crystal 0.22.0+761 [9e09234f4] (2018-03-10)

LLVM: 4.0.1
Default target: x86_64-pc-linux-gnu
bin$ nano
bin$ 
bew commented 6 years ago

Great news! @asterite @RX14 I think this is a better fix than switching to blocking STDIN|OUT|ERR everywhere.

RX14 commented 6 years ago

I guess it works but restoring state is an ugly hack, and the standards say we shouldn't set nonblocking on a file descriptor we don't own.

sam0x17 commented 6 years ago

More important to get a fix out right now imo -- crystal is basically unusable on systems affected by this which is a huge turn off.