HaxeFoundation / haxe

Haxe - The Cross-Platform Toolkit
https://haxe.org
6.14k stars 654 forks source link

Sys.print duplicates CR on Windows #8379

Open Aurel300 opened 5 years ago

Aurel300 commented 5 years ago
class Print {
  public static function main():Void
    Sys.print("foo\r\n");
}

On OS X:

$ haxe --run Print | xxd
0000000: 666f 6f0d 0a                             foo..

On Windows:

$ haxe --run Print | hexdump -C
00000000 66 6f 6f 0d 0d 0a                       |foo...|
00000006

Note the duplicate 0d byte.

Simn commented 5 years ago

I think it must actually be Sys.print causing that.

Aurel300 commented 5 years ago

Whoops, you're right.

Simn commented 5 years ago

@Aurel300 Can you handle this or does someone else need to do something?

hughsando commented 5 years ago

I'm not sure there is a problem here. For starters, the slash-r in the example is a bit confusing - is it needed to illustrate the issue? On windows if you do: printf("foo\n"); you get 5 characters. Crap as this might be, this is standard behaviour - ie text mode by default. And if it not supposed to be text mode, why put all the slash-rs back in? The should just be slash-n.

Simn commented 5 years ago

I don't find turning "\r\n" into "\r\r\n" when printing an acceptable default behavior.

Aurel300 commented 5 years ago

@simn I think Hugh is suggesting just going with \n on Windows as well.

As much as I hate it, CRLF is a convention on Windows platforms and Sys.println should output a CRLF-style linebreak when outputting to stdout. The problem is in situations like in the OP, where the input could come from another program (that is how I spotted it in the Unicode tests in the first place). Most importantly, if we kept text mode, just "piping" binary data would not work as expected, since it would clobber any 0x0A character.

Oh, and of course – the fixed behaviour is already the case on the majority of sys platforms.

hughsando commented 5 years ago

Well, it is really only, and always, turning "\n" into "\r\n". This is the whole difference between fopen(name,"r") and fopen(name,"rb") - 50 years of legacy catching up with us. We most certainly do not want to explicitly add the "\r", because this does not allow is to ever get rid of them. I think the solution you are after is to set the output mode to binary (on request).

Aurel300 commented 5 years ago

Ok, after some discussion with @hughsando:

Current system

Currently, stdout and stderr are open in text mode on Windows. This makes Sys.println("a\nb"); work "as expected", i.e. a<CR><LF>b is output. However, Sys.println("a\r\nb"); outputs a<CR><CR><LF>b, which is problematic when the string comes from a subprocess.

PR HaxeFoundation/hxcpp#820

In the PR I switched stdout and stderr to binary mode, then made Sys.println append "\r\n" on Windows. This fixes the issue with subprocesses, but breaks Sys.println("a\nb");.


We could solve this instead by adding Sys.setBinaryMode(binary:Bool); to stdlib. It is a bit annoying, since this will only have any effect on Windows, neither of the approaches above is good enough; \n inside Sys.println calls is presumably a common construct, which we don't want to break. On Windows, we will default to text mode to maintain backwards compatibility. Later we could think about auto-detecting stdout and setting the mode to something appropriate.

RealyUniqueName commented 5 years ago

I'd say it doesn't break Sys.println("a\nb"); but fixes it. E.g. I always explicitly type \r\n when I want windows-style line endings.