Open hcarty opened 8 years ago
Isn't the problem here that you are using Pervasives.print_string
in test.ml ? I guess this is what does the translation.
I just checked and the OS.Cmd
combinators only use Unix.openfile
(which AFAIK is oblivious of the distinction between text and binary) and OS.File
only uses open_in_bin
.
In fact you are using OS.File.read
with OS.File.dash
which in this case will not use open_in_bin
but simply use Pervasives.stdin
. So I guess this is the actual source of the problem.
I could use Pervasives.set_binary_mode_in
on Pervasives.stdin
before doing the read but there's no way to query the status to set it back to its previous state. I guess these are the only things that can be done:
set_binary_mode_{in,out}
themselves.set_binary_mode_{in,out}
on stdin
and stdout
in OS.File
's IO functions and document the side effect performed on the channel.What would Windows users expect ?
I kind of think that these automatic translations are more harmful than beneficial so I'm tempted by 2. With 1. it basically means that non Windows user have to think about this and they will not.
/cc @fdopen @dra27
The source of this problem is print_string
- all the calls in main.ml use binary Unix.file_descr
so test.ml should only receive \n
on stdin, therefore the fact that stdin is in text mode won't matter. However, since stdout is in text mode in test.ml, print_string
will translate \n
→ \r\n
which main.ml reads back without translation. However, if main.ml actually had "Hello world\r\n\x00\x00booooo"
then this example would work, but for scary and unstable reasons! (and weird examples like "Hello world\r\n\n\x00\x00booooo"
would not work)
I think the expectation would be that stdhandles are treated in the same way as files, certainly. I don't have a reasoned opinion on which way it should be, though, I'm afraid! There is already a disparity between how Unix
and Pervasives
do this, for example:
Unix.(write stdout "\n" 0 1) |> ignore;
Pervasives.(output stdout "\n" 0 1) |> ignore
writes "\n\r\n"
to stdout on Windows.
I expect that the rationale for this may be that the standard library behaves like the underlying C runtime (where \n
really should mean "do whatever is necessary to make a line-end" and in Microsoft C by default does translations) vs the Unix library which aims to behave the same way on all platforms as far as is reasonably possible.
You can detect whether a channel is in binary mode via an exposed runtime function, but you have to use a slightly unscrupulous, if entirely safe, C stub to get at it:
#include <caml/memory.h>
CAMLextern int caml_channel_binary_mode (void *);
CAMLprim value caml_ml_channel_binary_mode(value vchannel)
{
CAMLparam1(vchannel);
CAMLreturn(Val_bool(caml_channel_binary_mode(*((void **) (Data_custom_val(vchannel))))));
}
which can then be used with
external get_binary_mode_in : in_channel -> bool = "caml_ml_binary_mode"
external get_binary_mode_out : out_channel -> bool = "caml_ml_binary_mode"
It's a surprising omission from Pervasives...
Under Windows, using the following programs the
\n
indata
is converted to\r\n
when the two programs are compiled and run asmain.exe test.exe
:Under something more Unix-y there is no issue, I'm guessing because there is no difference there between "text" and "binary" IO.
main.ml
test.ml