dbuenzli / bos

Basic OS interaction for OCaml
http://erratique.ch/software/bos
ISC License
63 stars 16 forks source link

Bos.OS.File.read on stdin #78

Closed rizo closed 5 years ago

rizo commented 6 years ago

I was surprised that Bos.OS.File.read (Fpath.v "/dev/stdin") is an empty string while Bos.OS.File.read (Fpath.v "-") actually waits for input. Is that intentional?

dbuenzli commented 6 years ago

I don't know. There's no special casing for /dev/stdin it treats it like a file, I don't know how this file behaves on your system.

- is special cased and reads directly from the OCaml stdin in_channel or Unix.stdin because that's the widespread unix convention to denote stdout and stdin where files are expected in Unix tools. See here for details.

dbuenzli commented 6 years ago

(Note that using /dev/stdout for writing rather than - will likely fail because of bos's atomic writes, see #70).

dbuenzli commented 6 years ago

Ah I think I know why this is the case. Files are read in one read and for that in_channel_length is used. I guess this returns 0 on /dev/stdin.

rizo commented 6 years ago

I'm aware of the - convention and I think it's good that Bos follows it.

I imagine people would be equally likely to refer to /dev/stdin directly. In fact I would expect it to support other kinds of channels that don't have a fixed length (like named pipes).

I submitted a PR to add support for S_CHR (e.g. /dev/stdin) and S_FIFO (i.e. named pipes) devices.

See https://github.com/ocaml/ocaml/blob/trunk/otherlibs/unix/unix.mli#L415

dbuenzli commented 6 years ago

I wonder if the best here is not to simply try read the file in the "usual" way whenever in_channel_length reports 0.

rizo commented 5 years ago

That would work I think.

The only edge case is when a file is actually empty – in_channel_length reports 0 in that case too. The later call to input would produce 0, and thus an empty string, as expected. It would unnecessarily allocating extra IO_BUFFER_SIZE, which is not a big deal.

rizo commented 5 years ago

Following your idea seems like we can indeed separate the read operation for regular files and the streaming ones. The following checks could be performed to make the decision:

dbuenzli commented 5 years ago

I don't mind about the case for empty files, in practice simply making the test for zero should be enough.

rizo commented 5 years ago

I'll change the PR then :)

rizo commented 5 years ago

Unfortunately in_channel_length only works on character devices but not on named pipes:

$ mkfifo pipe; echo "hi" > pipe

# let ic = open_in "pipe";;
val ic : in_channel = <abstr>
# in_channel_length ic;;
Exception: Sys_error "Illegal seek".
dbuenzli commented 5 years ago

Humpf. Then I guess it's either reimplementing in_channel_length using Unix.lseek and catching ESPIPE or stating. I would favor the former.

dbuenzli commented 5 years ago

The code here basically does the lseek stuff but does not handle ESPIPE.

dbuenzli commented 5 years ago

Incidentally I had to go over this. I think something along this way (but on channels should do):

    let to_string fd =
      let b = Bytes.create unix_buffer_size in
      let acc = Buffer.create unix_buffer_size in
      let rec loop () = match Unix.read fd b 0 (Bytes.length b) with
      | 0 -> Buffer.contents acc
      | l -> Buffer.add_subbytes acc b 0 l; loop ()
      | exception Unix.Unix_error (Unix.EINTR, _, _) -> loop ()
      in
      loop ()

    let rec really_read fd b start len = match len <= 0 with
    | true -> ()
    | false ->
        match Unix.read fd b start len with
        | 0 -> failwith (err_doing "Reading" "Unexpected end of file")
        | r -> really_read fd b (start + r) (len - r)
        | exception Unix.Unix_error (Unix.EINTR, _, _) ->
            really_read fd b start len

    let read_file file fd =
      try
        match Unix.lseek fd 0 Unix.SEEK_END with
        | exception Unix.Unix_error (Unix.ESPIPE, _, _) -> Ok (to_string fd)
        | len when len > Sys.max_string_length ->
            failwithf "File to read too large: %d bytes, max supported: %d"
              len Sys.max_string_length
        | len ->
            let b = Bytes.create len in
            ignore (Unix.lseek fd 0 Unix.SEEK_SET);
            really_read fd b 0 len;
            Ok (Bytes.unsafe_to_string b)
      with
      | Failure e -> Result.errorf "%a: %s" Fpath.pp file e
      | Unix.Unix_error (e, _, _) ->
          Result.errorf "%a: %s" Fpath.pp file (err_doing "Reading" (uerr e))
rizo commented 5 years ago

I think catching ESPIPE would work. Sorry for the delay, I'll try to find time to try this soon.

B0 looks like an interesting project, by the way.

dbuenzli commented 5 years ago

Sorry for the delay, I'll try to find time to try this soon.

Well no need to be sorry I'm the one who's having the bug here...

rizo commented 5 years ago

It's about communicating the expectation that I haven't given up on this :-P

rizo commented 5 years ago

Did you have a chance to look at https://github.com/dbuenzli/bos/pull/79? Any suggestions/concerns?

dbuenzli commented 5 years ago

Thanks. Fixed by @rizo in 61526bf67f218fd730db8ac19.