JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.67k stars 5.48k forks source link

readline not working for `ls`, but readlines work #28975

Open need47 opened 6 years ago

need47 commented 6 years ago

julia version: 1.0.0

julia> readline(`ls`)
ERROR: MethodError: no method matching readline(::Cmd)
Closest candidates are:
  readline(::IOStream; keep) at iostream.jl:429
  readline() at io.jl:370
  readline(::AbstractString; keep) at io.jl:364
  ...
Stacktrace:
 [1] top-level scope at none:0
julia> readlines(`ls`)   # readlines work however. I don't see too much difference between the two.
27-element Array{String,1}:       
 "cpp"                             
 "cronjobs"                        
 ...                  
julia> readline("list")   # can read from file
"1468"
elextr commented 6 years ago

A file can be closed after any amount of its contents are read, so its safe to do so after readline only reads one line.

A subprocess possibly needs all its output read before it can terminate cleanly, which readlines does, but readline only reads part of the output it can leave the subprocess hung waiting for someone to read the rest of its output, but nobody is ever going to. Therefore its unsafe to have a readline for subprocesses.

need47 commented 6 years ago

Thanks for the explanation. Probably the error messages need to be improved. Please feel free to close the ticket.

StefanKarpinski commented 6 years ago

Arguably in this situation we could automatically arrange to ignore the SIGPIPE error from the command since it seems clear that’s what the user wants. Since this either could use a better error or a change to make it work, I’ll leave it open.

KristofferC commented 6 years ago

Since this either could use a better error

The docs for readline shows its usage and like for everything else, calling a method with some other types gives a MethodError. So you mean that you want to implement readline(::Cmd) and have it show some other error message? Doesn't this open a whole can of specialized error messages that should be implemented e.g.

[1] + 2
"foo" + "bar"

etc?

StefanKarpinski commented 6 years ago

No, I’d actually prefer to make it work as expected as if you’d called head -n1 on the command.

elextr commented 6 years ago

readline(cmd::Cmd) = readlines(cmd)[1] :grin:

elextr commented 6 years ago

@StefanKarpinski readline(::IOStream) does not close the file, only readline(::AbstractString) does since it opens the file itself. These two are distinguished by type.

Since AbstractString is already taken, there is no way of telling readline() to create a Cmd that it owns and can close.

If readline(::Cmd) is to work like head -n1 then it would work like the AbstractString version, so its inconsistent with the IOStream version.

But a user with a Cmd bound to a variable trying to read lines one at a time from it (perhaps doing interaction with the subprocess) would be surprised that their Cmd is unexpectedly closed.

StefanKarpinski commented 6 years ago

Cmd objects aren't open or closed, they just represent a command that could be run.

elextr commented 6 years ago

Thats correct, at the moment, the only way to get the output of a Cmd is readlines(::Cmd) which presumably runs the command, buffers all the data from the pipes, and wait()s for the process to complete before it returns the array of data. At least I assume so, readlines(::Cmd) does not appear to be documented.

But my point is that the behaviour of readline(::Cmd) means that it does not return all data at one time, so what is the semantics in this case? Does it do my suggested readlines(cmd)[1], in which case, yes, there is no "open" concept, or does it follow the semantics of readline(::IOStream) where successive calls return successive lines? In which case the Cmd is conceptually "open".

need47 commented 6 years ago

Yes, I figured out readlines() by myself. Only read() is given as an example in the documentation. Then I tried readline() for whatever reason and noticed the difference between readline() and readlines().

On Mon, Sep 10, 2018 at 7:00 PM elextr notifications@github.com wrote:

Thats correct, at the moment, the only way to get the output of a Cmd is readlines(::Cmd) which presumably runs the command, buffers all the data from the pipes, and wait()s for the process to complete before it returns the array of data. At least I assume so, readlines(::Cmd) does not appear to be documented.

But my point is that the behaviour of readline(::Cmd) means that it does not return all data at one time, so what is the semantics in this case? Does it do my suggested readlines(cmd)[1], in which case, yes, there is no "open" concept, or does it follow the semantics of readline(::IOStream) where successive calls return successive lines? In which case the Cmd is conceptually "open".

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/JuliaLang/julia/issues/28975#issuecomment-420064097, or mute the thread https://github.com/notifications/unsubscribe-auth/ABI4tTeyuYSIgp9DWZJeahgDmzbVFYmEks5uZtbfgaJpZM4WTzNU .

-- -- Tiejun 提问之前请先查阅《CN-Rockville生活手册》相关经验分享——http://goo.gl/WDl5H

StefanKarpinski commented 6 years ago

or does it follow the semantics of readline(::IOStream)

This doesn't make sense: you can't read a sequence of lines from a Cmd object—you have to open it first. The correct analogy of Cmd is that it is like a file name of type String. You can do readline(path::String) and it will open path, call readline on the open file handle, and then close the file handle. readline(cmd::Cmd) would behave in an analogous manner: run cmd, call readline on the open process object, and then close the process. However, many commands will fail with a SIGPIPE if you do that and the question is whether we should ignore the error caused by that.

elextr commented 6 years ago

This doesn't make sense: you can't read a sequence of lines from a Cmd object

Just because it isn't implemented that way at the moment doesn't mean it "doesn't make sense". The Cmd object can just as easily maintain the buffer of data as it is received, and readline(::Cmd) read from that buffer. This allows the Julia process to proceed to process lines in parallel with the subprocess instead of having to wait for the subprocess to complete, as is necessary with readlines(::Cmd).

The correct analogy of Cmd is that it is like a file name of type String.

But its an analogy with problems, see below.

However, many commands will fail with a SIGPIPE if you do that and the question is whether we should ignore the error caused by that.

Crashing processes is bad karma, thats how you get locked resources and other "bad things".

As my initial explanation said, this is where the analogy between a file and a subprocess output breaks down. Its safe to close a file after reading any amount, its not necessarily so for a subprocess.

Anyway the SIGPIPE happens in the subprocess, if its blocked on write to pipe when the pipe is closed. It doesn't happen in the Julia process, so presumably its only communicated by the wait() so is it proposed that the readline(::Cmd) wait for the subprocess, or is it waited for in a separate task and so the error is recognised asynchronously? If its asynchronous then the error really should be ignored, unexpected asynchronous exceptions are hard to diagnose.

StefanKarpinski commented 6 years ago

The Cmd object can just as easily maintain the buffer of data as it is received

Again: a Cmd is not a running process so this doesn't make sense. Observe:

julia> cmd = `date +%T`
`date +%T`

julia> readchomp(cmd)
"22:00:36"

julia> readchomp(cmd)
"22:00:37"

julia> readchomp(cmd)
"22:00:38"

One Cmd object, three different processes with three different outputs. It does not make sense for a Cmd object to have a buffer because there is no process to read the output of unless you spawn the command and create a process.

Crashing processes is bad karma, thats how you get locked resources and other "bad things".

This is completely standard in UNIX pipelines and does not cause "bad things". When a later process in the pipelines is done reading, it closes it's input, which causes the earlier processes to receive a SIGPIPE signal which usually causes it to terminate.

JeffBezanson commented 6 years ago

This might help: open on a String returns an IOStream, open on a Cmd returns a Process. A Cmd is stateless; the Process tracks resources related to the resulting running process.

elextr commented 6 years ago

That makes total sense and makes Cmd/Process analogous to String/IOStream. So then readline(::Cmd) would act as @StefanKarpinski says and readline(::Process) would return successive lines.

I had totally overlooked the existence of the Process type since the only place it appears in the documentation is in Base.kill and some related functions, run(::Cmd) and open(::Cmd) should refer to it by link (instead of lower case "process").

Looking at the code, that suggestion seems to be close to the current behaviour although open(::Cmd) seems to return something called a ProcessChain which I presume is similar but can be a series of Processes pipelined. Doesn't seem to be a readline() on that though. Note that open(::Cmd) is still documented to return a Tuple (stream, process).

JeffBezanson commented 6 years ago

The documentation was fixed by #25563.

StefanKarpinski commented 6 years ago

although open(::Cmd) seems to return something called a ProcessChain which I presume is similar but can be a series of Processes pipelined

It depends on whether you open a process or a pipeline:

julia> open(`true`)
Process(`true`, ProcessExited(0))

julia> open(pipeline(`echo Hello`, `cat`))
Base.ProcessChain(Base.Process[Process(`echo Hello`, ProcessExited(0)), Process(`cat`, ProcessExited(0))], Base.DevNull(), Pipe(RawFD(0xffffffff) closed => RawFD(0x00000016) open, 0 bytes waiting), Base.TTY(RawFD(0x00000011) open, 0 bytes waiting))