Open need47 opened 6 years ago
A file can be closed after any amount of its contents are read, so its safe to do so after readline only reads one line.
A subprocess possibly needs all its output read before it can terminate cleanly, which readlines does, but readline only reads part of the output it can leave the subprocess hung waiting for someone to read the rest of its output, but nobody is ever going to. Therefore its unsafe to have a readline for subprocesses.
Thanks for the explanation. Probably the error messages need to be improved. Please feel free to close the ticket.
Arguably in this situation we could automatically arrange to ignore the SIGPIPE error from the command since it seems clear that’s what the user wants. Since this either could use a better error or a change to make it work, I’ll leave it open.
Since this either could use a better error
The docs for readline
shows its usage and like for everything else, calling a method with some other types gives a MethodError
. So you mean that you want to implement readline(::Cmd)
and have it show some other error message? Doesn't this open a whole can of specialized error messages that should be implemented e.g.
[1] + 2
"foo" + "bar"
etc?
No, I’d actually prefer to make it work as expected as if you’d called head -n1 on the command.
readline(cmd::Cmd) = readlines(cmd)[1]
:grin:
@StefanKarpinski readline(::IOStream)
does not close the file, only readline(::AbstractString)
does since it opens the file itself. These two are distinguished by type.
Since AbstractString is already taken, there is no way of telling readline()
to create a Cmd that it owns and can close.
If readline(::Cmd)
is to work like head -n1 then it would work like the AbstractString version, so its inconsistent with the IOStream version.
But a user with a Cmd bound to a variable trying to read lines one at a time from it (perhaps doing interaction with the subprocess) would be surprised that their Cmd is unexpectedly closed.
Cmd
objects aren't open or closed, they just represent a command that could be run.
Thats correct, at the moment, the only way to get the output of a Cmd
is readlines(::Cmd)
which presumably runs the command, buffers all the data from the pipes, and wait()
s for the process to complete before it returns the array of data. At least I assume so, readlines(::Cmd)
does not appear to be documented.
But my point is that the behaviour of readline(::Cmd)
means that it does not return all data at one time, so what is the semantics in this case? Does it do my suggested readlines(cmd)[1]
, in which case, yes, there is no "open" concept, or does it follow the semantics of readline(::IOStream)
where successive calls return successive lines? In which case the Cmd
is conceptually "open".
Yes, I figured out readlines() by myself. Only read() is given as an example in the documentation. Then I tried readline() for whatever reason and noticed the difference between readline() and readlines().
On Mon, Sep 10, 2018 at 7:00 PM elextr notifications@github.com wrote:
Thats correct, at the moment, the only way to get the output of a Cmd is readlines(::Cmd) which presumably runs the command, buffers all the data from the pipes, and wait()s for the process to complete before it returns the array of data. At least I assume so, readlines(::Cmd) does not appear to be documented.
But my point is that the behaviour of readline(::Cmd) means that it does not return all data at one time, so what is the semantics in this case? Does it do my suggested readlines(cmd)[1], in which case, yes, there is no "open" concept, or does it follow the semantics of readline(::IOStream) where successive calls return successive lines? In which case the Cmd is conceptually "open".
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/JuliaLang/julia/issues/28975#issuecomment-420064097, or mute the thread https://github.com/notifications/unsubscribe-auth/ABI4tTeyuYSIgp9DWZJeahgDmzbVFYmEks5uZtbfgaJpZM4WTzNU .
-- -- Tiejun 提问之前请先查阅《CN-Rockville生活手册》相关经验分享——http://goo.gl/WDl5H
or does it follow the semantics of
readline(::IOStream)
This doesn't make sense: you can't read a sequence of lines from a Cmd
object—you have to open it first. The correct analogy of Cmd
is that it is like a file name of type String
. You can do readline(path::String)
and it will open path
, call readline
on the open file handle, and then close the file handle. readline(cmd::Cmd)
would behave in an analogous manner: run cmd
, call readline
on the open process object, and then close the process. However, many commands will fail with a SIGPIPE if you do that and the question is whether we should ignore the error caused by that.
This doesn't make sense: you can't read a sequence of lines from a Cmd object
Just because it isn't implemented that way at the moment doesn't mean it "doesn't make sense". The Cmd
object can just as easily maintain the buffer of data as it is received, and readline(::Cmd)
read from that buffer. This allows the Julia process to proceed to process lines in parallel with the subprocess instead of having to wait for the subprocess to complete, as is necessary with readlines(::Cmd)
.
The correct analogy of Cmd is that it is like a file name of type String.
But its an analogy with problems, see below.
However, many commands will fail with a SIGPIPE if you do that and the question is whether we should ignore the error caused by that.
Crashing processes is bad karma, thats how you get locked resources and other "bad things".
As my initial explanation said, this is where the analogy between a file and a subprocess output breaks down. Its safe to close a file after reading any amount, its not necessarily so for a subprocess.
Anyway the SIGPIPE happens in the subprocess, if its blocked on write to pipe when the pipe is closed. It doesn't happen in the Julia process, so presumably its only communicated by the wait()
so is it proposed that the readline(::Cmd)
wait for the subprocess, or is it waited for in a separate task and so the error is recognised asynchronously? If its asynchronous then the error really should be ignored, unexpected asynchronous exceptions are hard to diagnose.
The
Cmd
object can just as easily maintain the buffer of data as it is received
Again: a Cmd
is not a running process so this doesn't make sense. Observe:
julia> cmd = `date +%T`
`date +%T`
julia> readchomp(cmd)
"22:00:36"
julia> readchomp(cmd)
"22:00:37"
julia> readchomp(cmd)
"22:00:38"
One Cmd
object, three different processes with three different outputs. It does not make sense for a Cmd
object to have a buffer because there is no process to read the output of unless you spawn the command and create a process.
Crashing processes is bad karma, thats how you get locked resources and other "bad things".
This is completely standard in UNIX pipelines and does not cause "bad things". When a later process in the pipelines is done reading, it closes it's input, which causes the earlier processes to receive a SIGPIPE signal which usually causes it to terminate.
This might help: open
on a String
returns an IOStream
, open
on a Cmd
returns a Process
. A Cmd
is stateless; the Process
tracks resources related to the resulting running process.
That makes total sense and makes Cmd
/Process
analogous to String
/IOStream
. So then readline(::Cmd)
would act as @StefanKarpinski says and readline(::Process)
would return successive lines.
I had totally overlooked the existence of the Process
type since the only place it appears in the documentation is in Base.kill
and some related functions, run(::Cmd)
and open(::Cmd)
should refer to it by link (instead of lower case "process").
Looking at the code, that suggestion seems to be close to the current behaviour although open(::Cmd)
seems to return something called a ProcessChain
which I presume is similar but can be a series of Process
es pipelined. Doesn't seem to be a readline()
on that though. Note that open(::Cmd)
is still documented to return a Tuple (stream, process)
.
The documentation was fixed by #25563.
although
open(::Cmd)
seems to return something called aProcessChain
which I presume is similar but can be a series ofProcess
es pipelined
It depends on whether you open a process or a pipeline:
julia> open(`true`)
Process(`true`, ProcessExited(0))
julia> open(pipeline(`echo Hello`, `cat`))
Base.ProcessChain(Base.Process[Process(`echo Hello`, ProcessExited(0)), Process(`cat`, ProcessExited(0))], Base.DevNull(), Pipe(RawFD(0xffffffff) closed => RawFD(0x00000016) open, 0 bytes waiting), Base.TTY(RawFD(0x00000011) open, 0 bytes waiting))
julia version: 1.0.0