dspinellis / dgsh

Shell supporting pipelines to and from multiple processes
http://www.spinellis.gr/sw/dgsh/
Other
323 stars 22 forks

Avoid nout=0 on last pipe component #90

Open lucaswerkmeister opened 7 years ago

lucaswerkmeister commented 7 years ago

This example is somewhat constructed, but:

{{ cat & }} | cat

If you run that command and type something into standard input, nothing will appear on standard output. I think this is because dgsh negotiates zero output file descriptors (cat declares n_output_fds=-1), which isn’t very useful for the last component of a pipe.

I’ve also noticed this for my own ja2l program: if you run something like cat /proc/mounts | ./build/libexec/dgsh/ja2l, my program prints:

negotiation resulted in zero outputs, which does not make sense (try piping to cat)

(I added explicit error detection for this case because otherwise my program crashes: it needs between 1 and N output fds, and there’s just no mechanism to communicate this.)

The problem with this bug report is that this pipeline does print something:

cat /proc/mounts | {{ cat & }} | cat

and so does this one:

cat some_json_file | ./build/libexec/dgsh/ja2l | cat

even though DGSH_DEBUG_LEVEL=3 output in this case also indicates nout=0 for one component (which I would assume to be the final cat). So I might be misunderstanding something here…

TL;DR: if the last pipe component declares nout=-1, perhaps negotiate nout=1 (stdout) instead of nout=0?

mfragkoulis commented 7 years ago

Both {{ cat & }} | cat and cat /proc/mounts | {{ cat & }} | cat run fine on Debian Jessie. I just added a test for this. Hopefully, the Travis build on Ubuntu Linux will verify this.

Again, could you please run the first pipeline with DGSH_DEBUG_LEVEL=4 and share the log output?

lucaswerkmeister commented 7 years ago

https://gist.github.com/lucaswerkmeister/50cfb49a5be6ddfa1172172048001147 (DGSH_DEBUG_LEVEL=4 dgsh -c '{{ cat & }} | cat' 2>&1 | xclip; I killed the inner cat with htop because Ctrl+D didn’t have any effect and Ctrl+C killed the entire outer pipeline including xclip)

mfragkoulis commented 7 years ago

Hm, this is a bug. My previous successful test was slightly different: {{ cat /dev/null & }} | cat

The bug happens because the {{ cat & }} | cat topology (like any dgsh graph starting with a multipipe block) requires a dgsh-conc instance to start and coordinate the negotiation process. dgsh connects the dgsh-conc instance to the first cat command through a socket pair, duplicating the socket pair's input fd onto cat's stdin and its output fd onto dgsh-conc's stdout.

After the negotiation is over, dgsh-conc terminates, but cat's stdin is not connected to the shell's terminal, so cat cannot receive the input typed there.

Pending a permanent fix, a workaround is to prepend a tee or cat command: cat | {{ cat & }} | cat

dspinellis commented 5 years ago

Suggestion: a concentrator whose input or output is a tty should exec cat if it has a single output sink or input source. Otherwise, it should exit with an error asking for a command such as cat or tee to be used at the end or the beginning of the multipipe block.

dspinellis commented 5 years ago

Better: if the input-side concentrator sees that only a single process on its output side requires input, then it will send its input file descriptor to that process before exiting.

mfragkoulis commented 5 years ago

There is a problem: currently, we can't know for sure which processes can take input and which cannot.

For instance, take this imaginary command graph {{ cat & find & }} | cat.

The cat command in the multipipe block enters the negotiation with 0 input channel requirements because the DGSH_IN environment variable is set to 0 by the dgsh shell. DGSH_IN=0 means that there is no pipe connection to cat's standard input. However, cat can take input from its standard input in other ways; e.g. from the terminal.

find, on the other hand, never reads from its standard input. Yet, like cat, it will have DGSH_IN=0 and will request 0 input channels in the negotiation process.

Currently there is no way to distinguish the two cases, so I don't think we can solve this issue without first clarifying the semantics of input channel availability.