dspinellis / dgsh

Shell supporting pipelines to and from multiple processes
http://www.spinellis.gr/sw/dgsh/
Other
323 stars 22 forks source link

Problem with grep (maybe) #99

Closed trantor closed 6 years ago

trantor commented 6 years ago

Hello again.

Trying to come up with a script to process a few log files, but I am having problems with grep. I am not sure whether it's a wrapping problem or what.

See this gist Latest dgsh compiled from scratch on Ubuntu 16.04.

mfragkoulis commented 6 years ago

Hi @trantor,

Can you run your script again as

DGSH_DEBUG_LEVEL=3 dgsh -x match.dgsh messages@mailagenziaentrate40.top logfile

and provide the output?

trantor commented 6 years ago

Here it is

$ DGSH_DEBUG_LEVEL=3 dgsh -x match.dgsh messages@mailagenziaentrate40.top logfile 
+ MITTENTE=messages@mailagenziaentrate40.top
+ shift
6232: dgsh command

+ dgsh-wrap xzcat -f logfile
+ grep -F --matching-lines --file=-
+ grep -oP '(?<=: to=<)[^>]+'
+ dgsh-conc -o 2
6233: After getopt: ninputs=1, noutputs=1 optind=1 argv[optind]=xzcat
6233: program_supplied=0
6233: calling negotiate with ninputs=1 noutputs=1
6233: dgsh_negotiate(): Tool xzcat with pid 6233 negotiating: nin=1 nout=1.
6233: Message block created by process xzcat with pid 6233.

6233: add_node(): Added node xzcat in position 0 on dgsh graph, initiator: 6233
6233: write_message_block(): xzcat (0)
+ dgsh-conc -i 2
+ grep -E 'Subject:.*from=<messages@mailagenziaentrate40.top'
+ grep -oP '[[:xdigit:]]+(?=: info: header Subject:)'
+ wait
+ sort -u
6236: dgsh_negotiate(): Tool grep -oP (?<=: to=<)[^>]+ with pid 6236 negotiating: nin=-1 nout=1.
6235: dgsh_negotiate(): Tool grep -F --matching-lines with pid 6235 negotiating: nin=-1 nout=1.
6237: read_message_block(): (null) (0)
6237: write_message_block(): (null) (0)
+ cat
6240: dgsh_negotiate(): Tool grep -oP [[:xdigit:]]+(?=: info: header Subject:) with pid 6240 negotiating: nin=-1 nout=1.
6241: dgsh_negotiate(): Tool sort -u with pid 6241 negotiating: nin=-1 nout=1.
6239: dgsh_negotiate(): Tool grep -E Subject:.*from=<messages@mailagenziaentrate40.top with pid 6239 negotiating: nin=-1 nout=-1.
6239: read_message_block(): grep -E Subject:.*from=<messages@mailagenziaentrate40.top (0)
6239: add_node(): Added node grep -E Subject:.*from=<messages@mailagenziaentrate40.top in position 1 on dgsh graph, initiator: 6233
6239: write_message_block(): grep -E Subject:.*from=<messages@mailagenziaentrate40.top (1)
6240: read_message_block(): grep -oP [[:xdigit:]]+(?=: info: header Subject:) (0)
6240: add_node(): Added node grep -oP [[:xdigit:]]+(?=: info: header Subject:) in position 2 on dgsh graph, initiator: 6233
6240: write_message_block(): grep -oP [[:xdigit:]]+(?=: info: header Subject:) (2)
6241: read_message_block(): sort -u (0)
6241: add_node(): Added node sort -u in position 3 on dgsh graph, initiator: 6233
6241: write_message_block(): sort -u (3)
6238: read_message_block(): (null) (0)
6238: write_message_block(): (null) (0)
6235: read_message_block(): grep -F --matching-lines (0)
6235: add_node(): Added node grep -F --matching-lines in position 4 on dgsh graph, initiator: 6233
6235: write_message_block(): grep -F --matching-lines (4)
6236: read_message_block(): grep -oP (?<=: to=<)[^>]+ (0)
6236: add_node(): Added node grep -oP (?<=: to=<)[^>]+ in position 5 on dgsh graph, initiator: 6233
6236: write_message_block(): grep -oP (?<=: to=<)[^>]+ (5)
6235: read_message_block(): grep -F --matching-lines (4)
6235: write_message_block(): grep -F --matching-lines (4)
6238: read_message_block(): (null) (0)
6238: write_message_block(): (null) (0)
6241: read_message_block(): sort -u (3)
6241: write_message_block(): sort -u (3)
6240: read_message_block(): grep -oP [[:xdigit:]]+(?=: info: header Subject:) (2)
6240: write_message_block(): grep -oP [[:xdigit:]]+(?=: info: header Subject:) (2)
6239: read_message_block(): grep -E Subject:.*from=<messages@mailagenziaentrate40.top (1)
6239: write_message_block(): grep -E Subject:.*from=<messages@mailagenziaentrate40.top (1)
6237: read_message_block(): (null) (0)
6237: write_message_block(): (null) (0)
6242: dgsh_negotiate(): Tool dgsh-tee with pid 6242 negotiating: nin=-1 nout=-1.
6242: read_message_block(): dgsh-tee (0)
6242: add_node(): Added node dgsh-tee in position 6 on dgsh graph, initiator: 6233
6242: write_message_block(): dgsh-tee (6)
6238: read_message_block(): (null) (0)
6238: write_message_block(): (null) (0)
6242: read_message_block(): dgsh-tee (6)
6242: write_message_block(): dgsh-tee (6)
6237: read_message_block(): (null) (0)
6237: write_message_block(): (null) (0)
6233: read_message_block(): xzcat (0)
6233: dgsh_negotiate(): Gathered I/O requirements.
ERROR: More than one edges are flexible. Cannot compute solution. Exiting.
dgsh: No solution was found to satisfy the I/O requirements of the following 0 participating processes:
*** Error in `dgsh-wrap': free(): invalid pointer: 0x00007ffe12a5f234 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fa0bfddd7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fa0bfde637a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fa0bfdea53c]
dgsh-wrap[0x4058e2]
dgsh-wrap[0x406208]
dgsh-wrap[0x40ac5f]
dgsh-wrap[0x402483]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fa0bfd86830]
dgsh-wrap[0x401439]
======= Memory map: ========
00400000-0040f000 r-xp 00000000 08:11 8535587                            /usr/local/libexec/dgsh/dgsh-wrap
0060f000-00610000 r-xp 0000f000 08:11 8535587                            /usr/local/libexec/dgsh/dgsh-wrap
00610000-00611000 rwxp 00010000 08:11 8535587                            /usr/local/libexec/dgsh/dgsh-wrap
00fbb000-00fdc000 rwxp 00000000 00:00 0                                  [heap]
7fa0b8000000-7fa0b8021000 rwxp 00000000 00:00 0 
7fa0b8021000-7fa0bc000000 ---p 00000000 00:00 0 
7fa0bfb50000-7fa0bfb66000 r-xp 00000000 08:11 6817824                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7fa0bfb66000-7fa0bfd65000 ---p 00016000 08:11 6817824                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7fa0bfd65000-7fa0bfd66000 rwxp 00015000 08:11 6817824                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7fa0bfd66000-7fa0bff26000 r-xp 00000000 08:11 6828166                    /lib/x86_64-linux-gnu/libc-2.23.so
7fa0bff26000-7fa0c0126000 ---p 001c0000 08:11 6828166                    /lib/x86_64-linux-gnu/libc-2.23.so
7fa0c0126000-7fa0c012a000 r-xp 001c0000 08:11 6828166                    /lib/x86_64-linux-gnu/libc-2.23.so
7fa0c012a000-7fa0c012c000 rwxp 001c4000 08:11 6828166                    /lib/x86_64-linux-gnu/libc-2.23.so
7fa0c012c000-7fa0c0130000 rwxp 00000000 00:00 0 
7fa0c0130000-7fa0c0156000 r-xp 00000000 08:11 6815831                    /lib/x86_64-linux-gnu/ld-2.23.so
7fa0c0320000-7fa0c0323000 rwxp 00000000 00:00 0 
7fa0c0352000-7fa0c0355000 rwxp 00000000 00:00 0 
7fa0c0355000-7fa0c0356000 r-xp 00025000 08:11 6815831                    /lib/x86_64-linux-gnu/ld-2.23.so
7fa0c0356000-7fa0c0357000 rwxp 00026000 08:11 6815831                    /lib/x86_64-linux-gnu/ld-2.23.so
7fa0c0357000-7fa0c0358000 rwxp 00000000 00:00 0 
7ffe12a41000-7ffe12a62000 rwxp 00000000 00:00 0                          [stack]
7ffe12bd8000-7ffe12bda000 r--p 00000000 00:00 0                          [vvar]
7ffe12bda000-7ffe12bdc000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
6235 dgsh: timeout for negotiation. Exit.
6238 dgsh: timeout for negotiation. Exit.
6237 dgsh: timeout for negotiation. Exit.
6236 dgsh: timeout for negotiation. Exit.
6240 dgsh: timeout for negotiation. Exit.
6241 dgsh: timeout for negotiation. Exit.
6239 dgsh: timeout for negotiation. Exit.
6242 dgsh: timeout for negotiation. Exit.
dspinellis commented 6 years ago

@trantor Interesting use case! I think the problem is the call of cat and sort inside the first block. Both are flexible in their inputs (they can act as dgsh sinks), but none of the two is actually required to work like that. So I recommend replacing them with /bin/sort and /bin/cat.

@mfragkoulis We need to implement a better way of reporting for these problems.

Both: We should also think about how these unexpected problems can be avoided. Perhaps it's a mistake to overload the meaning of the existing commands, and we should present the enhanced ones with a dgsh- prefix, e.g. dgsh-cat.

mfragkoulis commented 6 years ago

That's indeed the problem. Regarding reporting, it can surely be improved. I'll take care of it. Regarding the avoidance of such problems, being explicit about using the dgsh version or the original version of a tool is one way to go. I see is some drawbacks here in that we lose transparency and elegance in scripts and place some burden on users who now have to decide between tool versions. Strictly this is not required because dgsh tool versions can also behave as the original ones.

From another angle, I think that the problem is more related with use cases where more than one connection arrangements exist and the algorithm that finds a solution is not able to tell which one the programmer wants. This possibility will manifest more often when multipipe blocks can be connected to one another ({{ ... }} | {{ ... }). Because the exact connection constraints is a piece of knowledge that only the programmer possesses I would envisage a solution that gives the programmer a way to state a connection constraint for a tool that is flexible in its input and/or output channels. For instance, in the example @trantor provided the cat command within the multipipe block could become: DGSH_CONN_IN=1 cat & The env variable will be picked up by the negotiate library and so cat will enter the negotiation procedure with input channels constrained to 1 (instead of -1, which denotes that it's flexible). Then sort will be the only tool with flexible input constraint in the multipipe block and the algorithm will have no problem computing sort's and all other tools' connections. How does that sound?

trantor commented 6 years ago

@dspinellis I've replaced calls to cat, sort and grep with calls to /bin/TOOL, apart from the call to xzcat (which might wrap cat on its own, I am not sure) and the only call to grep requiring multiple inputs. The result is as follows.

+ MITTENTE=messages@mailagenziaentrate40.top
+ shift
16798: dgsh command

+ dgsh-wrap xzcat -f logfile
+ grep -F --matching-lines --file=-
+ dgsh-wrap /bin/grep -oP '(?<=: to=<)[^>]+'
+ dgsh-conc -o 2
16799: After getopt: ninputs=1, noutputs=1 optind=1 argv[optind]=xzcat
16799: program_supplied=0
16799: calling negotiate with ninputs=1 noutputs=1
16799: dgsh_negotiate(): Tool xzcat with pid 16799 negotiating: nin=1 nout=1.
16799: Message block created by process xzcat with pid 16799.

16799: add_node(): Added node xzcat in position 0 on dgsh graph, initiator: 16799
+ dgsh-conc -i 2
16799: write_message_block(): xzcat (0)
+ dgsh-wrap /bin/grep -E 'Subject:.*from=<messages@mailagenziaentrate40.top'
+ dgsh-wrap /bin/grep -oP '[[:xdigit:]]+(?=: info: header Subject:)'
+ dgsh-wrap /bin/sort -u
16802: After getopt: ninputs=1, noutputs=1 optind=1 argv[optind]=/bin/grep
+ wait
16802: program_supplied=0
16802: calling negotiate with ninputs=1 noutputs=1
16802: dgsh_negotiate(): Tool grep with pid 16802 negotiating: nin=1 nout=1.
16803: read_message_block(): (null) (0)
16803: write_message_block(): (null) (0)
16801: dgsh_negotiate(): Tool grep -F --matching-lines with pid 16801 negotiating: nin=-1 nout=1.
16806: After getopt: ninputs=1, noutputs=1 optind=1 argv[optind]=/bin/grep
16806: program_supplied=0
16806: calling negotiate with ninputs=1 noutputs=1
16806: dgsh_negotiate(): Tool grep with pid 16806 negotiating: nin=1 nout=1.
16807: After getopt: ninputs=1, noutputs=1 optind=1 argv[optind]=/bin/sort
16807: program_supplied=0
16805: After getopt: ninputs=1, noutputs=1 optind=1 argv[optind]=/bin/grep
16805: program_supplied=0
+ dgsh-wrap /bin/cat
16807: calling negotiate with ninputs=1 noutputs=1
16805: calling negotiate with ninputs=1 noutputs=1
16807: dgsh_negotiate(): Tool sort with pid 16807 negotiating: nin=1 nout=1.
16805: dgsh_negotiate(): Tool grep with pid 16805 negotiating: nin=1 nout=1.
16805: read_message_block(): grep (0)
16805: add_node(): Added node grep in position 1 on dgsh graph, initiator: 16799
16805: write_message_block(): grep (1)
16806: read_message_block(): grep (0)
16806: add_node(): Added node grep in position 2 on dgsh graph, initiator: 16799
16806: add_node(): Added node grep in position 2 on dgsh graph, initiator: 16799
16806: write_message_block(): grep (2)
16807: read_message_block(): sort (0)
16807: add_node(): Added node sort in position 3 on dgsh graph, initiator: 16799
16807: write_message_block(): sort (3)
16804: read_message_block(): (null) (0)
16804: write_message_block(): (null) (0)
16801: read_message_block(): grep -F --matching-lines (0)
16801: add_node(): Added node grep -F --matching-lines in position 4 on dgsh graph, initiator: 16799
16801: write_message_block(): grep -F --matching-lines (4)
16802: read_message_block(): grep (0)
16802: add_node(): Added node grep in position 5 on dgsh graph, initiator: 16799
16802: write_message_block(): grep (5)
16801: read_message_block(): grep -F --matching-lines (4)
16801: write_message_block(): grep -F --matching-lines (4)
16804: read_message_block(): (null) (0)
16804: write_message_block(): (null) (0)
16807: read_message_block(): sort (3)
16807: write_message_block(): sort (3)
16806: read_message_block(): grep (2)
16806: write_message_block(): grep (2)
16805: read_message_block(): grep (1)
16805: write_message_block(): grep (1)
16803: read_message_block(): (null) (0)
16803: write_message_block(): (null) (0)
16808: After getopt: ninputs=1, noutputs=1 optind=1 argv[optind]=/bin/cat
16808: program_supplied=0
16808: calling negotiate with ninputs=1 noutputs=1
16808: dgsh_negotiate(): Tool cat with pid 16808 negotiating: nin=1 nout=1.
16808: read_message_block(): cat (0)
16808: add_node(): Added node cat in position 6 on dgsh graph, initiator: 16799
16808: write_message_block(): cat (6)
16804: read_message_block(): (null) (0)
16804: write_message_block(): (null) (0)
16808: read_message_block(): cat (6)
16808: write_message_block(): cat (6)
16803: read_message_block(): (null) (0)
16803: write_message_block(): (null) (0)
16799: read_message_block(): xzcat (0)
16799: dgsh_negotiate(): Gathered I/O requirements.
dgsh: No solution was found to satisfy the I/O requirements of the following 2 participating processes:
xzcat (nout=1)
cat (nin=1)
16799: write_message_block(): xzcat (0)
16803: read_message_block(): (null) (0)
16803: write_message_block(): (null) (0)
16805: read_message_block(): grep (1)
16805: write_message_block(): grep (1)
16806: read_message_block(): grep (2)
16806: write_message_block(): grep (2)
16807: read_message_block(): sort (3)
16807: write_message_block(): sort (3)
16804: read_message_block(): (null) (0)
16804: write_message_block(): (null) (0)
16801: read_message_block(): grep -F --matching-lines (4)
16801: write_message_block(): grep -F --matching-lines (4)
16802: read_message_block(): grep (5)
16802: write_message_block(): grep (5)
16801: read_message_block(): grep -F --matching-lines (4)
16802: dgsh_negotiate(): grep (5) leaves after write with state ERROR.
16801: write_message_block(): grep -F --matching-lines (4)
16804: read_message_block(): (null) (0)
16801: dgsh_negotiate(): grep -F --matching-lines (4) leaves after write with state ERROR.
16804: write_message_block(): (null) (0)
16807: read_message_block(): sort (3)
16807: write_message_block(): sort (3)
16807: dgsh_negotiate(): sort (3) leaves after write with state ERROR.
16806: read_message_block(): grep (2)
16806: write_message_block(): grep (2)
16806: dgsh_negotiate(): grep (2) leaves after write with state ERROR.
16805: read_message_block(): grep (1)
16805: write_message_block(): grep (1)
16805: dgsh_negotiate(): grep (1) leaves after write with state ERROR.
16803: read_message_block(): (null) (0)
16803: write_message_block(): (null) (0)
16808: read_message_block(): cat (6)
16808: write_message_block(): cat (6)
16804: read_message_block(): (null) (0)
16804: write_message_block(): (null) (0)
16804: conc with pid 16804 terminates with error
16808: read_message_block(): cat (6)
16808: write_message_block(): cat (6)
16808: dgsh_negotiate(): cat (6) leaves after write with state ERROR.
16803: read_message_block(): (null) (0)
16803: write_message_block(): (null) (0)
16803: conc with pid 16803 terminates with error
16799: read_message_block(): xzcat (0)
16799: dgsh_negotiate(): xzcat (0) leaves after read with state ERROR.
dgsh-wrap: dgsh negotiation failed: Protocol error

@mfragkoulis I think that manually specifying all of that might prove to be a stumbling block for the average sysadmin wanting to adopt the tool. Take this script for instance: grep was called a number of times, but just once taking advantage of its form using multiple inputs. I suspect that in a long script that would often be the case. Specifying 10 times that grep negotiates a single input stream when you actually use 2 times the version with multiple inputs might put off users. Not to mention all the people who might have difficulties understanding where and when and why you need explicitly state that (you described the end user as a programmer, but I think he/she might be anything but ;) ).

mfragkoulis commented 6 years ago

It seems thatxzcat has been wrapped: + dgsh-wrap xzcat -f logfile. This also explains why the algorithm reports that: dgsh: No solution was found to satisfy the I/O requirements of the following 2 participating processes: xzcat (nout=1)

Can you post the updated script?

Regarding the env variable I understand your viewpoint. Just as a clarification, this would only be required as a conflict resolution mechanism (so only cat would require it in your example) and combined with a precise/guided error report message it might be useful to have.

trantor commented 6 years ago

@mfragkoulis @dspinellis I think I've solved the issue. Thinking about it I realized I should have piped the output of xzcat through tee to the multipipe block, instead of to the multipipe block directly. Now it works as expected, even with the overloaded versions of grep and sort. This was just the beginning of the script, so I need to work on it a bit more. If you need anything else in the meantime just ask.

trantor commented 6 years ago

A question: are while commands not allowed inside a multipipe block?

mfragkoulis commented 6 years ago

That's correct, they are not allowed currently, but we're working on it.

trantor commented 6 years ago

Sorry again, but I found another strange behaviour. Different behaviour between two different installation of dgsh (latest commit, compiled from scratch) on two different machines, both Ubuntu 16.04. I tried the following code, which was working perfectly on a machine, and got bizarre results on a different machine. Then I noticed even stranger errors. Not sure about the cause though.

$ echo 1 | sed -r 's/^: //; s/: to=</ /;'
1
$ dgsh
dgsh $ echo 1 | sed -r 's/^: //; s/: to=</ /;'
/bin/sed: -e expression #1, char 7: extra characters after command
dgsh $ sed -r 's/^: //; s/: to=</ /;' <( echo 1)
1
dgsh $ sed -r 's/^: //; s/: to=</ /;'           
1
1
dgsh $ 

Same command.

$ export DGSH_DEBUG_LEVEL=3
$ dgsh
dgsh $ echo 1 | sed -r 's/^: //; s/: to=</ /;'
22710: After getopt: ninputs=0, noutputs=1 optind=4 argv[optind]=/usr/local/libexec/dgsh/echo
22710: program_supplied=0
22710: calling negotiate with ninputs=0 noutputs=1
22710: dgsh_negotiate(): Tool echo with pid 22710 negotiating: nin=0 nout=1.
22710: Message block created by process echo with pid 22710.

22710: add_node(): Added node echo in position 0 on dgsh graph, initiator: 22710                                                                                                                
22710: write_message_block(): echo (0)                                                                                                                                                          
22711: After getopt: ninputs=1, noutputs=1 optind=1 argv[optind]=/bin/sed                                                                                                                       
22711: program_supplied=0                                                                                                                                                                       
22711: calling negotiate with ninputs=1 noutputs=1                                                                                                                                              
22711: dgsh_negotiate(): Tool sed with pid 22711 negotiating: nin=1 nout=1.                                                                                                                     
22711: read_message_block(): sed (0)                                                                                                                                                            
22711: add_node(): Added node sed in position 1 on dgsh graph, initiator: 22710
22711: write_message_block(): sed (1)
22710: read_message_block(): echo (0)
22710: dgsh_negotiate(): Gathered I/O requirements.
22710: dgsh_negotiate(): Computed solution
22710: write_message_block(): echo (0)
22711: read_message_block(): sed (1)
22711: write_message_block(): sed (1)
22710: read_message_block(): echo (0)
22711: dgsh_negotiate(): sed (1) leaves after write with state COMPLETE.
22710: dgsh_negotiate(): Communicated the solution
22710: dgsh_negotiate(): echo (0) leaves after read with state COMPLETE.
22710: establish_io_connections(): successful for node echo at index 0
22711: establish_io_connections(): successful for node sed at index 1
/bin/sed: -e expression #1, char 7: extra characters after command

This happens only when I pipe something to sed using dgsh, so I am speculating dgsh might have something to do with it. Same thing with sed used in a dgsh script.

mfragkoulis commented 6 years ago

That's odd. I get the expected output on Debian 8. In addition, the log you provide shows that the negotiation process has completed successfully. Except for process ids, I get an identical log. sed is wrapped. I wonder whether there is a subtle behaviour anomaly with dgsh-wrap. Can you post in a gist the output log with DGSH_DEBUG_LEVEL=4?

trantor commented 6 years ago

I will. Although it might perhaps be a bash bug of some kind.

trantor commented 6 years ago

Here it is

dgsh $ echo 1 | sed -r 's/^: //; s/: to=</ /;'
28123: pipe_in: -1, pipe_out: -1, asynchronous: 0

28123: dgsh_nest_level: -1, command: echo 1 | sed -r 's/^: //; s/: to=</ /;', command type: 6
28123: command type: 6, command: echo 1 | sed -r 's/^: //; s/: to=</ /;'

28123: cm_connection case

28123: pipe_in: -1, pipe_out: -1

28123: Created socketpair 3 -- 4

28123: Check if wrapping is required
28123: go execute_command_internal

28123: pipe_in: -1, pipe_out: 4, asynchronous: 0

28123: dgsh_nest_level: -1, command: echo 1, command type: 4
28123: command type: 4, command: echo 1

28123: cm_simple case

28123: pipe_in: -1, pipe_out: 4, async: 0

28123: dofork?: 1

28123: for simple command pipe_in: -1, pipe_out: 4, dgsh_in: 0, dgsh_out: 0, executing_function: 0
28123: set DGSH_OUT=1
28123: set_dgsh_path(), path: /home/fulvio/bin:/home/fulvio/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin, dgshpath: /usr/local/libexec/dgsh
28123: set_dgsh_path(): after prepending, path: /usr/local/libexec/dgsh:/home/fulvio/bin:/home/fulvio/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
28123: after putenv pipe_in: -1, pipe_out: 4, dgsh_in: 0, dgsh_out: 0, executing_function: 0
28123: Return 0 from fork
28123: Completed waiting for pid: 28279
28123: exit switch for command: 4
28123: exec_result: 0
28123: last command exit value: 0
28123: Check if wrapping is required
28123: pipe_in: 3, pipe_out: -1, asynchronous: 0

28123: dgsh_nest_level: -1, command: sed -r 's/^: //; s/: to=</ /;', command type: 4
28123: Unset dgshpath from path
28123: dgshpath in path: /usr/local/libexec/dgsh:/home/fulvio/bin:/home/fulvio/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
28123: dgshpath not in path: /home/fulvio/bin:/home/fulvio/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
28123: command type: 4, command: sed -r 's/^: //; s/: to=</ /;'

28123: cm_simple case

28123: pipe_in: 3, pipe_out: -1, async: 0

28123: dofork?: 1

28123: for simple command pipe_in: 3, pipe_out: -1, dgsh_in: 0, dgsh_out: 0, executing_function: 0
28123: set DGSH_IN=1
28279: close fd 3

28123: set_dgsh_path(), path: /home/fulvio/bin:/home/fulvio/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin, dgshpath: /usr/local/libexec/dgsh
28123: set_dgsh_path(): after prepending, path: /usr/local/libexec/dgsh:/home/fulvio/bin:/home/fulvio/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
28279: go do piping after make_child()

28279: pipe_in: -1, pipe_out: 4

28279: dup-close fd 4 to get stdout

28123: after putenv pipe_in: 3, pipe_out: -1, dgsh_in: 0, dgsh_out: 0, executing_function: 0
28279: Path to command: /usr/local/libexec/dgsh/echo
28279: Command echo is a shell or special builtin; override
28279: Command /usr/local/libexec/dgsh/echo is in the dgsh path
28279: go execute disk command
28123: Return 0 from fork
28279: go do piping

28279: pipe_in: -1, pipe_out: -1

28279: go shell_execve
28280: go do piping after make_child()

28280: pipe_in: 3, pipe_out: -1

28280: dup-close fd 3 to get stdin

28280: Path to command: /usr/local/libexec/dgsh/sed
28280: Command /usr/local/libexec/dgsh/sed is in the dgsh path
28280: go execute disk command
28280: go do piping

28280: pipe_in: -1, pipe_out: -1

28280: go shell_execve
28279: PATH before: [/usr/local/libexec/dgsh:/home/fulvio/bin:/home/fulvio/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin]
28279: PATH after: [/home/fulvio/bin:/home/fulvio/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/binsr/local/games:/snap/bin]
28279: Initial arguments
28279: argv[0]: [/usr/local/libexec/dgsh/dgsh-wrap]
28279: argv[1]: [-s -i 0]
28279: argv[2]: [/usr/local/libexec/dgsh/echo]
28279: argv[3]: [1]
28279: argv[4]: [(null)]
28279: Arguments after split_argv
28279: argv[0]: [/usr/local/libexec/dgsh/dgsh-wrap]
28279: argv[1]: [-s]
28279: argv[2]: [-i]
28279: argv[3]: [0]
28279: argv[4]: [/usr/local/libexec/dgsh/echo]
28279: argv[5]: [1]
28279: argv[6]: [(null)]
28279: getopt switch=s
28279: getopt switch=i
28279: After getopt: ninputs=0, noutputs=1 optind=4 argv[optind]=/usr/local/libexec/dgsh/echo
28279: program_supplied=0
28279: Arguments after processing program name (optind=4)
28279: argv[0]: [/usr/local/libexec/dgsh/dgsh-wrap]
28279: argv[1]: [-s]
28279: argv[2]: [-i]
28279: argv[3]: [0]
28279: argv[4]: [echo]
28279: argv[5]: [1]
28279: argv[6]: [(null)]
28279: guest_program_name: echo
28279: embedded_args=0
28279: calling negotiate with ninputs=0 noutputs=1
28279: dgsh_negotiate(): Tool echo with pid 28279 negotiating: nin=0 nout=1.
28279: Try to get environment variable DGSH_IN.
28279: getenv() returned string value 0.
28279: Integer form of value is 0.
28279: Try to get environment variable DGSH_OUT.
28279: getenv() returned string value 1.
28279: Integer form of value is 1.
28279: Message block created by process echo with pid 28279.

28279: fill_node(): dgsh_in: 0, self_node.requires_channels: 0
28279: fill_node(): dgsh_out: 1, self_node.provides_channels: 1
28279: Dgsh node for tool echo with pid 28279 created.

28279: add_node(): Added node echo in position 0 on dgsh graph, initiator: 28279
28279: Dgsh graph now has 1 nodes.

28280: PATH before: [/usr/local/libexec/dgsh:/home/fulvio/bin:/home/fulvio/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin]
28279: dgsh_negotiate(): perform round
28279: Next operation is a write
28279: write on fd 1 is active.
28279: set_dispatcher(): message block origin set to 0 and writing on the output side
28279: write_message_block(): echo (0)
28279: write_message_block(): Write message block.
28279: Try write struct of size: 88
28279: write_message_block(): Wrote message block of size 88 bytes 
28279: Try write struct of size: 124
28279: write_message_block(): Wrote nodes of size 124 bytes 
28279: write_message_block(): Shipped message block or solution to next node in graph from file descriptor: 1.

28279: dgsh_negotiate(): perform round
28280: PATH after: [/home/fulvio/bin:/home/fulvio/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/binsr/local/games:/snap/bin]
28279: Next operation is a read
28280: Initial arguments
28280: argv[0]: [/usr/local/libexec/dgsh/dgsh-wrap]
28280: argv[1]: [/bin/sed]
28280: argv[2]: [/usr/local/libexec/dgsh/sed]
28280: argv[3]: [-r]
28280: argv[4]: [s/^: //; s/: to=</ /;]
28280: argv[5]: [(null)]
28280: After getopt: ninputs=1, noutputs=1 optind=1 argv[optind]=/bin/sed
28280: program_supplied=0
28280: Arguments after processing program name (optind=1)
28280: argv[0]: [/usr/local/libexec/dgsh/dgsh-wrap]
28280: argv[1]: [/bin/sed]
28280: argv[2]: [/usr/local/libexec/dgsh/sed]
28280: argv[3]: [-r]
28280: argv[4]: [s/^: //; s/: to=</ /;]
28280: argv[5]: [(null)]
28280: guest_program_name: sed
28280: embedded_args=0
28280: calling negotiate with ninputs=1 noutputs=1
28280: dgsh_negotiate(): Tool sed with pid 28280 negotiating: nin=1 nout=1.
28280: Try to get environment variable DGSH_IN.
28280: getenv() returned string value 1.
28280: Integer form of value is 1.
28280: Try to get environment variable DGSH_OUT.
28280: getenv() returned string value 0.
28280: Integer form of value is 0.
28280: dgsh_negotiate(): perform round
28280: Next operation is a read
28280: read on fd 0 is active.
28280: read_message_block(): sed (0)
28280: read_chunk(): buf_size: 88, IOV_MAX: 1024
28280: Try read from fd 0.
28280: Read succeeded: 88 bytes read from 0.

28280: read_chunk(): buf_size: 124, IOV_MAX: 1024
28280: Try read from fd 0.
28280: Read succeeded: 124 bytes read from 0.

28280: alloc_copy_nodes(): Node array recovered.
28280: read_message_block(): Read message block or solution from node 0 sent from file descriptor: stdout.

28280: node name: echo, pid: 28279
28280: fill_node(): dgsh_in: 1, self_node.requires_channels: 1
28280: fill_node(): dgsh_out: 0, self_node.provides_channels: 1
28280: Dgsh node for tool sed with pid 28280 created.

28280: add_node(): Added node sed in position 1 on dgsh graph, initiator: 28279
28280: Dgsh graph now has 2 nodes.

28280: New dgsh edge from 0 to 1 with 0 instances.
28280: Added edge (0 -> 1) in dgsh graph.

28280: Dgsh graph now has 1 edges.

28280: dgsh_negotiate(): perform round
28280: Next operation is a write
28280: write on fd 0 is active.
28280: set_dispatcher(): message block origin set to 1 and writing on the input side
28280: write_message_block(): sed (1)
28280: write_message_block(): Write message block.
28280: Try write struct of size: 88
28280: write_message_block(): Wrote message block of size 88 bytes 
28279: read on fd 1 is active.
28280: Try write struct of size: 248
28279: read_message_block(): echo (0)
28280: write_message_block(): Wrote nodes of size 248 bytes 
28279: read_chunk(): buf_size: 88, IOV_MAX: 1024
28280: Try write struct of size: 20
28279: Try read from fd 1.
28280: write_message_block(): Wrote edges of size 20 bytes 
28279: Read succeeded: 88 bytes read from 1.

28280: write_message_block(): Shipped message block or solution to next node in graph from file descriptor: 0.

28279: read_chunk(): buf_size: 248, IOV_MAX: 1024
28280: dgsh_negotiate(): perform round
28279: Try read from fd 1.
28280: Next operation is a read
28279: Read succeeded: 248 bytes read from 1.

28279: alloc_copy_nodes(): Node array recovered.
28279: read_message_block(): Read 1 negotiation graph edges.
28279: read_chunk(): buf_size: 20, IOV_MAX: 1024
28279: Try read from fd 1.
28279: Read succeeded: 20 bytes read from 1.

28279: read_message_block(): Read message block or solution from node 1 sent from file descriptor: stdin.

28279: node name: echo, pid: 28279
28279: New dgsh edge from 0 to 1 with 0 instances.
28279: lookup_dgsh_edge(): Edge 0 to 1 exists.
28279: dgsh_negotiate(): Gathered I/O requirements.
28279: node_match_constraints(): node at index 0.
28279: Node echo, index 0, channels required 0, channels_provided 1, dgsh_in 0, dgsh_out 1.
28279: dry_match_io_constraints(): edge at index 0 from 0 to 1, instances 0, from_instances 0, to_instances 0.
28279: dry_match_io_constraints(): Node at index 0 has 1 outgoing edges and 0 incoming.
28279: satisfy_io_constraints(): edge from 0 to 1, is_edge_incoming: 0, free_instances: 0, weight: 1, modulo: 0, from_instances: 1, to_instances: 0.

28279: satisfy_io_constraints(): Number of edges: 1, this_channel_constraint: 1, free instances: 0.

28279: node_match_constraints(): node at index 1.
28279: Node sed, index 1, channels required 1, channels_provided 1, dgsh_in 1, dgsh_out 0.
28279: dry_match_io_constraints(): edge at index 0 from 0 to 1, instances 0, from_instances 1, to_instances 0.
28279: dry_match_io_constraints(): Node at index 1 has 0 outgoing edges and 1 incoming.
28279: satisfy_io_constraints(): edge from 0 to 1, is_edge_incoming: 1, free_instances: 0, weight: 1, modulo: 0, from_instances: 1, to_instances: 1.

28279: satisfy_io_constraints(): Number of edges: 1, this_channel_constraint: 1, free instances: 0.

28279: cross_match_constraints(): node echo, index 0, channels required 0, channels_provided 1, dgsh_in 0, dgsh_out 1.
28279: cross_match_io_constraints(): edge from 0 to 1, this_channel_constraint: 1, is_incoming: 0, from_instances: 1, to_instances 1, edge instances: 1.

28279: Outgoing communication endpoints to setup: 1, constraint: 1
28279: cross_match_constraints(): node sed, index 1, channels required 1, channels_provided 1, dgsh_in 1, dgsh_out 0.
28279: cross_match_io_constraints(): edge from 0 to 1, this_channel_constraint: 1, is_incoming: 1, from_instances: 1, to_instances 1, edge instances: 1.

28279: Incoming communication endpoints to setup: 1, constraint: 1
28279: cross_match_constraints(): Cross matched constraints of 2 out of 2 nodes for 1 edges out of 1 edges.
28279: solve_graph(): exit_state: 0, retries: 0
28279: prepare_solution(): Node echo, pid: 28279, connections in: 0, connections out: 1.
28279: make_compact_edge_array():Copied edge 0 -> 1 (1) at index 0.
28279: prepare_solution(): Node sed, pid: 28280, connections in: 1, connections out: 0.
28279: make_compact_edge_array():Copied edge 0 -> 1 (1) at index 0.
28279: calculate_conc_fds for 0 n_concs
28279: solve_graph: exit_state: 0
28279: dgsh_negotiate(): Computed solution
28279: dgsh_negotiate(): perform round
28279: Next operation is a write
28279: write on fd 1 is active.
28279: set_dispatcher(): message block origin set to 0 and writing on the output side
28279: write_message_block(): echo (0)
28279: write_message_block(): Write message block.
28279: Try write struct of size: 88
28279: write_message_block(): Wrote message block of size 88 bytes 
28280: read on fd 0 is active.
28279: Try write struct of size: 248
28280: read_message_block(): sed (1)
28279: write_message_block(): Wrote nodes of size 248 bytes 
28280: read_chunk(): buf_size: 88, IOV_MAX: 1024
28279: Try write struct of size: 80
28280: Try read from fd 0.
28279: write_graph_solution(): Wrote graph solution of size 80 bytes 
28280: Read succeeded: 88 bytes read from 0.

28279: Try write struct of size: 20
28280: read_chunk(): buf_size: 248, IOV_MAX: 1024
28279: write_graph_solution(): Wrote node's 0 1 outgoing edges of size 20 bytes 
28280: Try read from fd 0.
28279: Try write struct of size: 20
28280: Read succeeded: 248 bytes read from 0.

28279: write_graph_solution(): Wrote node's 1 1 incoming edges of size 20 bytes 
28280: alloc_copy_nodes(): Node array recovered.
28279: write_message_block(): Shipped message block or solution to next node in graph from file descriptor: 1.

28280: read_chunk(): buf_size: 80, IOV_MAX: 1024
28279: dgsh_negotiate(): perform round
28280: Try read from fd 0.
28279: Next operation is a read
28280: Read succeeded: 80 bytes read from 0.

28280: Node 0 with 0 incoming edges at 0 and 1 outgoing edges at 1ef7400.
28280: read_chunk(): buf_size: 20, IOV_MAX: 1024
28280: Try read from fd 0.
28280: Read succeeded: 20 bytes read from 0.

28280: Node 1 with 1 incoming edges at 1ef73a0 and 0 outgoing edges at 0.
28280: read_chunk(): buf_size: 20, IOV_MAX: 1024
28280: Try read from fd 0.
28280: Read succeeded: 20 bytes read from 0.

28280: read_message_block(): Read message block or solution from node 0 sent from file descriptor: stdout.

28280: dgsh_negotiate(): perform round
28280: Next operation is a write
28280: write on fd 0 is active.
28280: set_dispatcher(): message block origin set to 1 and writing on the input side
28280: write_message_block(): sed (1)
28280: write_message_block(): Write message block.
28280: Try write struct of size: 88
28280: write_message_block(): Wrote message block of size 88 bytes 
28279: read on fd 1 is active.
28280: Try write struct of size: 248
28279: read_message_block(): echo (0)
28280: write_message_block(): Wrote nodes of size 248 bytes 
28279: read_chunk(): buf_size: 88, IOV_MAX: 1024
28280: Try write struct of size: 80
28279: Try read from fd 1.
28280: write_graph_solution(): Wrote graph solution of size 80 bytes 
28279: Read succeeded: 88 bytes read from 1.

28280: Try write struct of size: 20
28279: read_chunk(): buf_size: 248, IOV_MAX: 1024
28280: write_graph_solution(): Wrote node's 0 1 outgoing edges of size 20 bytes 
28279: Try read from fd 1.
28280: Try write struct of size: 20
28279: Read succeeded: 248 bytes read from 1.

28280: write_graph_solution(): Wrote node's 1 1 incoming edges of size 20 bytes 
28279: alloc_copy_nodes(): Node array recovered.
28280: write_message_block(): Shipped message block or solution to next node in graph from file descriptor: 0.

28279: read_chunk(): buf_size: 80, IOV_MAX: 1024
28280: dgsh_negotiate(): sed (1) leaves after write with state COMPLETE.
28279: Try read from fd 1.
28280: alloc_io_fds(): self node: 1, incoming edges: 1, outgoing edges: 0
28279: Read succeeded: 80 bytes read from 1.

28280: read_input_fds(): 1 incoming edges to inspect of node 1.
28279: Node 0 with 0 incoming edges at 0 and 1 outgoing edges at 144f4e0.
28279: read_chunk(): buf_size: 20, IOV_MAX: 1024
28279: Try read from fd 1.
28279: Read succeeded: 20 bytes read from 1.

28279: Node 1 with 1 incoming edges at 144f500 and 0 outgoing edges at 0.
28279: read_chunk(): buf_size: 20, IOV_MAX: 1024
28279: Try read from fd 1.
28279: Read succeeded: 20 bytes read from 1.

28279: read_message_block(): Read message block or solution from node 1 sent from file descriptor: stdin.

28279: dgsh_negotiate(): Communicated the solution
28279: dgsh_negotiate(): echo (0) leaves after read with state COMPLETE.
28279: alloc_io_fds(): self node: 0, incoming edges: 0, outgoing edges: 1
28279: read_input_fds(): 0 incoming edges to inspect of node 0.
28279: write_output_fds(): for node at index 0 with 1 outgoing edges.
28279: write_output_fds(): created pipe pair 3 - 4. Transmitting fd 3 through sendmsg().
28279: establish_io_connections(): input fds: 0, output fds: 1
28280: read_input_fds: Node 1 received file descriptor 3.
28280: write_output_fds(): for node at index 1 with 0 outgoing edges.
28280: establish_io_connections(): input fds: 1, output fds: 0
28279: establish_io_connections(): closed STDOUT, dup 4 returned 1
28279: establish_io_connections(): successful for node echo at index 0
28279: free_graph_solution: freed 2 nodes.
28280: establish_io_connections(): closed STDIN, dup 3 returned 0
28279: free_mb(): Freed message block.
28280: establish_io_connections(): successful for node sed at index 1
28279: Arguments to execvp after substitung <| and >|
28280: free_graph_solution: freed 2 nodes.
28279: argv[0]: [echo]
28280: free_mb(): Freed message block.
28279: argv[1]: [1]
28280: Arguments to execvp after substitung <| and >|
28279: argv[2]: [(null)]
28280: argv[0]: [/bin/sed]
28280: argv[1]: [/usr/local/libexec/dgsh/sed]
28280: argv[2]: [-r]
28280: argv[3]: [s/^: //; s/: to=</ /;]
28280: argv[4]: [(null)]
/bin/sed: -e expression #1, char 7: extra characters after command
28123: Completed waiting for pid: 28280
28123: exit switch for command: 4
28123: exec_result: 1
28123: last command exit value: 1
28123: return exec_result: 1
28123: exit switch for command: 6
28123: exec_result: 1
28123: last command exit value: 1
28123: go yyparse()
trantor commented 6 years ago

Something else quite strange

dgsh $ echo 1 | sed -r -e 's/^: //; s/: to=</ /;'
#!/usr/local/libexec/dgsh/dgsh-wrap /bin/sed
dgsh $
mfragkoulis commented 6 years ago

Yeap, it's an issue with dgsh-wrap. From your previous log of command echo 1 | sed -r 's/^: //; s/: to=</ /;':

28280: argv[0]: [/bin/sed]
28280: argv[1]: [/usr/local/libexec/dgsh/sed]
28280: argv[2]: [-r]
28280: argv[3]: [s/^: //; s/: to=</ /;]
28280: argv[4]: [(null)]

and this is what it should be:

17379: argv[0]: [sed]
17379: argv[1]: [-r]
17379: argv[2]: [s/^: //; s/: to=</ /;]
17379: argv[3]: [(null)]

I suspect the latest example you posted manifests the same issue. Notably, on Debian 8 both commands run fine. I'll take a look at dgsh-wrap's code shortly, but I wonder what makes an installation of dgsh on Ubuntu 16.04 behave differently from another installation of dgsh again on Ubuntu 16.04. This is the case right?

trantor commented 6 years ago

Yes, this is precisely the case. On the other machine I see the same argument array you see on your Debian 8. Not sure whether it's some sort of interference with a previous version of dgsh I had installed before. The dgsh with the problem above was built a couple of days ago, more or less, from a clean clone.

trantor commented 6 years ago

Ok. Leftover pieces of a former dgsh installation under $PREFIX/libexec/dgsh were the problem. Now it works as expected.

trantor commented 6 years ago

I am going to close the issue since I found the crux of the matter a while ago. Thanks for the assistance. I'll be back if I encounter other problems.