DUNE-DAQ / daq-buildtools

Make life for developers easier through a collection of CMake functions and scripts
0 stars 1 forks source link

dbt-build.sh from v2.1.1 tag loses color and eats lines after it in paste #111

Closed ron003 closed 3 years ago

ron003 commented 3 years ago

Using script is fundamentally flawed because it sends output to stdout and to a file so the output going to the file can only be filtered inside of the command line given in its command option. When filtering is done inside the command option, the command ahead of the filter (the CMake command in our case) will no longer consider its output to be a tty (which would be the point of using script).

Of the other options explored (socat and unbuffer), only script reads from stdin (apparently assuming the command given to it wants stdin). An undesirable workaround would be to have the command be executed after dbt-build.sh in a paste buffer to be part of a pipeline or escaped command line:

dbt-build.sh --install &&
dbt-setup-runtime-environment; \
echo hi;
echo there

In the above echo there will be consumed by script inside of dbt-build.sh.

I understand that the other options are not optimal because the rpm (packages) that provides them may not be installed. To that end, I have created to small python scripts which could be added directly to dbt-build.sh. For example, script can be replaced by:

py_pty='import sys,os,signal
stdin,stdout,stderr=0,1,2
ptm,pts=os.openpty();pid=os.fork()
if pid == 0:
    os.dup2(pts, stdout)
    sts = os.system(sys.argv[1]); sys.exit(sts>>8)
os.close(pts)
signal.signal(signal.SIGINT, signal.SIG_IGN)
while True:
    try:            chunk = os.read(ptm, 4096)
    except OSError: break
    try:                    os.write(stdout, chunk)
    except BrokenPipeError: os.kill(pid, signal.SIGTERM); break
wait_pid, status = os.waitpid(pid, 0); exit(status >> 8)'

and used:

python -c "$py_pty" "${cmd}"

With the above, the output will contain color and a "single progress line" (which will need filtering to turn into multi-line progress). There is a fundamental problem when trying to use sed to accomplish this filtering. Sed is line (that is line feed and not carriage return) oriented. NOTE: the current sed including in ... ${cmd} |& sed -e 's/\r/\n/g' is actually not doing anything because cmake, being piped, stops sending \r. In other words, the current script implementation is basically equivalent to ${cmd} | tee $build_log. Of the 3 major filter commands (sed, Perl, awk), only awk can do this because it supports "Record Separator" that can be multiple characters, i.e. both carriage return and line feed. However, there is more to discuss here, but because this description is getting long, I'll save this for later. Just know, I believe I have a small python script (similar to the above) which I believe will do what is required.

Speaking of requirements, here's a list, as I currently see things (and after some discussion with John and Alessandro) (not prioritized):

  1. Use Ninja generator by default for speed
  2. Colored output
  3. No single line progress
  4. No buffering - print lines (e.g progress) near-real-time (i.e ASAP)
  5. save to log file (Note: less with the -R option handles color)
  6. dbt-build.sh should not "eat" successive lines pasted; don't require a big pipeline.
  7. pty script readily available
alessandrothea commented 3 years ago

Hi @ron003, Thanks for looking into this. Indeed a python script seems the easiest way to achieve the behaviour we want and be independent by any system tool.

I quickly googled around how people re-implement tee in python, if they have to. This stackoverflow answer looks interesting:

class RunOutput():
    def __init__(self, returncode, stdout, stderr):
        self.returncode = returncode
        self.stdout = stdout
        self.stderr = stderr

async def _read_stream(stream, callback):
    while True:
        line = await stream.readline()
        if line:
            callback(line)
        else:
            break

async def _stream_subprocess(cmd, stdin=None, quiet=False, echo=False) -> RunOutput:
    if isWindows():
        platform_settings = {'env': os.environ}
    else:
        platform_settings = {'executable': '/bin/bash'}

    if echo:
        print(cmd)

    p = await asyncio.create_subprocess_shell(cmd,
                                              stdin=stdin,
                                              stdout=asyncio.subprocess.PIPE,
                                              stderr=asyncio.subprocess.PIPE,
                                              **platform_settings)
    out = []
    err = []

    def tee(line, sink, pipe, label=""):
        line = line.decode('utf-8').rstrip()
        sink.append(line)
        if not quiet:
            print(label, line, file=pipe)

    await asyncio.wait([
        _read_stream(p.stdout, lambda l: tee(l, out, sys.stdout)),
        _read_stream(p.stderr, lambda l: tee(l, err, sys.stderr, label="ERR:")),
    ])

    return RunOutput(await p.wait(), out, err)

def run(cmd, stdin=None, quiet=False, echo=False) -> RunOutput:
    loop = asyncio.get_event_loop()
    result = loop.run_until_complete(
        _stream_subprocess(cmd, stdin=stdin, quiet=quiet, echo=echo)
    )

    return result

from https://stackoverflow.com/a/59041913

ron003 commented 3 years ago

I don't think we should worry about tee. It's in the coreutils rpm (along with other utilities used: cat, date, dirname, mv, readlink, and wc). And tee does such a focused thing. For the specific cmake/ninja issue, I think we should break the issue into 3 pieces (pty-filter-tee):

  1. pty - get cmake/ninja to output color - use python
  2. From #1 above, we have color but unfortunately '\r' comes along with color! We need to filter, but since '\r' is not '\n' sed might not work depending upon whether or not you want all the info. We probably need awk or python script.
  3. save the resulting output with tee

I think the next focus should be on the filter. When cmake/ninja is doing color, it is also doing '\r' thing and we should decide exactly what filtering we want to do. For sure emitting lines as they are ready. To test how a filter behaves with buffering (i.e timing), '\r' and '\n' I do the following:

(sleep 3;echo `date +%s`;\
 sleep 3;echo `date +%s`;\
 sleep 3;echo -ne "`date +%s`\r";\
 sleep 3;echo -ne "`date +%s`\r";\
 sleep 3;echo -e  "\n`date +%s`";\
 sleep 3;echo `date +%s`;\
 sleep 3) | sed -e 's/\r/\n/g'

Replace sed with other filters. We should be able to run without filtering and see color first -- teeing to a log file. Then run the filter on the log file to see color and more lines.

alessandrothea commented 3 years ago

After a bit of tinkering, the easiest option appears to be the pexpect python module, which provides both the terminal emulation and filtering capabilities in a very compact form. Will post an example soon.

alessandrothea commented 3 years ago

Here is a truly minimal example using pexpect

#!/usr/bin/env python

import pexpect

cmd='ninja'
cmd='cmake --build .   -j 56 --verbose '

fout = open('mylog.txt','w')

child = pexpect.spawn(cmd)

while True:
    index = child.expect ([
        '\r\n',
        '\r',
        pexpect.EOF,
        pexpect.TIMEOUT
    ])

    # print(index)
    if index == 0:
        text=child.before.decode('utf-8')
        print(text)
        print(text, file=fout)

    elif index == 1:
        if (child.before):
            text=child.before.decode('utf-8')
            print(text)
            print(text, file=fout)
    else:
        break

It doesn't consume stdin. Running

pty_pexpect.py
echo hi;
echo there

as a script (/test_py_expect.sh) results in:

Screenshot 2021-03-08 at 09 25 54
ron003 commented 3 years ago

I've been busy with other things -- I've worked on this a little and am still looking at it... The script seems to do what we want in the least amount of lines and is straight foward and understandable. The issues I have are:

  1. it does things that are usually (according to the Unix Philosophy) deligated to classic filters: awk and tee. -- awk and tee are part of the one Unix standard and will be on every Linux distribution, except for the explicitly stripped down (e.g. busybox) distributions. This doesn't need to be done. Python can easily be used for the filter; the point here is to use a (commonly available) filter do filtering.
  2. the use of "click" and "pexpect", as your aware, required our python distribution to include them -- my distribution doesn't -- this may not be "nice" in some edge cases.
  3. I'm in the middle of doing the "pip" to install pexpect into the dunedaq v2.3 environment to verify, but with my Fedora 32 env, it seems that the script currently can't have its stdout redirected:
    /home/ron/work/DUNEPrj
    ron@ronlap77 :^) pytee.py echo hi | tail
    Traceback (most recent call last):
    File "./pytee.py", line 69, in <module>
    run()
    File "/usr/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
    File "/usr/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
    File "/usr/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
    File "/usr/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
    File "./pytee.py", line 30, in run
    cols, rows = os.get_terminal_size()
    OSError: [Errno 25] Inappropriate ioctl for device
    --2021-03-09_09:58:15_CST--

    Pengfei and I have verified that one will not be able to do something like dbt-build.sh --clean --install | tee myout OR dbt-build.sh --clean -install | less This is an example of the kind of thing that can happen as a result of # 1.

The "pty issue" and be either handled by a python script that fits into a shell variable or a pypty.py script and the same is true for the "filter issue."

ron003 commented 3 years ago

I would rather just be able to push a branch instead of forking.

alessandrothea commented 3 years ago

Hi @ron003,

Here some answers:

  1. I agree in general, and that has been the strategy op to the point where we agreed to go for python to trick ninja into believing it runs in a terminal. Once python is in the game, it makes more sense to stick to one tool (python) to run and filter.
  2. Well, pytee.py (suggestions for a better names are welcome) is not meant to be generic, but exists in support of the daq-buildtools where we control the python environment. click is already provided, pexpect will be added in the next release.
  3. Good point. Fixed in commit 626ae71

Cheers,

Alessandro