con / duct

A helper to run a command, capture stdout/stderr and details about running
https://pypi.org/project/con-duct/
MIT License
3 stars 2 forks source link

incorrect `cmd` recorded -- includes the STAT? #178

Closed yarikoptic closed 1 month ago

yarikoptic commented 2 months ago

Trying with @candleindark

❯ pip install con-duct
Collecting con-duct
  Downloading con_duct-0.3.0-py3-none-any.whl.metadata (7.0 kB)
Downloading con_duct-0.3.0-py3-none-any.whl (13 kB)
Installing collected packages: con-duct
Successfully installed con-duct-0.3.0
❯ datalad run -m "Generate current state of things" duct dandisets-linkml-status --include-unpublished --dandi-instance dandi
[INFO   ] == Command start (output follows) ===== 
2024-09-13T12:42:05-0400 [INFO    ] con-duct: duct is executing 'dandisets-linkml-status --include-unpublished --dandi-instance dandi'...
2024-09-13T12:42:05-0400 [INFO    ] con-duct: Log files will be written to .duct/logs/2024.09.13T12.42.05-450930_

and we were looking at the usage logs

❯ tail -f .duct/logs/2024.09.13T12.42.05-450930_usage.json
{"timestamp": "2024-09-13T12:42:05.207290-04:00", "num_samples": 1, "processes": {"450934": {"pcpu": 0.0, "pmem": 0.0, "rss": 11280384, "vsz": 19775488, "timestamp": "2024-09-13T12:42:05.207290-04:00", "etime": "00:00", "cmd": "Rs   /home/yoh/proj/dandi/dandisets-linkml-status/venv/3.12/bin/python3.12 /home/yoh/proj/dandi/dandisets-linkml-status/venv/3.12/bin/dandisets-linkml-status --include-unpublished --dandi-instance dandi"}}, "totals": {"pmem": 0.0, "pcpu": 0.0, "rss": 11280384, "vsz": 19775488}, "averages": {"rss": 11280384, "vsz": 19775488, "pmem": 0.0, "pcpu": 0.0, "num_samples": 1}}
{"timestamp": "2024-09-13T12:42:06.265415-04:00", "num_samples": 57, "processes": {"450934": {"pcpu": 103.0, "pmem": 0.0, "rss": 64942080, "vsz": 76558336, "timestamp": "2024-09-13T12:42:06.265415-04:00", "etime": "00:01", "cmd": "Rs   /home/yoh/proj/dandi/dandisets-linkml-status/venv/3.12/bin/python3.12 /home/yoh/proj/dandi/dandisets-linkml-status/venv/3.12/bin/dandisets-linkml-status --include-unpublished --dandi-instance dandi"}}, "totals": {"pmem": 0.0, "pcpu": 103.0, "rss": 64942080, "vsz": 76558336}, "averages": {"rss": 264286064.28070173, "vsz": 278001412.491228, "pmem": 0.34385964912280703, "pcpu": 77.37368421052629, "num_samples": 57}}
{"timestamp": "2024-09-13T12:43:06.874113-04:00", "num_samples": 56, "processes": {"450934": {"pcpu": 78.5, "pmem": 0.6, "rss": 436379648, "vsz": 450220032, "timestamp": "2024-09-13T12:43:06.874113-04:00", "etime": "01:01", "cmd": "Rs   /home/yoh/proj/dandi/dandisets-linkml-status/venv/3.12/bin/python3.12 /home/yoh/proj/dandi/dandisets-linkml-status/venv/3.12/bin/dandisets-linkml-status --include-unpublished --dandi-instance dandi"}}, "totals": {"pmem": 0.6, "pcpu": 78.5, "rss": 436379648, "vsz": 450220032}, "averages": {"rss": 598127616.0000001, "vsz": 634938294.8571428, "pmem": 0.8392857142857143, "pcpu": 78.54285714285716, "num_samples": 56}}
^C

and I spotted that it is

"cmd": "Rs   /home/yoh/proj/dandi/dandisets-linkml-status/venv/3.12/bin/python3.12 /home/yoh/proj/dandi/dandisets-linkml-status/venv/3.12/bin/dandisets-linkml-status --include-unpublished --dandi-instance dandi"

so includes Rs for some reason, likely from STAT

❯ ps auxw -H | head -n 1
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
❯ ps auxw -H | grep /home/yoh/proj/dandi/dandisets-linkml-status/venv/3.12/bin/python3.1[2]
yoh       450930  1.6  0.0 317980 15344 pts/10   Sl+  12:42   0:02             /home/yoh/proj/dandi/dandisets-linkml-status/venv/3.12/bin/python3.12 /home/yoh/proj/dandi/dandisets-linkml-status/venv/3.12/bin/duct dandisets-linkml-status --include-unpublished --dandi-instance dandi
yoh       450934 79.5  1.4 1285316 976944 ?      Rsl  12:42   2:13               /home/yoh/proj/dandi/dandisets-linkml-status/venv/3.12/bin/python3.12 /home/yoh/proj/dandi/dandisets-linkml-status/venv/3.12/bin/dandisets-linkml-status --include-unpublished --dandi-instance dandi
yoh       451601  0.0  0.0   6572  2368 pts/16   S+   12:44   0:00         grep --color=auto -d skip /home/yoh/proj/dandi/dandisets-linkml-status/venv/3.12/bin/python3.12

so I guess parsing is buggy ATM, at least on my laptop.

FWIW I ran tests on that version and they passed

❯ python -m pytest
=============================================================================== test session starts ================================================================================
platform linux -- Python 3.12.5, pytest-8.3.2, pluggy-1.5.0
rootdir: /home/yoh/proj/CON/duct
configfile: tox.ini
plugins: mock-3.14.0, cov-5.0.0, xdist-3.6.1
collected 103 items                                                                                                                                                                

test/test_arg_parsing.py ......                                                                                                                                              [  5%]
test/test_execution.py ...............                                                                                                                                       [ 20%]
test/test_log_paths.py ..............                                                                                                                                        [ 33%]
test/test_prepare_outputs.py ................                                                                                                                                [ 49%]
test/test_report.py ........................                                                                                                                                 [ 72%]
test/test_tailpipe.py ...............                                                                                                                                        [ 87%]
test/test_validation.py .............                                                                                                                                        [100%]

---------- coverage: platform linux, python 3.12.5-final-0 -----------
Name                       Stmts   Miss Branch BrPart   Cover   Missing
-----------------------------------------------------------------------
src/con_duct/__init__.py       2      0      0      0 100.00%
src/con_duct/__main__.py     471     17    136     13  94.73%   136-144, 360, 377->exit, 427->426, 674, 711-712, 729-730, 841->840, 852-854, 869, 877, 883->889, 894->899, 899->904, 904->908, 914
-----------------------------------------------------------------------
TOTAL                        473     17    136     13  94.75%

=============================================================================== 103 passed in 27.57s ===============================================================================
python -m pytest  23.86s user 12.35s system 126% cpu 28.513 total

so must be particular about this invocation or testing is not yet strong enough for parsing of cmd out

asmacdo commented 2 months ago

This is a bug-- I accidentally left stat in ps but brought maxsplit back down to 6.

                    "pid,pcpu,pmem,rss,vsz,etime,stat,cmd",
                ],
                text=True,
            )
            for line in output.splitlines()[1:]:
                if line:
                    pid, pcpu, pmem, rss_kib, vsz_kib, etime, cmd = line.split(
                        maxsplit=6,
                    )

https://github.com/con/duct/blob/main/src/con_duct/__main__.py#L422-L428

We have two options. We could add stat to the record, which might be a little odd since there are multiple measurements and the stat can change. In that case I think we would just the stat from the final measurement.

Or, if that information isn't particularly useful we could just cut it. I'm leaning towards keeping it. @yarikoptic wdyt

yarikoptic commented 2 months ago

actually not a bad idea to collect all stats, as pretty much "Counter" on them across samples (how often it was in S how often in some other mode). It can come handy!

while doing that -- strengthen the tests, do TDD -- first make a test which currently fails, then fix/extend code accordingly.

here is chatgpt example on such a Counter across ALL processes I have running ATM :

❯ ps -eo stat | tail -n +2 | python3 -c "import sys; from collections import Counter; print(Counter([line.strip() for line in sys.stdin]))"
Counter({'S': 219, 'Sl': 161, 'I<': 93, 'I': 76, 'Ssl': 66, 'Ss': 36, 'S+': 13, 'Ss+': 7, 'SLsl': 6, 'SLl': 6, 'SN': 5, 'SLs': 2, 'Sl+': 2, 'SNl': 2, 'R+': 2, 'S<': 1, 'Ssl+': 1, 'Rl+': 1, 'SNsl': 1, 'D<': 1})
asmacdo commented 1 month ago

Fixed by https://github.com/con/duct/pull/182