[x] make it capable of simply starting the job instead of "attaching itself" to the same session, i.e. make it into a "wrapper" which starts the command similarly to how it was done in https://github.com/con/duct/blob/main/duct_time#L13
e.g. `duct ./abandoning_parent.sh 5 ./consume_mem.py 100 200`
it should use `os.setsid()` to establish a new session id
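A minimal sketch of the wrapper start (function name hypothetical), assuming `subprocess.Popen` with `start_new_session=True`, which makes the child call `os.setsid()` after fork:

```python
import subprocess
import sys

def start_command(argv):
    # start_new_session=True makes the child call os.setsid() after fork,
    # so the job runs in its own session instead of duct's.
    return subprocess.Popen(argv, start_new_session=True)

# e.g.: proc = start_command(["./abandoning_parent.sh", "5", "./consume_mem.py", "100", "200"])
```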
[x] start monitoring in a separate thread, while the main thread would be "busy" executing the process
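A sketch of that thread layout (names hypothetical; the sampler body is a placeholder, a real one would read `/proc` or use `psutil` for the child's process tree):

```python
import subprocess
import sys
import threading
import time

def monitor(proc, interval=1.0):
    # Placeholder sampling loop: poll while the child is alive.
    while proc.poll() is None:
        time.sleep(interval)

def run(argv):
    proc = subprocess.Popen(argv, start_new_session=True)
    t = threading.Thread(target=monitor, args=(proc, 0.1), daemon=True)
    t.start()
    returncode = proc.wait()  # main thread stays "busy" with the child
    t.join(timeout=5)
    return returncode
```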
~- [ ] add to the json line records what type of record it is, e.g., "record_type": "system-summary", "processes-sample", "processes-summary"~
[ ] "processes-summary" should just provide, at the end, a summary across all the processes of peak memory and consumed CPU resources (similar to what we got with https://github.com/con/duct/blob/main/duct_time ), plus the exit_code of the child process and "wall_run_time_sec"
[x] make it configurable
[x] should use argparse (or another built-in), but if the user does not specify a particular option on the command line -- it should be taken from a corresponding environment variable. Maybe it would be as easy as providing the default value from the environment, e.g. a default of `os.getenv('DUCT_OUTPUT_PREFIX', ".duct/run-logs/{iso_date_ms}_{process_id}")`. Here the default would be a "python f-string"-style template, and the process would format it with a dict of known variables like that.
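A minimal sketch of that CLI-beats-environment-beats-default pattern (option and variable names taken from above; the formatting values are made up for illustration):

```python
import argparse
import os

DEFAULT_PREFIX = ".duct/run-logs/{iso_date_ms}_{process_id}"

parser = argparse.ArgumentParser()
# Command line beats environment, environment beats the built-in default.
parser.add_argument(
    "--output-prefix",
    default=os.getenv("DUCT_OUTPUT_PREFIX", DEFAULT_PREFIX),
    help="f-string-style template, formatted with a dict of known variables",
)
args = parser.parse_args([])  # no CLI override in this sketch

# The template is then filled from known values, e.g.:
prefix = args.output_prefix.format(
    iso_date_ms="2024-01-01T00-00-00.000", process_id=12345
)
```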
[x] --sample-interval SECONDS, --record-interval SECONDS -- currently smon seems to sample every 2 seconds, 30 times, select the maximum CPU or memory utilization among those samples, and then emit the record per process. Here we want to express not the number of samples to aggregate but rather how frequently to aggregate. Maybe just add to the record "number_of_sample_aggregated" : INT
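A sketch of that interval-based aggregation (names hypothetical; `sample()` is a stub standing in for a real per-process CPU/memory reader), keeping the per-field maxima and the count of samples folded into each record:

```python
import time

def sample():
    # Stub: a real sampler would read per-process CPU/memory here.
    return {"pcpu": 0.0, "rss": 0}

def aggregate(proc_is_running, sample_interval=1.0, record_interval=10.0):
    # Sample every sample_interval seconds; every record_interval seconds
    # emit one record holding the maxima seen plus how many samples went in.
    maxima = {"pcpu": 0.0, "rss": 0}
    n = 0
    next_record = time.monotonic() + record_interval
    while proc_is_running():
        s = sample()
        maxima = {k: max(maxima[k], s[k]) for k in maxima}
        n += 1
        if time.monotonic() >= next_record:
            yield {**maxima, "number_of_sample_aggregated": n}
            maxima = {"pcpu": 0.0, "rss": 0}
            n = 0
            next_record += record_interval
        time.sleep(sample_interval)
```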
[x] --record-types all,system-summary,processes-samples,processes-summary (DUCT_RECORD_TYPES) -- ','-separated list of which record type(s) to bother collecting. By default -- all
[x] --capture-outputs all,none,stdout,stderr. For starters -- when we set it to capture, we do not show those streams on the terminal. Making it none is simplest, but then you would not capture any stdout/stderr and the files should not be generated. Whenever a stream is captured, just point the subprocess at the output files, not at subprocess.PIPE.
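A sketch of mapping `--capture-outputs` to `Popen` targets (helper name and prefix layout hypothetical): captured streams go straight to files rather than `subprocess.PIPE`, and `none` inherits the terminal and creates no files.

```python
import subprocess

def open_outputs(capture, prefix):
    # "none" leaves both streams on the terminal and creates no files;
    # captured streams are opened as files, never subprocess.PIPE.
    stdout = stderr = None  # None -> child inherits the terminal
    if capture in ("all", "stdout"):
        stdout = open(f"{prefix}stdout", "wb")
    if capture in ("all", "stderr"):
        stderr = open(f"{prefix}stderr", "wb")
    return stdout, stderr

# usage: out, err = open_outputs("all", ".duct/run-logs/run_")
#        subprocess.Popen(cmd, stdout=out, stderr=err)
```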
[x] --outputs none,outputs,spinner,dotter. none -- capturing would produce no terminal output; "outputs" -- we would have a thread monitoring those output files, reading from them, and dumping to stdout/stderr (later this might be done in a smarter way through direct Popen orchestration of execution, similarly to how we do it in datalad); spinner -- use some fancy spinning wheel of your liking in the terminal (e.g. the sequence /-\| with \r to go back to the start of the line, plus stats on the process so far: time, max mem, etc.) upon receiving some block of output; dotter -- similar, but print . instead of the fancy spinner with \r.
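A minimal sketch of the spinner frame (function name and stat fields hypothetical), using `\r` so each frame overwrites the previous one in place:

```python
import sys
import time

SPINNER = "/-\\|"

def spin(i, elapsed, max_rss):
    # \r returns the cursor to the line start so the frame updates in place
    frame = SPINNER[i % len(SPINNER)]
    sys.stderr.write(f"\r{frame} {elapsed:6.1f}s max_rss={max_rss}")
    sys.stderr.flush()

# a dotter would instead do: sys.stderr.write(".") on each block of output
```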
[x] Take on the organization of the python project as @jwodder had done for https://github.com/datalad/datalad-installer -- have a python script but in a proper python package (also make it into a python module -- add __init__.py)
(Background: https://github.com/brainlife/abcd-spec/blob/master/hooks/smon is a great start for the desired script, but we need the changes above.)