con / duct

A helper to run a command, capture stdout/stderr and details about running
https://pypi.org/project/con-duct/
MIT License
1 stars 2 forks source link

datalad containers-run integration: wrap battle #170

Open asmacdo opened 3 weeks ago

asmacdo commented 3 weeks ago

Goal: Integrate with datalad and singularity (and SLURM) to execute MRIQC

Option 1 duct wraps datalad

duct datalad containers-run ...

This doesn't work as one would hope, because prior to execution, duct creates new files which means datalad starts from a dirty state and fails. We can get around this with datalad run --explicit, but it doesn't quite work.

$ duct datalad containers-run \
        --explicit \
        -n bids-mriqc \
        --input sourcedata \
        --output . \
        '{inputs}' '{outputs}' participant --participant-label 02 -w workdir
2024-08-28T14:58:19-0500 [INFO    ] con-duct: duct is executing 'datalad containers-run --explicit -n bids-mriqc --input sourcedata --output . {inputs} {outputs} participant --participant-label 02 -w workdir'...
2024-08-28T14:58:19-0500 [INFO    ] con-duct: Log files will be written to .duct/logs/2024.08.28T14.58.19-74198_
[INFO] Making sure inputs are available (this may take some time) 
[INFO] == Command start (output follows) ===== 
240828-19:58:21,778 cli IMPORTANT:

    Running MRIQC version 0.16.0:
      * BIDS dataset path: /home/austin/datasets/ds000003-qc/sourcedata.
      * Output folder: /home/austin/datasets/ds000003-qc.
      * Analysis levels: ['participant'].

You are using MRIQC v0.16.0, and a newer version is available: 24.0.2.
240828-19:58:46,465 cli WARNING:
     IMPORTANT: Anonymized quality metrics (IQMs) will be submitted to MRIQC's metrics repository. Submission of IQMs can be disabled using the ``--no-sub`` argument. Please visit https://mriqc.readthedocs.io/en/latest/dsa.html to revise MRIQC's Data Sharing Agreement.
240828-19:59:07,13 cli WARNING:
     Building bold report: no exclude index was found
240828-19:59:18,606 cli WARNING:
     IMPORTANT: Anonymized quality metrics (IQMs) will be submitted to MRIQC's metrics repository. Submission of IQMs can be disabled using the ``--no-sub`` argument. Please visit https://mriqc.readthedocs.io/en/latest/dsa.html to revise MRIQC's Data Sharing Agreement.
240828-19:59:18,607 cli IMPORTANT:
     Participant level finished successfully.
run(ok): /home/austin/datasets/ds000003-qc (dataset) [./code/containers/scripts/singularity_cm...]
[INFO] == Command exit (modification check follows) ===== 
add(ok): .duct/logs/2024.08.28T14.58.19-74198_info.json (file)
add(ok): .duct/logs/2024.08.28T14.58.19-74198_stderr (file)
add(ok): .duct/logs/2024.08.28T14.58.19-74198_stdout (file)
add(ok): .duct/logs/2024.08.28T14.58.19-74198_usage.json (file)
add(ok): sub-02_T1w.html (file)
add(ok): sub-02_task-rhymejudgment_bold.html (file)
save(ok): . (dataset)
action summary:
  add (ok: 6)
  get (notneeded: 3)
  run (ok: 1)
  save (notneeded: 2, ok: 1)
2024-08-28T14:59:20-0500 [INFO    ] con-duct: Summary:
Exit Code: 0
Command: datalad containers-run --explicit -n bids-mriqc --input sourcedata --output . {inputs} {outputs} participant --participant-label 02 -w workdir
Log files location: .duct/logs/2024.08.28T14.58.19-74198_
Wall Clock Time: 61.161 sec
Memory Peak Usage (RSS): 52518912 bytes
Memory Average Usage (RSS): 51502163.93442623 bytes
Virtual Memory Peak Usage (VSZ): 586903552 bytes
Virtual Memory Average Usage (VSZ): 508344957.9016393 bytes
Memory Peak Percentage: 0.1%
Memory Average Percentage: 0.09836065573770492%
CPU Peak Usage: 28.0%
Average CPU Usage: 2.157377049180328%
Samples Collected: 61
Reports Written: 2

Which seems like its working... but not really. Datalad commits the duct files prior to duct's exit.

$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   .duct/logs/2024.08.28T14.58.19-74198_info.json
    modified:   .duct/logs/2024.08.28T14.58.19-74198_stdout
    modified:   .duct/logs/2024.08.28T14.58.19-74198_usage.json

(Another workaround is to pass an output prefix outside of the dataset, but thats not great, since we would want that data committed.)

Option 2 datalad wraps duct

This option (used on the datalad blog) is probably the best choice, but unfortunately it cant work with containers-run without either altering containers-run or the container its running.

It does work with datalad run by simply wrapping the command to execute the container. But this loses all the benefits of native datalad container integration.

Option 3 datalad wraps container wraps duct

We could also try to set up a clever container with duct bind-mounted in, and override the entrypoint to wrap duct around it. This option feels complex and will be opaque to users. However this pattern would work with OCI containers, all other options are intended for apptainer/singularity.f

asmacdo commented 3 weeks ago

@yarikoptic thoughts?

asmacdo commented 3 weeks ago

Heres the best integration so far. I tried to take this to discovery to integrate with slurm also, but they are mid-migration this week (the interactive nodes I'd been using are down.) probably best to wait for the new setup.

datalad run -m "datalad+duct+sing+mriqc" \
    duct -p Z_ \
    singularity run --contain \
    --bind /home/asmacdo/devel/sandbox/mriqc-sanity/sourcedata:/data:ro \
    --bind /home/asmacdo/devel/sandbox/mriqc-sanity:/out \
    --bind /home/asmacdo/devel/sandbox/mriqc-sanity/workdir:/workdir \
    docker://nipreps/mriqc:latest 
    /data /out participant --participant-label 02 --no-sub -w /workdir
    Wall Clock Time: 293.760 sec
    Memory Peak Usage (RSS): 33746944 bytes
    Memory Average Usage (RSS): 23660565.557894744 bytes
    Virtual Memory Peak Usage (VSZ): 1679130624 bytes
    Virtual Memory Average Usage (VSZ): 1278351482.161403 bytes                                                                                Memory Peak Percentage: 0.5%                                                                                                               Memory Average Percentage: 0.04140350877192981%
    CPU Peak Usage: 12.2%
    Average CPU Usage: 6.457543859649121%
    Samples Collected: 285
    Reports Written: 5
yarikoptic commented 3 weeks ago

Option 2 datalad wraps duct

This option (used on the datalad blog) is probably the best choice, but unfortunately it cant work with containers-run without either altering containers-run or the container its running.

Just add duct to the actual invocation in .datalad/config -- there is no need to modify container or containers-run since it does not hardcode any invocation. See e.g. https://github.com/ReproNim/containers/blob/master/.datalad/config where we explicitly specify

cmdexec = {img_dspath}/scripts/singularity_cmd run {img} {cmd}

for all but a test container. So here you would change to

cmdexec = duct {img_dspath}/scripts/singularity_cmd run {img} {cmd}

and datalad save.

Option 3 datalad wraps container wraps duct

yeap, that is what I also thought about -- should be doable and provide a solution for docker style container platforms . But may be we find some other / better ways. For now let's pretty much concentrate on "2" .

In the longer run I think we should just augment datalad run with a "custom runner" feature which would allow for making duct invoked via a config option as a wrapper for any execution (and thus a datalad container-run) which would seemingly work for any singularity use case.

For repronim/containers we could even just add handling via making that scripts/singularity_cmd to respect some REPRONIM_CONTAINERS_RUNNER env var, to which we feed the duct invocation which gets prepending before actual singularity $cmd invocation, as e.g. ${REPRONIM_CONTAINERS_RUNNER:-} singularity $cmd ....

asmacdo commented 3 weeks ago

Something isnt working as expected with datalad containers-run integration. (Initial guess is something happening in ///repronim/containers/scripts/singularity_cmd)

(side note, "pwd": "." seems unhelpful. Is it always a relative path?)

datalad containers-run

commit acf92a7adb7ff1d5d1bd1e1869225025c76fc5a3
Author: Austin <austin@dartmouth.edu>
Date:   Fri Aug 30 11:46:36 2024 -0400

    [DATALAD RUNCMD] duct -p cr-output/duct_ ./code/container...

    === Do not change lines below ===
    {
     "chain": [],
     "cmd": "duct -p cr-output/duct_ ./code/containers/scripts/singularity_cmd run code/containers/images/bids/bids-mriqc--0.16.0.sing '{inputs}' '{outputs}' participant group -w cr-output/workdir -v",
     "dsid": "3a04f70a-7714-424c-b6fd-d179cfba9658",
     "exit": 0,
     "extra_inputs": [
      "code/containers/images/bids/bids-mriqc--0.16.0.sing"
     ],
     "inputs": [
      "sourcedata"
     ],
     "outputs": [
      "cr-output"
     ],
     "pwd": "."
    }
    ^^^ Do not change lines above ^^^
{
  "exit_code": 0,
  "wall_clock_time": 396.50900530815125,
  "peak_rss": 3977216,
  "average_rss": 3977194.61096606,
  "peak_vsz": 7761920,
  "average_vsz": 7761920,
  "peak_pmem": 0,
  "average_pmem": 0,
  "peak_pcpu": 14.6,
  "average_pcpu": 5.629242819843339
}

:warning: average and peak memory usage are the same, and very low.

datalad run duct+sing

commit 5d7c94223b2e768ee705da177aa8c10be8e1968b (HEAD -> master)
Author: Austin <austin@dartmouth.edu>
Date:   Fri Aug 30 11:56:12 2024 -0400

    [DATALAD RUNCMD] Whole run, duct+sing direct

    === Do not change lines below ===
    {
     "chain": [],
     "cmd": "duct -p sing_direct/duct_ singularity run --contain --bind /home/asmacdo/devel/sandbox/mriqc-sanity/sourcedata:/data:ro --bind /home/asmacdo/devel/sandbox/mriqc-sanity/sing_direct:/out --bind /home/asmacdo/devel/sandbox/mriqc-sanity/sing_direct/workdir:/workdir docker://nipreps/mriqc:latest /data /out participant group --no-sub -w /workdir",
     "dsid": "3a04f70a-7714-424c-b6fd-d179cfba9658",
     "exit": 0,
     "extra_inputs": [],
     "inputs": [],
     "outputs": [],
     "pwd": "."
    }
    ^^^ Do not change lines above ^^^
{
  "exit_code": 0,
  "wall_clock_time": 370.62019300460815,
  "peak_rss": 40427520,
  "average_rss": 24678171.810584974,
  "peak_vsz": 1603104768,
  "average_vsz": 1274291887.4206126,
  "peak_pmem": 0.5,
  "average_pmem": 0.03899721448467966,
  "peak_pcpu": 12.3,
  "average_pcpu": 6.814763231197769
}

Here we see more reasonable values for memory usage. Both runs saved to different output dirs on the master branch for easy comparison. asmacdo@typhon: /home/asmacdo/devel/sandbox/mriqc-sanity

yarikoptic commented 2 weeks ago

(side note, "pwd": "." seems unhelpful. Is it always a relative path?)

IIRC it is from the top of the dataset, happen you run command in a subfolder -- you would find it super helpful ;)

So check the stats logs -- likely it didn't track kid processes correctly, so rerun and see what is up with the session id etc, or may be even run with higher log level for duct (via env var?) to see what it sees while running