Open asmacdo opened 3 weeks ago
@yarikoptic thoughts?
Here's the best integration so far. I tried to take this to Discovery to integrate with SLURM as well, but they are mid-migration this week (the interactive nodes I'd been using are down), so it is probably best to wait for the new setup.
datalad run -m "datalad+duct+sing+mriqc" \
duct -p Z_ \
singularity run --contain \
--bind /home/asmacdo/devel/sandbox/mriqc-sanity/sourcedata:/data:ro \
--bind /home/asmacdo/devel/sandbox/mriqc-sanity:/out \
--bind /home/asmacdo/devel/sandbox/mriqc-sanity/workdir:/workdir \
docker://nipreps/mriqc:latest \
/data /out participant --participant-label 02 --no-sub -w /workdir
Wall Clock Time: 293.760 sec
Memory Peak Usage (RSS): 33746944 bytes
Memory Average Usage (RSS): 23660565.557894744 bytes
Virtual Memory Peak Usage (VSZ): 1679130624 bytes
Virtual Memory Average Usage (VSZ): 1278351482.161403 bytes
Memory Peak Percentage: 0.5%
Memory Average Percentage: 0.04140350877192981%
CPU Peak Usage: 12.2%
Average CPU Usage: 6.457543859649121%
Samples Collected: 285
Reports Written: 5
Option 2 datalad wraps duct
This option (used on the datalad blog) is probably the best choice, but unfortunately it can't work with containers-run without altering either containers-run or the container it's running.
Just add duct to the actual invocation in .datalad/config; there is no need to modify the container or containers-run, since it does not hardcode any invocation. See e.g. https://github.com/ReproNim/containers/blob/master/.datalad/config where we explicitly specify
cmdexec = {img_dspath}/scripts/singularity_cmd run {img} {cmd}
for all but a test container. So here you would change to
cmdexec = duct {img_dspath}/scripts/singularity_cmd run {img} {cmd}
and datalad save.
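The cmdexec change described above could also be scripted, since .datalad/config is a git-config-style file. A minimal sketch, assuming git is available; prepend_duct is a hypothetical helper name, not part of datalad or ReproNim/containers, and the container name passed to it must match the one registered in the dataset.

```shell
# Hypothetical helper: prefix a container's cmdexec with "duct".
# Usage: prepend_duct <config-file> <container-name>
prepend_duct() {
    config_file=$1
    name=$2
    # datalad-container stores settings under [datalad "containers.<name>"]
    key="datalad.containers.${name}.cmdexec"
    # Fail if the container has no cmdexec entry at all
    current=$(git config -f "$config_file" --get "$key") || return 1
    case $current in
        duct\ *) return 0 ;;  # already wrapped, nothing to do
    esac
    git config -f "$config_file" "$key" "duct $current"
}
```

For example, prepend_duct .datalad/config bids-mriqc followed by datalad save, where bids-mriqc is only an assumed container name.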
Option 3 datalad wraps container wraps duct
Yep, that is what I also thought about; it should be doable and would provide a solution for Docker-style container platforms. But maybe we will find some other, better ways. For now let's concentrate on "2".
In the longer run I think we should just augment datalad run with a "custom runner" feature, which would allow duct to be invoked via a config option as a wrapper for any execution (and thus for datalad containers-run), which would seemingly work for any singularity use case.
For repronim/containers we could even just add handling by making scripts/singularity_cmd respect some REPRONIM_CONTAINERS_RUNNER env var, to which we feed the duct invocation, which then gets prepended to the actual singularity $cmd invocation, e.g. as ${REPRONIM_CONTAINERS_RUNNER:-} singularity $cmd ...
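The env var idea above might look roughly like this inside a wrapper script. This is only a sketch: run_with_runner is a hypothetical function name, and REPRONIM_CONTAINERS_RUNNER is the variable proposed in this comment, not something scripts/singularity_cmd is known to support today.

```shell
# Hypothetical sketch: prepend an optional runner to whatever command is given.
run_with_runner() {
    # The expansion is deliberately unquoted so a value such as
    # "duct -p prefix_" splits into separate words (sh/bash behavior).
    # When the variable is unset or empty it expands to nothing and
    # the command runs unwrapped.
    ${REPRONIM_CONTAINERS_RUNNER:-} "$@"
}
```

With REPRONIM_CONTAINERS_RUNNER="duct -p cr-output/duct_" exported, run_with_runner singularity run ... would execute duct -p cr-output/duct_ singularity run ...; with it unset, the call is a plain singularity run.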
Something isn't working as expected with the datalad containers-run integration. (Initial guess is that something is happening in ///repronim/containers/scripts/singularity_cmd.)
(Side note: "pwd": "." seems unhelpful. Is it always a relative path?)
commit acf92a7adb7ff1d5d1bd1e1869225025c76fc5a3
Author: Austin <austin@dartmouth.edu>
Date: Fri Aug 30 11:46:36 2024 -0400
[DATALAD RUNCMD] duct -p cr-output/duct_ ./code/container...
=== Do not change lines below ===
{
"chain": [],
"cmd": "duct -p cr-output/duct_ ./code/containers/scripts/singularity_cmd run code/containers/images/bids/bids-mriqc--0.16.0.sing '{inputs}' '{outputs}' participant group -w cr-output/workdir -v",
"dsid": "3a04f70a-7714-424c-b6fd-d179cfba9658",
"exit": 0,
"extra_inputs": [
"code/containers/images/bids/bids-mriqc--0.16.0.sing"
],
"inputs": [
"sourcedata"
],
"outputs": [
"cr-output"
],
"pwd": "."
}
^^^ Do not change lines above ^^^
{
"exit_code": 0,
"wall_clock_time": 396.50900530815125,
"peak_rss": 3977216,
"average_rss": 3977194.61096606,
"peak_vsz": 7761920,
"average_vsz": 7761920,
"peak_pmem": 0,
"average_pmem": 0,
"peak_pcpu": 14.6,
"average_pcpu": 5.629242819843339
}
:warning: Average and peak memory usage are essentially the same, and very low (about 4 MB RSS).
commit 5d7c94223b2e768ee705da177aa8c10be8e1968b (HEAD -> master)
Author: Austin <austin@dartmouth.edu>
Date: Fri Aug 30 11:56:12 2024 -0400
[DATALAD RUNCMD] Whole run, duct+sing direct
=== Do not change lines below ===
{
"chain": [],
"cmd": "duct -p sing_direct/duct_ singularity run --contain --bind /home/asmacdo/devel/sandbox/mriqc-sanity/sourcedata:/data:ro --bind /home/asmacdo/devel/sandbox/mriqc-sanity/sing_direct:/out --bind /home/asmacdo/devel/sandbox/mriqc-sanity/sing_direct/workdir:/workdir docker://nipreps/mriqc:latest /data /out participant group --no-sub -w /workdir",
"dsid": "3a04f70a-7714-424c-b6fd-d179cfba9658",
"exit": 0,
"extra_inputs": [],
"inputs": [],
"outputs": [],
"pwd": "."
}
^^^ Do not change lines above ^^^
{
"exit_code": 0,
"wall_clock_time": 370.62019300460815,
"peak_rss": 40427520,
"average_rss": 24678171.810584974,
"peak_vsz": 1603104768,
"average_vsz": 1274291887.4206126,
"peak_pmem": 0.5,
"average_pmem": 0.03899721448467966,
"peak_pcpu": 12.3,
"average_pcpu": 6.814763231197769
}
Here we see more reasonable values for memory usage. Both runs are saved to different output directories on the master branch for easy comparison.
asmacdo@typhon: /home/asmacdo/devel/sandbox/mriqc-sanity
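The symptom flagged in the containers-run result (average RSS nearly equal to peak RSS) could be checked mechanically when comparing runs. A rough sketch, assuming awk is available; is_flat_profile is a hypothetical helper and the 1% threshold is an arbitrary assumption, not something duct defines.

```shell
# Hypothetical heuristic: succeed (exit 0) when the average is within 1%
# of the peak, which may indicate that only a single, idle process
# (e.g. a wrapper script) was being sampled.
# Usage: is_flat_profile <peak> <average>
is_flat_profile() {
    awk -v peak="$1" -v avg="$2" \
        'BEGIN { exit !(peak > 0 && (peak - avg) / peak < 0.01) }'
}
```

Fed the peak_rss and average_rss values from the two summaries above, the containers-run numbers (3977216 and 3977194.61096606) count as flat, while the direct singularity numbers (40427520 and 24678171.810584974) do not.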
(side note, "pwd": "." seems unhelpful. Is it always a relative path?)
IIRC it is relative to the top of the dataset; if you happen to run a command in a subfolder, you would find it super helpful ;)
So check the stats logs: likely it didn't track child processes correctly, so rerun and see what is up with the session id etc., or maybe even run duct with a higher log level (via env var?) to see what it sees while running.
Goal: Integrate with datalad and singularity (and SLURM) to execute MRIQC
Option 1 duct wraps datalad
duct datalad containers-run ...
This doesn't work as one would hope: prior to execution, duct creates new files, which means datalad starts from a dirty state and fails. We can get around this with datalad run --explicit, which seems like it's working, but it doesn't quite: datalad commits the duct files prior to duct's exit.
(Another workaround is to pass an output prefix outside of the dataset, but that's not great, since we would want that data committed.)
Option 2 datalad wraps duct
This option (used on the datalad blog) is probably the best choice, but unfortunately it can't work with containers-run without altering either containers-run or the container it's running.
It does work with datalad run by simply wrapping the command that executes the container. But this loses all the benefits of native datalad container integration.
Option 3 datalad wraps container wraps duct
We could also try to set up a clever container with duct bind-mounted in, and override the entrypoint to wrap duct around it. This option feels complex and will be opaque to users. However, this pattern would work with OCI containers; all other options are intended for apptainer/singularity.