Open m-petersen opened 3 years ago
Never encountered such need. Could you please elaborate a bit - what commands you run to "unpack"? Any sample public singularity image (if image type specific)?
Hi Yaroslav,
of course.
The command I call (from the root of a subject-specific ephemeral clone) is

```
datalad containers-run \
    -m "$PIPE_ID" \
    --explicit \
    --output $CLONE_DATA_DIR/freesurfer -o $CLONE_DATA_DIR/smriprep -o $CLONE_DATA_DIR/smriprep/$1 \
    --input "$CLONE_BIDS_DIR/$1" \
    --container-name smriprep \
    data/raw_bids data participant \
    -w .git/tmp/wdir \
    --participant-label $1 \
    --output-spaces fsnative fsaverage MNI152NLin6Asym T1w T2w \
    --nthreads $SLURM_CPUS_PER_TASK \
    --fs-subjects-dir data/freesurfer \
    --stop-on-first-crash \
    --fs-license-file envs/freesurfer_license.txt
```
The container installed with datalad containers-add is https://hub.docker.com/layers/nipreps/smriprep/0.8.0rc2/images/sha256-4b6669dbb82f8ee14837208472d31c3b842b432e3abd6fd7deea738b4f4dafd7?context=explore
The containers-add command is
```
datalad containers-add ${container%-*} \
    --url $ENV_DIR/$container \
    --dataset $PROJ_DIR \
    --update \
    --call-fmt "singularity run --cleanenv --userns --no-home -B . -B \$SCRATCH_DIR:/tmp {img} {cmd}"
done
```
Environment variables used here:
```
PROJ_DIR=.            # superdataset root
ENV_DIR=./envs
SCRATCH_DIR=/scratch  # scratch partition on our HPC
```

CLONE_DATA_DIR is the data directory of an ephemeral clone containing input and result subdatasets (my workflow follows http://handbook.datalad.org/en/latest/beyond_basics/101-171-enki.html).
During test runs on a single subject the container works well once unpacked, but until then ~15 minutes pass. When I run multiple subjects in parallel on one node, converting the containers takes hours, rendering the whole computation effectively sequential. That's why I want to avoid unpacking the container for every subject run.
We are aiming for computationally optimized processing of imaging data on our HPC using datalad for provenance tracking and collaboration.
My scripts are

- pipelines_submission.txt: submits pipelines_parallelization for a subject batch for a defined pipeline.
- pipelines_parallelization.txt: is supposed to parallelize the execution of pipelines_processing across the subject batch on a node. During tests it did that, but now I am in doubt because the singularity container conversions interfere. Correct me if I'm wrong, but interference occurs when multiple jobs try to unpack the same container. If I produce an ephemeral clone per subject (as I do in pipelines_processing), shouldn't every process unpack its own container, since the installed container (datalad containers-add) should be transferred to the clone?
- pipelines_processing.txt: sets up an ephemeral clone and calls a pipeline script like smriprep.
- smriprep.txt: script containing the pipeline-specific command, including the container execution with datalad containers-run. We also try to implement other preconfigured pipelines like fmriprep, qsiprep etc.
Hope that clarifies a bit what my problem is and what I am trying to achieve. Using datalad for all these things is a little bit overwhelming for me at the moment.
Thanks a lot in advance.
Regards, Marvin
I see -- so it is conversion of container from docker to singularity upon each run.
Ideally you/we should just have a single converted singularity container for that app, so no conversion would be needed. That is what we also try to facilitate with https://github.com/ReproNim/containers/ but smriprep isn't there yet (filed an issue ref'ed above within bids-apps).
What is the value of `${container%-*}` you have? I thought that if a `docker://` URL is used, we would do such a conversion once, while adding that singularity container from docker into the local dataset, so that it could then be reused across jobs and there would be no other repacking of anything.
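I.e., something along these lines (the image tag and call format here are only examples, and this assumes the docker-to-singularity conversion happens once at add time, as described above):

```
# one-time conversion while registering the container in the dataset
datalad containers-add smriprep \
    --url docker://nipreps/smriprep:0.8.0rc2 \
    --call-fmt "singularity run --cleanenv {img} {cmd}"
```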
so I am still not 100% sure what conversion in relation to singularity we are dealing with. Maybe you have some output/logging which shows it?
> During test runs on a single subject the container works well when unpacked, but until then ~15 minutes pass.
so are you talking about those 15 minutes as the time of concern? My guess is that it is datalad/git trying to figure out what has changed in order to make a commit.
I think there is a misunderstanding.
I install the singularity container from a local datalad dataset (ENV_DIR) containing predownloaded singularity containers (pulled from Docker Hub) before submitting to SLURM. I do not download any singularity containers from Docker Hub on the fly during the respective job. It's the conversion of the singularity container image to a sandbox, which is enforced on our HPC (I think because of an incompatibility between the file system and singularity), that takes a long time or is practically impossible when done multiple times in parallel.
> What is the value of `${container%-*}` you have?

The containers follow a naming scheme of roughly `<pipeline>-<version>`; they are stored as `<pipeline>-<version>.sif` in ENV_DIR. With `${container%-*}` I strip the version suffix, since dots and hyphens aren't allowed in container names if I remember right.
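For illustration, the parameter expansion behaves like this (the filename is just a made-up example following that scheme):

```
# strip everything from the last hyphen on, leaving a valid container name
container=smriprep-0.8.0rc2.sif
echo "${container%-*}"   # prints: smriprep
```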
The workaround I am now establishing is to execute `singularity run` wrapped in `datalad run`, with a sandbox that has been converted in advance. And I was wondering whether there is a way of using `datalad containers-run` instead, assuming there are some benefits, for instance with regard to provenance tracking etc., compared to the simple `datalad run`.
sorry we forgot about this issue discussion.
> It's the conversion of the singularity container image to a sandbox that is enforced on our HPC
It is still not clear to me (please point/cite specific lines) what exactly such a conversion entails. My only guess: copying the singularity container from a partition where it cannot be executed (e.g. `/home`) to another partition which supports executing it (e.g. `/sandbox`). If that is so, and `/sandbox` is mounted across all the nodes, the easiest solution probably would be:

- keep the dataset with the (converted) container on `/sandbox`;
- clone it from `/sandbox` in "reckless" mode (symlinking `.git/annex/objects`) into the resultant dataset before execution;
- `containers-run` as usual.
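A minimal sketch of that idea, assuming the converted container lives in its own datalad dataset under `/sandbox` (all paths and names below are placeholders, not an existing setup):

```
# one-time: a dataset on the shared, executable partition holding the container
datalad create /sandbox/containers
datalad containers-add smriprep -d /sandbox/containers --url /path/to/smriprep.sif

# per job: an ephemeral clone; annex objects are symlinked, not copied
datalad clone --reckless ephemeral /sandbox/containers "$JOB_DIR/containers"

# then run against that clone as usual
datalad -C "$JOB_DIR/containers" containers-run -n smriprep ...
```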
> And I was wondering whether there is a way of using `datalad containers-run` assuming there are some benefits for instance with regard to provenance tracking etc. compared to the simple `datalad run`.
well, `containers-run` is a lean wrapper around the regular `run`. The only specific aspect which comes to mind: it passes the container image not within `inputs` but within `extra_inputs`, which has slightly different semantics. ref: https://github.com/datalad/datalad-container/blob/master/datalad_container/containers_run.py#L137
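For illustration, a plain-`run` call roughly equivalent to the `containers-run` invocation above could look like the sketch below (the smriprep arguments are abbreviated and the image path is an assumption). The main practical difference is that the image is declared here as an ordinary input rather than being recorded as an extra input:

```
# approximate plain-run equivalent of the containers-run call (sketch, paths assumed)
datalad run \
    -m "$PIPE_ID" \
    --explicit \
    --input "$CLONE_BIDS_DIR/$1" \
    --input envs/smriprep.sif \
    --output "$CLONE_DATA_DIR/smriprep" \
    singularity run --cleanenv --userns --no-home -B . \
        envs/smriprep.sif data/raw_bids data participant --participant-label "$1"
```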
Thanks a lot for your reply.
By now we have established another solution foregoing datalad.
What I mean by a sandbox is the conversion of the singularity container image to a writable directory (https://sylabs.io/guides/3.0/user-guide/build_a_container.html#creating-writable-sandbox-directories). Automated conversion of the containers to sandbox directories before using them is enforced on our HPC, and due to its very slow filesystem this process takes forever, hampering computations.
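Concretely, that conversion amounts to something like the following (the image name is just an example):

```
# unpack the SIF image into a writable sandbox directory, then run from that directory
singularity build --sandbox smriprep-0.8.0rc2.sandbox smriprep-0.8.0rc2.sif
singularity run --cleanenv smriprep-0.8.0rc2.sandbox
```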
FWIW, as the command to actually execute the container is fully configurable in `.datalad/config`, I guess one solution could be to develop a shim which would convert to a sandbox if that has not yet been done. It could then even be local to the system/compute node if so desired. An example of such "shimming" is https://github.com/ReproNim/containers/blob/master/scripts/singularity_cmd which takes care of "thorough" sanitization and also supports running singularity via docker if on OSX. Then https://github.com/ReproNim/containers/blob/master/.datalad/config refers to it instead of a plain `singularity run` command.
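A minimal sketch of such a shim, assuming it gets registered via the container's call format as `.../my_sandbox_cmd {img} {cmd}` (the script name, lock handling, and sandbox location are all assumptions, not what singularity_cmd does):

```
#!/bin/bash
# my_sandbox_cmd (hypothetical): unpack the image into a node-local sandbox once,
# then execute the actual command inside that sandbox.
set -eu
img="$1"; shift
sandbox="${SANDBOX_DIR:-${TMPDIR:-/tmp}}/$(basename "$img" .sif).sandbox"

# serialize parallel jobs so only one of them builds the sandbox
(
    flock 9
    if [ ! -d "$sandbox" ]; then
        singularity build --sandbox "$sandbox" "$img"
    fi
) 9>"${sandbox}.lock"

exec singularity run --cleanenv --userns --no-home "$sandbox" "$@"
```

The `flock` is there so that only the first job per node pays the unpacking cost; the others wait and then reuse the already built sandbox.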
Hi,
our HPC enforces unpacking singularity containers to sandboxes, which takes a really long time if done multiple times in parallel. One way to circumvent unpacking every time would be to use preconverted sandboxes. Is there a way to use datalad containers with a sandbox?
Thanks!