How did you submit jobs (using the REST API) to the cromwell server? Did you specify the workflow options JSON (workflow_opts/singularity.json) when you POST?
@leepc12 yes:
job submission:
curl -X POST --header "Accept: application/json" -v "0.0.0.0:8000/api/workflows/v1/batch" \
-F workflowSource=@../../encode3-pipelines/atac-seq-pipeline/atac.wdl \
-F workflowInputs=@atac_trial.json \
-F workflowOptions=@../../encode3-pipelines/atac-seq-pipeline/workflow_opts/singularity.json
singularity.json:
{
"default_runtime_attributes" : {
"singularity_container" : "~/.singularity/atac-seq-pipeline-v1.1.7.simg"
}
}
@zfrenchee: Thanks. Please post your atac_trial.json too. Also, please read item 11 on this doc. Data file directories must be defined in singularity.json to be bound to singularity.
@leepc12 thanks for your reply.
Could you clarify the format for "singularity_bindpath"? The docs say:
"singularity_bindpath" : "/your/,YOUR_OWN_DATA_DIR1,YOUR_OWN_DATA_DIR1,..."
atac_trial.json:
[
{
"atac.title" : "epigenomics/1_fastq/6_protocol_selection/diMN32/ALS-0BUU_diMN32_rep1",
"atac.description" : "",
"atac.pipeline_type" : "atac",
"atac.paired_end" : true,
"atac.genome_tsv" : "/pool/data/cromwell-aals/encode3-pipelines/genome/hg38.tsv",
"atac.fastqs_rep1_R1" : [ "/pool/data/globus/epigenomics/1_fastq/6_protocol_selection/diMN32/ALS-0BUU_diMN32_rep1_1.fastq" ],
"atac.fastqs_rep1_R2" : [ "/pool/data/globus/epigenomics/1_fastq/6_protocol_selection/diMN32/ALS-0BUU_diMN32_rep1_2.fastq" ],
"atac.multimapping" : 4,
"atac.auto_detect_adapter" : true,
"atac.smooth_win" : 73,
"atac.enable_idr" : true,
"atac.idr_thresh" : 0.05,
"atac.enable_xcor" : true
}
]
It's SINGULARITY_BINDPATH (https://singularity.lbl.gov/docs-mount#specifying-bind-paths), which is a comma-separated list of directories to be bound to the container. Use /pool/data for your case.
{
...
"singularity_bindpath" : "/pool/data"
...
}
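For reference, SINGULARITY_BINDPATH is the environment-variable form of singularity's -B/--bind option, so the binding can be tested outside of cromwell (a minimal sketch; the image path and /pool/data are taken from the options above):
$ SINGULARITY_BINDPATH=/pool/data singularity exec ~/.singularity/atac-seq-pipeline-v1.1.7.simg ls /pool/data
# equivalent -B form:
$ singularity exec -B /pool/data ~/.singularity/atac-seq-pipeline-v1.1.7.simg ls /pool/data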
Adding the "singularity_bindpath"
doesn't fix the OSError
:
2019-03-18 17:15:15,010 cromwell-system-akka.dispatchers.backend-dispatcher-150 INFO - DispatchedConfigAsyncJobExecutionActor [UUID(6884f955)atac.trim_adapter:0:1]: executing: ls ~/.singularity/atac-seq-pipeline-v1.1.7.simg $(echo /pool/data | tr , ' ') 1>/dev/null && (sbatch \
--export=ALL \
-J cromwell_6884f955_trim_adapter \
-D /pool/data/cromwell-aals/cromwell-executions/atac/6884f955-006d-4211-b5c9-a084278c4691/call-trim_adapter/shard-0 \
-o /pool/data/cromwell-aals/cromwell-executions/atac/6884f955-006d-4211-b5c9-a084278c4691/call-trim_adapter/shard-0/execution/stdout \
-e /pool/data/cromwell-aals/cromwell-executions/atac/6884f955-006d-4211-b5c9-a084278c4691/call-trim_adapter/shard-0/execution/stderr \
-t 1440 \
-n 1 \
--ntasks-per-node=1 \
--cpus-per-task=2 \
--mem=12000 \
--wrap "chmod u+x /pool/data/cromwell-aals/cromwell-executions/atac/6884f955-006d-4211-b5c9-a084278c4691/call-trim_adapter/shard-0/execution/script && SINGULARITY_BINDPATH=$(echo /pool/data/cromwell-aals/cromwell-executions/atac/6884f955-006d-4211-b5c9-a084278c4691/call-trim_adapter/shard-0 | sed 's/cromwell-executions/\n/g' | head -n1),/pool/data singularity exec --home /pool/data/cromwell-aals/cromwell-executions/atac/6884f955-006d-4211-b5c9-a084278c4691/call-trim_adapter/shard-0 ~/.singularity/atac-seq-pipeline-v1.1.7.simg /pool/data/cromwell-aals/cromwell-executions/atac/6884f955-006d-4211-b5c9-a084278c4691/call-trim_adapter/shard-0/execution/script")
(note /pool/data in SINGULARITY_BINDPATH)
This gives the same error as above.
workflow_opts/singularity.json:
{
"default_runtime_attributes" : {
"singularity_container" : "~/.singularity/atac-seq-pipeline-v1.1.7.simg",
"singularity_bindpath" : "/pool/data"
}
}
I think I solved this by changing the slurm_singularity configuration
slurm_singularity {
actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
config {
script-epilogue = "sleep 30"
concurrent-job-limit = 32
runtime-attributes = """
Int cpu = 1
Int? time
Int? memory_mb
String singularity_container
String? singularity_bindpath
"""
submit = """
ls ${singularity_container} $(echo ${singularity_bindpath} | tr , ' ') 1>/dev/null && (sbatch \
--export=ALL \
-J ${job_name} \
-D ${cwd} \
-o ${out} \
-e ${err} \
${"-t " + time*60} \
-n 1 \
--ntasks-per-node=1 \
${"--cpus-per-task=" + cpu} \
${"--mem=" + memory_mb} \
--wrap "chmod u+x ${script} && SINGULARITY_BINDPATH=$(echo ${cwd} | sed 's/cromwell-executions/\n/g' | head -n1),${singularity_bindpath}:rw singularity exec --home ${cwd} ${singularity_container} ${script}")
"""
kill = "scancel ${job_id}"
check-alive = "squeue -j ${job_id}"
job-id-regex = "Submitted batch job (\\d+).*"
}
}
Specifically, in the command, see:
SINGULARITY_BINDPATH=$(echo ${cwd} | sed 's/cromwell-executions/\n/g' | head -n1),${singularity_bindpath}:rw
in particular, ${singularity_bindpath}:rw. It seems I needed to specify rw on my bind mount. This might be appropriate either to include in backends/backend.conf by default, or to note in a comment.
(I may not have solved this, but the pipeline has been running for a while...)
I think your pipeline is just hanging for a while, since SINGULARITY_BINDPATH=/pool/data:rw is not valid syntax. Did your pipeline pass the read_genome_tsv task (check if its status is Done)? What is your singularity version?
$ singularity --version
If it's >= 3.1, check if singularity exec --help has a --writable-tmpfs flag. If so, please edit the backend like the following. Replace
singularity exec --home
with
singularity exec --writable-tmpfs --home
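If it helps, that edit can be made with a one-liner (a sketch; it assumes the stock backends/backend.conf path used elsewhere in this thread and rewrites every backend stanza containing that exact string):
$ singularity exec --help | grep -- --writable-tmpfs    # confirm the flag exists on this build
$ sed -i 's/singularity exec --home/singularity exec --writable-tmpfs --home/' backends/backend.conf
$ grep -n 'singularity exec' backends/backend.conf      # verify the change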
Thanks for your reply, @leepc12!
You're right, it seems like the job was hanging with :rw, but it seems to be hanging now as well with --writable-tmpfs.
My slurm cluster also seems to be idling:
~ » sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
galaxy* up infinite 1 idle answer
But the status on my job is Running:
{"status":"Running","id":"68b4ec12-9f48-4547-8a4e-3b66d267e0a5"}
I'm now trying with just --writable instead of --writable-tmpfs...
I think it's related to this issue. Can you try to mount /dev/shm as rw somehow in our container?
test script
$ touch /dev/shm/1
$ singularity exec ~/.singularity/atac-seq-pipeline-v1.1.7.simg ls -l /dev/shm/1
$ singularity exec ~/.singularity/atac-seq-pipeline-v1.1.7.simg touch /dev/shm/2
$ ls -l /dev/shm/2
Thanks for your response, @leepc12,
root@answer:~# touch /dev/shm/1
root@answer:~# singularity exec ~/.singularity/atac-seq-pipeline-v1.1.7.simg ls -l /dev/shm/1
ls: cannot access '/dev/shm/1': Too many levels of symbolic links
root@answer:~# singularity exec ~/.singularity/atac-seq-pipeline-v1.1.7.simg touch /dev/shm/2
touch: cannot touch '/dev/shm/2': Too many levels of symbolic links
root@answer:~# ls -l /dev/shm/2
ls: cannot access /dev/shm/2: No such file or directory
root@answer:~#
If you have super-user privileges on your system, try enabling the overlay file system by editing the singularity configuration file (https://singularity.lbl.gov/docs-config#enable-overlay-boolean-defaultno).
If --writable or --writable-tmpfs doesn't work, there is no workaround that can be done on the pipeline side. Can you also try the Conda method instead of singularity?
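For the overlay change, something along these lines on the host (a sketch; requires root, uses the default config path shown later in this thread, and is worth backing up first):
$ sudo grep -n 'enable overlay' /etc/singularity/singularity.conf
$ sudo cp /etc/singularity/singularity.conf /etc/singularity/singularity.conf.bak
$ sudo sed -i 's/^enable overlay = .*/enable overlay = yes/' /etc/singularity/singularity.conf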
@leepc12 I'm deeply grateful for this support.
I think I need to reverse myself on what I said in my comment a few days ago (https://github.com/ENCODE-DCC/atac-seq-pipeline/issues/99#issuecomment-474113747)
It seems like adding "singularity_bindpath" causes the OSError to go away but the pipeline to hang, without --writable or --writable-tmpfs now. So based on my most recent set of tests:
no "singularity_bindpath" |
OSError |
"singularity_bindpath" without --writable or --writable-tmpfs |
Hangs |
"singularity_bindpath" with --writable-tmpfs |
Hangs |
"singularity_bindpath" with --writable |
Hangs |
In my previous comment, I said that "singularity_bindpath" without --writable or --writable-tmpfs gave the OSError, but I can no longer reproduce that.
If you have 3 replicates but defined NUM_CONCURRENT_TASK=2 then cromwell will hold bowtie2 for rep3 until rep1 and rep2 are done.
The job I'm trying to schedule has no replicates, see the whole JSON I'm submitting here: https://github.com/ENCODE-DCC/atac-seq-pipeline/issues/99#issuecomment-474052536
How can I debug what's going on? Cromwell isn't logging anything, as far as I can tell, when the job hangs. Are there other log files which might have clues?
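One place to look besides the server log is Cromwell's per-workflow metadata endpoint, which reports each call's executionStatus and its stdout/stderr paths (a sketch, assuming the same server address used for the curl submission above and the workflow id from the status response; python -m json.tool is only for pretty-printing):
$ curl -s "0.0.0.0:8000/api/workflows/v1/68b4ec12-9f48-4547-8a4e-3b66d267e0a5/metadata" \
    | python -m json.tool | less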
I do have sudo access on my system, but I'm not sure why I want --overlay. Why aren't the bind mount and --writable solving it? Is there a way to check that the bind mount is really being mounted?
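A direct way to check the bind outside of cromwell (a minimal sketch; image and bind path as above, the test file name is arbitrary):
$ SINGULARITY_BINDPATH=/pool/data singularity exec ~/.singularity/atac-seq-pipeline-v1.1.7.simg \
    sh -c 'ls -ld /pool/data && touch /pool/data/.bind_write_test && echo writable && rm /pool/data/.bind_write_test'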
If we try the overlay option, what are the steps? Just add --overlay to the backend.conf submit script?
@zfrenchee: Cromwell server mode is actually a job manager. So running a cromwell server on a SLURM cluster is running another job manager on a job manager. There will be too many layers affecting this problem. So we don't recommend running a cromwell server on HPCs. You can simply sbatch a shell script with the singularity backend. Please see this doc for details.
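For reference, the sbatch-a-shell-script route looks roughly like this (a sketch only: resource flags are illustrative, the exact command is in the linked doc, paths reuse the ones from the curl submission above, and cromwell run takes a single inputs JSON object rather than the batch array used with the /batch endpoint):
$ sbatch -n 1 --cpus-per-task=4 --mem=16000 -t 2880 --export=ALL \
    --wrap "java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=singularity cromwell-34.jar run \
            ../../encode3-pipelines/atac-seq-pipeline/atac.wdl -i atac_trial.json \
            -o ../../encode3-pipelines/atac-seq-pipeline/workflow_opts/singularity.json"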
Did you use cromwell ver 34 (cromwell-34.jar) to run the server?
@leepc12 yes
Okay. I was just wondering if you used cromwell-38. It's buggy and doesn't work with our slurm_singularity backend.
@leepc12
Cromwell server mode is actually a job manager. So running a cromwell server on a SLURM cluster is running another job manager on a job manager. There will be too many layers affecting this problem. So we don't recommend running a cromwell server on HPCs.
The singularity backend and the slurm_singularity backend seem different, in that the slurm_singularity backend includes resource management, which is not included in the singularity backend. Isn't this why you include slurm_singularity in your backend.conf, to run cromwell this way?
I'd prefer not to try conda, because the dependency management is less clean than a container-based system like singularity. But if we can't get this to work, I'll have to try it.
@zfrenchee: Yes, the two backends (singularity and slurm_singularity) are different. slurm_singularity is still there in the backend.conf file. We don't recommend slurm_singularity on HPCs but kept it in the backend file for other purposes.
Yes, a container-based method is much better than Conda. Conda is not very clean and does not cover all OS/platforms. But singularity also has problems on some platforms (like old CentOS) because of its directory binding.
@leepc12
I found an error message when the containers hang:
FATAL: container creation failed: unabled to /pool/data/cromwell-aals to mount list: destination /pool/data/cromwell-aals is already in the mount point list
I then changed the slurm_singularity config submit command from the version above to:
--wrap "chmod u+x ${script} && SINGULARITY_BINDPATH=${singularity_bindpath} singularity exec --home ${cwd} ${singularity_container} ${script}")
(i.e. removing $(echo ${cwd} | sed 's/cromwell-executions/\n/g' | head -n1), from SINGULARITY_BINDPATH). I submit with workflow opts:
{
"default_runtime_attributes" : {
"singularity_container" : "~/.singularity/atac-seq-pipeline-v1.1.7.simg",
"singularity_bindpath" : "/pool/data"
}
}
Removing $(echo ${cwd} | sed 's/cromwell-executions/\n/g' | head -n1), in the way I show above gets me back to OSError: [Errno 30] Read-only file system. What does $(echo ${cwd} | sed 's/cromwell-executions/\n/g' | head -n1), evaluate to anyhow?
What should I try next?
I also just noticed your commit involving LD_LIBRARY_PATH: https://github.com/ENCODE-DCC/atac-seq-pipeline/commit/9730ff02b83d2bd0286e89d9ca34d4f792249de9
Should I update to use that as well? (I am using singularity 3.1)
If I add back --writable-tmpfs I still get the OSError.
If I add back --writable I get:
WARNING: no overlay partition found
/.singularity.d/actions/exec: 9: exec: /pool/data/cromwell-aals/cromwell-executions/atac/eb83fe65-e2f9-46a6-912b-8e417e1a9881/call-trim_adapter/shard-0/execution/script: not found
That LD_LIBRARY_PATH fix is not relevant to your case.
I re-tested slurm_singularity on my SLURM cluster and it worked fine, but I am not sure if this is helpful for you.
starting a server (on a login node)
java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm_singularity ~/cromwell-38.jar server
submitting a job (on a login node)
java -jar ~/cromwell-38.jar submit test_backend.wdl -o test_backend.wo.json
monitoring job
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
39736340 akundaje cromwell_fc81f7 leepc12 PD 0:00 1 (Priority)
test_backend.wdl
workflow test_backend {
call t1 {input: a = 'a'}
}
task t1 {
String a
command {
echo test > test.txt
}
output {
File out = 'test.txt'
}
runtime {
cpu : 2
memory : "2000 MB"
time : 4
}
}
test_backend.wo.json
{
"default_runtime_attributes" : {
"singularity_container" : "/home/groups/cherry/encode/pipeline_singularity_images/atac-seq-pipeline-v1.1.7.simg",
"slurm_partition" : "akundaje"
}
}
server log
2019-03-27 10:43:33,524 cromwell-system-akka.dispatchers.api-dispatcher-32 INFO - Unspecified type (Unspecified version) workflow fc81f708-8a1c-4418-851f-87d88d371783 submitted
2019-03-27 10:43:43,167 cromwell-system-akka.dispatchers.engine-dispatcher-36 INFO - 1 new workflows fetched
2019-03-27 10:43:43,167 cromwell-system-akka.dispatchers.engine-dispatcher-101 INFO - WorkflowManagerActor Starting workflow UUID(fc81f708-8a1c-4418-851f-87d88d371783)
2019-03-27 10:43:43,167 cromwell-system-akka.dispatchers.engine-dispatcher-101 INFO - WorkflowManagerActor Successfully started WorkflowActor-fc81f708-8a1c-4418-851f-87d88d371783
2019-03-27 10:43:43,168 cromwell-system-akka.dispatchers.engine-dispatcher-101 INFO - Retrieved 1 workflows from the WorkflowStoreActor
2019-03-27 10:43:43,187 cromwell-system-akka.dispatchers.engine-dispatcher-37 INFO - MaterializeWorkflowDescriptorActor [UUID(fc81f708)]: Parsing workflow as WDL draft-2
2019-03-27 10:43:43,202 cromwell-system-akka.dispatchers.engine-dispatcher-37 INFO - MaterializeWorkflowDescriptorActor [UUID(fc81f708)]: Call-to-Backend assignments: test_backend.t1 -> slurm_singularity
2019-03-27 10:43:45,496 cromwell-system-akka.dispatchers.engine-dispatcher-80 INFO - WorkflowExecutionActor-fc81f708-8a1c-4418-851f-87d88d371783 [UUID(fc81f708)]: Starting test_backend.t1
2019-03-27 10:43:46,343 cromwell-system-akka.dispatchers.engine-dispatcher-89 INFO - Assigned new job execution tokens to the following groups: fc81f708: 1
2019-03-27 10:43:46,545 cromwell-system-akka.dispatchers.backend-dispatcher-59 INFO - DispatchedConfigAsyncJobExecutionActor [UUID(fc81f708)test_backend.t1:NA:1]: `echo test > test.txt`
2019-03-27 10:43:46,618 cromwell-system-akka.dispatchers.backend-dispatcher-59 INFO - DispatchedConfigAsyncJobExecutionActor [UUID(fc81f708)test_backend.t1:NA:1]: executing: ls /home/groups/cherry/encode/pipeline_singularity_images/atac-seq-pipeline-v1.1.7.simg $(echo | tr , ' ') 1>/dev/null && (sbatch \
--export=ALL \
-J cromwell_fc81f708_t1 \
-D /oak/stanford/groups/akundaje/leepc12/code/atac-seq-pipeline/cromwell-executions/test_backend/fc81f708-8a1c-4418-851f-87d88d371783/call-t1 \
-o /oak/stanford/groups/akundaje/leepc12/code/atac-seq-pipeline/cromwell-executions/test_backend/fc81f708-8a1c-4418-851f-87d88d371783/call-t1/execution/stdout \
-e /oak/stanford/groups/akundaje/leepc12/code/atac-seq-pipeline/cromwell-executions/test_backend/fc81f708-8a1c-4418-851f-87d88d371783/call-t1/execution/stderr \
-t 240 \
-n 1 \
--ntasks-per-node=1 \
--cpus-per-task=2 \
--mem=2000 \
-p akundaje \
\
\
\
--wrap "chmod u+x /oak/stanford/groups/akundaje/leepc12/code/atac-seq-pipeline/cromwell-executions/test_backend/fc81f708-8a1c-4418-851f-87d88d371783/call-t1/execution/script && LD_LIBRARY_PATH=:$LD_LIBRARY_PATH SINGULARITY_BINDPATH=$(echo /oak/stanford/groups/akundaje/leepc12/code/atac-seq-pipeline/cromwell-executions/test_backend/fc81f708-8a1c-4418-851f-87d88d371783/call-t1 | sed 's/cromwell-executions/\n/g' | head -n1), singularity exec --home /oak/stanford/groups/akundaje/leepc12/code/atac-seq-pipeline/cromwell-executions/test_backend/fc81f708-8a1c-4418-851f-87d88d371783/call-t1 /home/groups/cherry/encode/pipeline_singularity_images/atac-seq-pipeline-v1.1.7.simg /oak/stanford/groups/akundaje/leepc12/code/atac-seq-pipeline/cromwell-executions/test_backend/fc81f708-8a1c-4418-851f-87d88d371783/call-t1/execution/script")
2019-03-27 10:43:49,049 cromwell-system-akka.dispatchers.backend-dispatcher-134 INFO - DispatchedConfigAsyncJobExecutionActor [UUID(fc81f708)test_backend.t1:NA:1]: job id: 39736340
2019-03-27 10:43:49,117 cromwell-system-akka.dispatchers.backend-dispatcher-134 INFO - DispatchedConfigAsyncJobExecutionActor [UUID(fc81f708)test_backend.t1:NA:1]: Cromwell will watch for an rc file but will *not* double-check whether this job is actually alive (unless Cromwell restarts)
2019-03-27 10:43:49,119 cromwell-system-akka.dispatchers.backend-dispatcher-61 INFO - DispatchedConfigAsyncJobExecutionActor [UUID(fc81f708)test_backend.t1:NA:1]: Status change from - to Running
2019-03-27 10:46:46,906 cromwell-system-akka.dispatchers.backend-dispatcher-133 INFO - DispatchedConfigAsyncJobExecutionActor [UUID(fc81f708)test_backend.t1:NA:1]: Status change from Running to Done
2019-03-27 10:46:48,122 cromwell-system-akka.dispatchers.engine-dispatcher-58 INFO - WorkflowExecutionActor-fc81f708-8a1c-4418-851f-87d88d371783 [UUID(fc81f708)]: Workflow test_backend complete. Final Outputs:
{
"test_backend.t1.out": "/oak/stanford/groups/akundaje/leepc12/code/atac-seq-pipeline/cromwell-executions/test_backend/fc81f708-8a1c-4418-851f-87d88d371783/call-t1/execution/test.txt"
}
2019-03-27 10:46:48,149 cromwell-system-akka.dispatchers.engine-dispatcher-47 INFO - WorkflowManagerActor WorkflowActor-fc81f708-8a1c-4418-851f-87d88d371783 is in a terminal state: WorkflowSucceededState
Here is a singularity configuration file on my cluster.
$ cat /etc/singularity/singularity.conf
# SINGULARITY.CONF
# This is the global configuration file for Singularity. This file controls
# what the container is allowed to do on a particular host, and as a result
# this file must be owned by root.
# ALLOW SETUID: [BOOL]
# DEFAULT: yes
# Should we allow users to utilize the setuid program flow within Singularity?
# note1: This is the default mode, and to utilize all features, this option
# must be enabled. For example, without this option loop mounts of image
# files will not work; only sandbox image directories, which do not need loop
# mounts, will work (subject to note 2).
# note2: If this option is disabled, it will rely on unprivileged user
# namespaces which have not been integrated equally between different Linux
# distributions.
allow setuid = yes
# MAX LOOP DEVICES: [INT]
# DEFAULT: 256
# Set the maximum number of loop devices that Singularity should ever attempt
# to utilize.
max loop devices = 256
# ALLOW PID NS: [BOOL]
# DEFAULT: yes
# Should we allow users to request the PID namespace? Note that for some HPC
# resources, the PID namespace may confuse the resource manager and break how
# some MPI implementations utilize shared memory. (note, on some older
# systems, the PID namespace is always used)
allow pid ns = yes
# CONFIG PASSWD: [BOOL]
# DEFAULT: yes
# If /etc/passwd exists within the container, this will automatically append
# an entry for the calling user.
config passwd = yes
# CONFIG GROUP: [BOOL]
# DEFAULT: yes
# If /etc/group exists within the container, this will automatically append
# group entries for the calling user.
config group = yes
# CONFIG RESOLV_CONF: [BOOL]
# DEFAULT: yes
# If there is a bind point within the container, use the host's
# /etc/resolv.conf.
config resolv_conf = yes
# MOUNT PROC: [BOOL]
# DEFAULT: yes
# Should we automatically bind mount /proc within the container?
mount proc = yes
# MOUNT SYS: [BOOL]
# DEFAULT: yes
# Should we automatically bind mount /sys within the container?
mount sys = yes
# MOUNT DEV: [yes/no/minimal]
# DEFAULT: yes
# Should we automatically bind mount /dev within the container? If 'minimal'
# is chosen, then only 'null', 'zero', 'random', 'urandom', and 'shm' will
# be included (the same effect as the --contain options)
mount dev = yes
# MOUNT DEVPTS: [BOOL]
# DEFAULT: yes
# Should we mount a new instance of devpts if there is a 'minimal'
# /dev, or -C is passed? Note, this requires that your kernel was
# configured with CONFIG_DEVPTS_MULTIPLE_INSTANCES=y, or that you're
# running kernel 4.7 or newer.
mount devpts = yes
# MOUNT HOME: [BOOL]
# DEFAULT: yes
# Should we automatically determine the calling user's home directory and
# attempt to mount it's base path into the container? If the --contain option
# is used, the home directory will be created within the session directory or
# can be overridden with the SINGULARITY_HOME or SINGULARITY_WORKDIR
# environment variables (or their corresponding command line options).
mount home = yes
# MOUNT TMP: [BOOL]
# DEFAULT: yes
# Should we automatically bind mount /tmp and /var/tmp into the container? If
# the --contain option is used, both tmp locations will be created in the
# session directory or can be specified via the SINGULARITY_WORKDIR
# environment variable (or the --workingdir command line option).
mount tmp = yes
# MOUNT HOSTFS: [BOOL]
# DEFAULT: no
# Probe for all mounted file systems that are mounted on the host, and bind
# those into the container?
mount hostfs = no
# BIND PATH: [STRING]
# DEFAULT: Undefined
# Define a list of files/directories that should be made available from within
# the container. The file or directory must exist within the container on
# which to attach to. you can specify a different source and destination
# path (respectively) with a colon; otherwise source and dest are the same.
#bind path = /etc/singularity/default-nsswitch.conf:/etc/nsswitch.conf
#bind path = /opt
#bind path = /scratch
bind path = /etc/localtime
bind path = /etc/hosts
# USER BIND CONTROL: [BOOL]
# DEFAULT: yes
# Allow users to influence and/or define bind points at runtime? This will allow
# users to specify bind points, scratch and tmp locations. (note: User bind
# control is only allowed if the host also supports PR_SET_NO_NEW_PRIVS)
user bind control = yes
# ENABLE OVERLAY: [yes/no/try]
# DEFAULT: try
# Enabling this option will make it possible to specify bind paths to locations
# that do not currently exist within the container. If 'try' is chosen,
# overlayfs will be tried but if it is unavailable it will be silently ignored.
enable overlay = try
# ENABLE UNDERLAY: [yes/no]
# DEFAULT: yes
# Enabling this option will make it possible to specify bind paths to locations
# that do not currently exist within the container even if overlay is not
# working. If overlay is available, it will be tried first.
enable underlay = yes
# MOUNT SLAVE: [BOOL]
# DEFAULT: yes
# Should we automatically propagate file-system changes from the host?
# This should be set to 'yes' when autofs mounts in the system should
# show up in the container.
mount slave = yes
# SESSIONDIR MAXSIZE: [STRING]
# DEFAULT: 16
# This specifies how large the default sessiondir should be (in MB) and it will
# only affect users who use the "--contain" options and don't also specify a
# location to do default read/writes to (e.g. "--workdir" or "--home").
sessiondir max size = 16
# LIMIT CONTAINER OWNERS: [STRING]
# DEFAULT: NULL
# Only allow containers to be used that are owned by a given user. If this
# configuration is undefined (commented or set to NULL), all containers are
# allowed to be used. This feature only applies when Singularity is running in
# SUID mode and the user is non-root.
#limit container owners = gmk, singularity, nobody
# LIMIT CONTAINER GROUPS: [STRING]
# DEFAULT: NULL
# Only allow containers to be used that are owned by a given group. If this
# configuration is undefined (commented or set to NULL), all containers are
# allowed to be used. This feature only applies when Singularity is running in
# SUID mode and the user is non-root.
#limit container groups = group1, singularity, nobody
# LIMIT CONTAINER PATHS: [STRING]
# DEFAULT: NULL
# Only allow containers to be used that are located within an allowed path
# prefix. If this configuration is undefined (commented or set to NULL),
# containers will be allowed to run from anywhere on the file system. This
# feature only applies when Singularity is running in SUID mode and the user is
# non-root.
#limit container paths = /scratch, /tmp, /global
# ALLOW CONTAINER ${TYPE}: [BOOL]
# DEFAULT: yes
# This feature limits what kind of containers that Singularity will allow
# users to use (note this does not apply for root).
allow container squashfs = yes
allow container extfs = yes
allow container dir = yes
# AUTOFS BUG PATH: [STRING]
# DEFAULT: Undefined
# Define list of autofs directories which produces "Too many levels of symbolink links"
# errors when accessed from container (typically bind mounts)
#autofs bug path = /nfs
#autofs bug path = /cifs-share
# ALWAYS USE NV ${TYPE}: [BOOL]
# DEFAULT: no
# This feature allows an administrator to determine that every action command
# should be executed implicitely with the --nv option (useful for GPU only
# environments).
always use nv = no
# ROOT DEFAULT CAPABILITIES: [full/file/no]
# DEFAULT: no
# Define default root capability set kept during runtime
# - full: keep all capabilities (same as --keep-privs)
# - file: keep capabilities configured in ${prefix}/etc/singularity/capabilities/user.root
# - no: no capabilities (same as --no-privs)
root default capabilities = full
# MEMORY FS TYPE: [tmpfs/ramfs]
# DEFAULT: tmpfs
# This feature allow to choose temporary filesystem type used by Singularity.
# Cray CLE 5 and 6 up to CLE 6.0.UP05 there is an issue (kernel panic) when Singularity
# use tmpfs, so on affected version it's recommended to set this value to ramfs to avoid
# kernel panic
memory fs type = tmpfs
# CNI CONFIGURATION PATH: [STRING]
# DEFAULT: Undefined
# Defines path from where CNI configuration files are stored
#cni configuration path =
# CNI PLUGIN PATH: [STRING]
# DEFAULT: Undefined
# Defines path from where CNI executable plugins are stored
#cni plugin path =
# MKSQUASHFS PATH: [STRING]
# DEFAULT: Undefined
# This allows the administrator to specify the location for mksquashfs if it is not
# installed in a standard system location
# mksquashfs path =
# SHARED LOOP DEVICES: [BOOL]
# DEFAULT: no
# Allow to share same images associated with loop devices to minimize loop
# usage and optimize kernel cache (useful for MPI)
shared loop devices = no
(Still getting OSError: [Errno 30] Read-only file system.) @leepc12
What does $(echo ${cwd} | sed 's/cromwell-executions/\n/g' | head -n1), evaluate to anyhow?
Why does adding --writable cause WARNING: no overlay partition found?
Perhaps something having to do with this? How do I bind /pool/data:/pool/data as writable in a singularity container?
What version of singularity are you using? (I'm using 3.1)
1 and 3: No, there are more commits to fix the hanging problem for the slurm_singularity backend. You can test with this PR (https://github.com/ENCODE-DCC/atac-seq-pipeline/pull/102). The Read-only file system problem looks somehow related to python and singularity configuration. Please try with the above PR and see if that fixes it.
That $(echo ${cwd} | sed 's/cromwell-executions/\n/g' | head -n1) expression extracts cromwell_root, which is cromwell-executions/ by default, and adds it to SINGULARITY_BINDPATH.
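To make that concrete, with the shard-0 call directory from the log above (GNU sed turns the matched string into a newline and head keeps everything before it):
$ echo /pool/data/cromwell-aals/cromwell-executions/atac/6884f955-006d-4211-b5c9-a084278c4691/call-trim_adapter/shard-0 \
    | sed 's/cromwell-executions/\n/g' | head -n1
/pool/data/cromwell-aals/
So the first element of SINGULARITY_BINDPATH ends up being /pool/data/cromwell-aals/, the directory that holds cromwell-executions.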
We actually assume that pipelines run on an overlay file system. Please don't use mapping (A:B). Use A only.
I have both 2.5.1 and 3.1, so I'm trying to make our pipelines compatible with both.
Thanks for your answers, @leepc12
I copied over the changes to backend.conf from #102 but am still getting the OSError.
Could this have something specifically to do with python multiprocessing? After all, the error is:
Job atac.trim_adapter:0:1 exited with return code 1
...
File "/software/atac-seq-pipeline/src/encode_trim_adapter.py", line 265, in <module>
main()
File "/software/atac-seq-pipeline/src/encode_trim_adapter.py", line 168, in main
pool = multiprocessing.Pool(num_process)
File "/usr/lib/python2.7/multiprocessing/__init__.py", line 232, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 138, in __init__
self._setup_queues()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 234, in _setup_queues
self._inqueue = SimpleQueue()
File "/usr/lib/python2.7/multiprocessing/queues.py", line 354, in __init__
self._rlock = Lock()
File "/usr/lib/python2.7/multiprocessing/synchronize.py", line 147, in __init__
SemLock.__init__(self, SEMAPHORE, 1, 1)
File "/usr/lib/python2.7/multiprocessing/synchronize.py", line 75, in __init__
sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
OSError: [Errno 30] Read-only file system
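If this is the /dev/shm issue from earlier in the thread, it should reproduce without the pipeline at all, since multiprocessing locks are backed by POSIX semaphores under /dev/shm (a sketch; image path as above):
$ singularity exec ~/.singularity/atac-seq-pipeline-v1.1.7.simg \
    python -c "import multiprocessing as mp; p = mp.Pool(2); print('Pool OK'); p.terminate()"
If /dev/shm is not writable inside the container, this should fail with the same OSError: [Errno 30].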
When I skip using cromwell, and manually just do:
$ sudo sbatch
--export=ALL
-D /pool/data/cromwell-aals/cromwell-executions/atac/5e0d8c63-f0ab-466b-9d26-6b49cc41dcee/call-trim_adapter/shard-0
-o ~/stdout.txt
-e ~/stderr.txt
-n 1
--ntasks-per-node=1
--wrap "singularity exec
--cleanenv
-B /pool/data
--home /pool/data/cromwell-aals/cromwell-executions/atac/5e0d8c63-f0ab-466b-9d26-6b49cc41dcee/call-trim_adapter/shard-0
/home/lenail/.singularity/atac-seq-pipeline-v1.1.7.simg
/bin/bash /pool/data/cromwell-aals/cromwell-executions/atac/5e0d8c63-f0ab-466b-9d26-6b49cc41dcee/call-trim_adapter/shard-0/execution/script"
I get:
~ » cat stderr.txt
ln: failed to access 'R1/*.fastq.gz': No such file or directory
ln: failed to access 'R2/*.fastq.gz': No such file or directory
which must be coming from the ln calls in the script:
( ln -L R1/*.fastq.gz /pool/data/cromwell-aals/cromwell-executions/atac/5e0d8c63-f0ab-466b-9d26-6b49cc41dcee/call-trim_adapter/shard-0/execution/glob-cf395bba00b93cc4a5f238577ff98973 2> /dev/null ) || ( ln R1/*.fastq.gz /pool/data/cromwell-aals/cromwell-executions/atac/5e0d8c63-f0ab-466b-9d26-6b49cc41dcee/call-trim_adapter/shard-0/execution/glob-cf395bba00b93cc4a5f238577ff98973 )
What is the step that is supposed to localize my files to R1/*.fastq.gz?
Here's how I specify the fastq's in atac_trial.json:
"atac.fastqs_rep1_R1" : [ "/pool/data/globus/epigenomics/1_fastq/6_protocol_selection/diMN32/ALS-0BUU_diMN32_rep1_1.fastq" ],
"atac.fastqs_rep1_R2" : [ "/pool/data/globus/epigenomics/1_fastq/6_protocol_selection/diMN32/ALS-0BUU_diMN32_rep1_2.fastq" ],
Note that they are not gzipped when I submit them, and therefore will not be accessible via *.fastq.gz. I'm using v1.1.7, so I expect this should not be an issue?
The solution turned out to be fairly "deep": -B /pool/data,/run, because of this debian "bug".
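My reading of that bug (an assumption, not confirmed here) is that /dev/shm on this host resolves into /run, so binding /run alongside /pool/data is what finally makes the semaphore directory writable inside the container. The earlier /dev/shm test with the extra bind checks it directly:
$ singularity exec -B /pool/data,/run ~/.singularity/atac-seq-pipeline-v1.1.7.simg \
    sh -c 'touch /dev/shm/bind_test && echo /dev/shm is writable && rm /dev/shm/bind_test'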
Describe the bug
I'm hoping to run Cromwell in server mode, configured to dispatch singularity jobs to slurm. I have successfully run this pipeline with singularity without slurm. When I change the configuration to use SLURM, the workflow fails at the trim_adapters (first) step.
Working config (singularity without SLURM, modeled from backend.conf):
Config I'm switching to (also modeled from backend.conf):
trim_adapter command being generated:
Error:
My suspicion is that the SLURM task doesn't have the same privileges as cromwell does when SLURM runs the singularity container, but I'm not sure how to fix this.
OS/Platform and dependencies