trberg opened this issue 5 years ago
Tagging @brucehoff and @tschaffter .
I also noticed that when I run the workflow hook, the files created by CWL are owned by root, even though I am running the workflow hook as a non-root user.
This is a confusing bug report:
We are using Redhat (which doesn't support docker-compose) for our OS
If you can't use docker compose with RHEL then why are you trying to do so?
We had this pipeline up and running at one point
If RHEL doesn't support docker compose then how could you have been 'up and running'?
I will assume you can use docker compose with RHEL or that you are using another OS.
We had to restart the VM and now it's broken. The restart updated the OS and Docker version.
Can you otherwise use Docker and Docker compose (not using the Synapse Workflow Hook)? The Python error is in the requests
package. Can you generally make web requests from within containers in your environment? That is, can you separate problems with Docker on your system from the Synapse Workflow Hook itself?
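One way to check both halves of that question, independent of the hook (a sketch; the image and URL here are just convenient choices, not part of the hook's setup):

```shell
# Can a container reach the outside world at all? curlimages/curl is a small
# image whose entrypoint is curl, so the arguments below are curl arguments.
docker run --rm curlimages/curl -sSI https://www.synapse.org | head -n 1

# Can a container talk to the Docker daemon through the mounted socket?
# The daemon's /_ping endpoint answers "OK" when the socket is usable.
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
    curlimages/curl -s --unix-socket /var/run/docker.sock http://localhost/_ping
```

If the second command fails with a permission error, the problem is socket access from inside a container, not the hook.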
I also noticed that when I run the workflow hook, I find that the files created by CWL are owned by root even though I am running the workflow hook as user
Offhand I would say that's an unrelated issue. It doesn't sound like a bug (I would expect files to be owned by the user running the Docker daemon (usually root), not by the user running the Docker client) and it's not clear why it is a problem. If it is a problem, let's discuss separately, please.
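The ownership behavior is easy to demonstrate (a sketch; the paths and image are arbitrary): by default the container process runs as root, so files it writes into a bind mount land root-owned on the host, and `--user` changes that.

```shell
# Default: the containerized process runs as root, so the file is root-owned.
docker run --rm -v "$PWD:/work" -w /work ubuntu touch owned-by-root.txt

# Passing the invoking user's UID:GID makes the file user-owned instead.
docker run --rm --user "$(id -u):$(id -g)" -v "$PWD:/work" -w /work \
    ubuntu touch owned-by-user.txt

# Compare ownership of the two files on the host.
ls -l owned-by-root.txt owned-by-user.txt
```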
I promised to show how to run Toil directly, bypassing the Synapse workflow hook. To do this: 1) I set up a workflow with Synapse as shown in the instructions page; 2) I submitted a job to the submission queue; 3) I ran 'docker ps':
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
00f0b1b8febe sagebionetworks/synapseworkflowhook-toil "toil-cwl-runner --d…" 4 seconds ago Up 2 seconds workflow_job.a17ed7d8-eb5f-4bb5-8ba6-33ed3004efa8
76915fd1bf66 sagebionetworks/synapseworkflowhook "/usr/local/bin/mvn-…" 3 minutes ago Up 3 minutes synapseworkflowhook_workflow-hook_1
synapseworkflowhook_workflow-hook_1 is the hook; workflow_job.a17ed7d8-eb5f-4bb5-8ba6-33ed3004efa8 is the job.
4) I ran
docker inspect --format "$(<run.tpl)" workflow_job.a17ed7d8-eb5f-4bb5-8ba6-33ed3004efa8
where run.tpl is here: https://gist.github.com/efrecon/8ce9c75d518b6eb863f667442d7bc679
5) I manually cleaned up the resulting command for readability. I created a folder 'deleteme' under my home account, '/Users/bhoff'. The command is:
docker run \
--name=/workflow_job \
--env="TMPDIR=/Users/bhoff/Documents/deleteme" \
--env="TEMP=/Users/bhoff/Documents/deleteme" \
--env="TMP=/Users/bhoff/Documents/deleteme" \
--env="DOCKER_HOST=unix:///var/run/docker.sock" \
--env="PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
--env="LANG=C.UTF-8" \
--env="PYTHONIOENCODING=UTF-8" \
--env="GPG_KEY=C01E1CAD5EA2C4F0B8E3571504C367C218ADD4FF" \
--env="PYTHON_VERSION=2.7.16" \
--env="PYTHON_PIP_VERSION=19.0.3" \
--network "bridge" \
--volume="/Users/bhoff/Documents/deleteme:/Users/bhoff/Documents/deleteme:rw" \
--volume="/var/run:/var/run:rw" \
--log-driver="json-file" \
--log-opt max-file="2" \
--log-opt max-size="1g" \
--restart="" \
--detach=true \
-w "/Users/bhoff/Documents/deleteme" \
"sagebionetworks/synapseworkflowhook-toil" \
"toil-cwl-runner" "--defaultMemory" "100M" "--retryCount" "0" "--defaultDisk" "1000000" "--workDir" "/Users/bhoff/Documents/deleteme" "--noLinkImports" "SynapseWorkflowExample/sample-workflow.cwl" "SynapseWorkflowExample/sample-workflow-job.yaml"
I see that this includes --volume="/var/run:/var/run:rw". I had forgotten that we add this to the Toil container volume mounts when mounting it on the Hook container. See https://github.com/Sage-Bionetworks/SynapseWorkflowHook/blob/master/src/main/java/org/sagebionetworks/WES.java#L243-L247
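The same kind of reconstruction can be approximated without the gist template by pulling individual fields from the container's recorded configuration (a sketch; the container name is the one from step 3 above):

```shell
# docker inspect --format accepts a Go template over the container's JSON;
# these fields come from the daemon's recorded Config and HostConfig.
docker inspect --format \
  'workdir={{.Config.WorkingDir}} image={{.Config.Image}} binds={{json .HostConfig.Binds}}' \
  workflow_job.a17ed7d8-eb5f-4bb5-8ba6-33ed3004efa8
```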
6) In /Users/bhoff/Documents/deleteme I ran:
git clone https://github.com/Sage-Bionetworks/SynapseWorkflowExample.git
bhoff$ ls -al SynapseWorkflowExample/
total 88
drwxr-xr-x 12 bhoff staff 384 Jun 20 13:25 .
drwxr-xr-x 5 bhoff staff 160 Jun 20 13:25 ..
drwxr-xr-x 13 bhoff staff 416 Jun 20 13:25 .git
-rw-r--r-- 1 bhoff staff 11357 Jun 20 13:25 LICENSE
-rw-r--r-- 1 bhoff staff 105 Jun 20 13:25 README.md
-rw-r--r-- 1 bhoff staff 2791 Jun 20 13:25 annotateSubmission.cwl
-rw-r--r-- 1 bhoff staff 2236 Jun 20 13:25 downloadSubmissionFile.cwl
-rw-r--r-- 1 bhoff staff 838 Jun 20 13:25 job_file_reader_tool_yaml_sample.cwl
-rw-r--r-- 1 bhoff staff 22 Jun 20 13:25 sample-workflow-job.yaml
-rw-r--r-- 1 bhoff staff 322 Jun 20 13:25 sample-workflow.cwl
-rw-r--r-- 1 bhoff staff 2978 Jun 20 13:25 uploadToSynapse.cwl
-rw-r--r-- 1 bhoff staff 2270 Jun 20 13:25 workflow-entrypoint.cwl
7) I ran the command in step 5
docker run \
> --name=/workflow_job \
> --env="TMPDIR=/Users/bhoff/Documents/deleteme" \
> --env="TEMP=/Users/bhoff/Documents/deleteme" \
> --env="TMP=/Users/bhoff/Documents/deleteme" \
> --env="DOCKER_HOST=unix:///var/run/docker.sock" \
> --env="PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
> --env="LANG=C.UTF-8" \
> --env="PYTHONIOENCODING=UTF-8" \
> --env="GPG_KEY=C01E1CAD5EA2C4F0B8E3571504C367C218ADD4FF" \
> --env="PYTHON_VERSION=2.7.16" \
> --env="PYTHON_PIP_VERSION=19.0.3" \
> --network "bridge" \
> --volume="/Users/bhoff/Documents/deleteme:/Users/bhoff/Documents/deleteme:rw" \
> --volume="/var/run:/var/run:rw" \
> --log-driver="json-file" \
> --log-opt max-file="2" \
> --log-opt max-size="1g" \
> --restart="" \
> --detach=true \
> -w "/Users/bhoff/Documents/deleteme" \
> "sagebionetworks/synapseworkflowhook-toil" \
> "toil-cwl-runner" "--defaultMemory" "100M" "--retryCount" "0" "--defaultDisk" "1000000" "--workDir" "/Users/bhoff/Documents/deleteme" "--noLinkImports" "SynapseWorkflowExample/sample-workflow.cwl" "SynapseWorkflowExample/sample-workflow-job.yaml"
d7198726d6121bd869315b2fe2f82bb39834b7213202d84cda5d592094b4610d
Bruces-MacBook-Pro:deleteme bhoff$ docker logs workflow_job
INFO:cwltool:Resolved 'SynapseWorkflowExample/sample-workflow.cwl' to 'file:///Users/bhoff/Documents/deleteme/SynapseWorkflowExample/sample-workflow.cwl'
WARNING:toil.batchSystems.singleMachine:Limiting maxCores to CPU count of system (6).
WARNING:toil.batchSystems.singleMachine:Limiting maxMemory to physically available memory (2095575040).
WARNING:toil.batchSystems.singleMachine:Limiting maxDisk to physically available disk (212669071360).
INFO:toil:Running Toil version 3.18.0-84239d802248a5f4a220e762b3b8ce5cc92af0be.
INFO:toil.leader:Issued job 'file:///Users/bhoff/Documents/deleteme/SynapseWorkflowExample/sample-workflow.cwl' echo C/n/jobZXTEj6 with job batch system ID: 0 and cores: 1, disk: 2.0 G, and memory: 100.0 M
INFO:toil.leader:Job ended successfully: 'file:///Users/bhoff/Documents/deleteme/SynapseWorkflowExample/sample-workflow.cwl' echo C/n/jobZXTEj6
INFO:toil.leader:Finished toil run successfully.
INFO:toil.common:Successfully deleted the job store: FileJobStore(/Users/bhoff/Documents/deleteme/tmpcudFwN)
{
"stdout": {
"checksum": "sha1$47a013e660d408619d894b20806b1d5086aab03b",
"basename": "stdout.txt",
"nameext": ".txt",
"nameroot": "stdout",
"location": "file:///Users/bhoff/Documents/deleteme/stdout.txt",
"class": "File",
"size": 13
}
}
Bruces-MacBook-Pro:deleteme bhoff$ cat stdout.txt
Hello world!
@trberg Can Justin see this issue too?
@jprosser are you able to see this thread?
I'll run through this toil tutorial and let you know what happens, thanks!
Yep, got it!
One significant change in my manual construction of the docker run command was to set the working directory for Toil (the -w option). Without it I get:
docker run \
> --name=/workflow_job \
> --env="TMPDIR=/Users/bhoff/Documents/deleteme" \
> --env="TEMP=/Users/bhoff/Documents/deleteme" \
> --env="TMP=/Users/bhoff/Documents/deleteme" \
> --env="DOCKER_HOST=unix:///var/run/docker.sock" \
> --env="PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
> --env="LANG=C.UTF-8" \
> --env="PYTHONIOENCODING=UTF-8" \
> --env="GPG_KEY=C01E1CAD5EA2C4F0B8E3571504C367C218ADD4FF" \
> --env="PYTHON_VERSION=2.7.16" \
> --env="PYTHON_PIP_VERSION=19.0.3" \
> --network "bridge" \
> --volume="/Users/bhoff/Documents/deleteme:/Users/bhoff/Documents/deleteme:rw" \
> --volume="/var/run:/var/run:rw" \
> --log-driver="json-file" \
> --log-opt max-file="2" \
> --log-opt max-size="1g" \
> --restart="" \
> --detach=true \
> "sagebionetworks/synapseworkflowhook-toil" \
> "toil-cwl-runner" "--defaultMemory" "100M" "--retryCount" "0" "--defaultDisk" "1000000" "--workDir" "/Users/bhoff/Documents/deleteme" "--noLinkImports" "SynapseWorkflowExample/sample-workflow.cwl" "SynapseWorkflowExample/sample-workflow-job.yaml"
3d8f292a9654e5d61cc60315d1cbf6a6dfd2c079bfd93e10c3e7357dfbe59fb8
Bruces-MacBook-Pro:deleteme bhoff$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3d8f292a9654 sagebionetworks/synapseworkflowhook-toil "toil-cwl-runner --d…" 5 seconds ago Exited (1) 3 seconds ago workflow_job
Bruces-MacBook-Pro:deleteme bhoff$ docker logs workflow_job
Traceback (most recent call last):
File "/usr/local/bin/toil-cwl-runner", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/site-packages/toil/cwl/cwltoil.py", line 1130, in main
loading_context.fetcher_constructor)
File "/usr/local/lib/python2.7/site-packages/cwltool/load_tool.py", line 91, in resolve_tool_uri
raise ValidationException("Not found: '%s'" % argsworkflow)
schema_salad.validate.ValidationException: Not found: 'SynapseWorkflowExample/sample-workflow.cwl'
I'm pretty sure the lack of the -w option does not cause the problem we've been discussing, because if it did the workflow would never start to run, i.e., it would not reach the 'runDocker' command in the original stack trace:
STDERR: 2019-06-13T21:33:19.533538167Z WARNING:toil.leader:d/T/jobIXsDkh Traceback (most recent call last):
STDERR: 2019-06-13T21:33:19.533545557Z WARNING:toil.leader:d/T/jobIXsDkh File "runDocker.py", line 157, in <module>
...
STDERR: 2019-06-13T21:33:19.533712621Z WARNING:toil.leader:d/T/jobIXsDkh requests.exceptions.ConnectionError: ('Connection aborted.', error(13, 'Permission denied'))
@trberg You shouldn't have to redo everything I did, just steps 6, 7, and 8. That is, just make a working directory, clone the sample workflow from GitHub, customize the docker run command for your directory path, and run it. Once you verify it works, please replace it with a more complex workflow, especially one which runs a Docker container itself. Does it work? Does the outcome change if you change:
--volume="/var/run:/var/run:rw" \
to
--volume="/var/run/docker.sock:/var/run/docker.sock:rw" \
?
So I'm running into permission issues in step 7, which show up when I look at the log files.
[trberg@con4 ~]$ docker logs workflow_job
Traceback (most recent call last):
File "/usr/local/bin/toil-cwl-runner", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/site-packages/toil/cwl/cwltoil.py", line 1130, in main
loading_context.fetcher_constructor)
File "/usr/local/lib/python2.7/site-packages/cwltool/load_tool.py", line 88, in resolve_tool_uri
uri = resolver(document_loader, argsworkflow)
File "/usr/local/lib/python2.7/site-packages/cwltool/resolver.py", line 41, in tool_resolver
ret = r(document_loader, uri)
File "/usr/local/lib/python2.7/site-packages/cwltool/resolver.py", line 18, in resolve_local
if pathobj.is_file():
File "/usr/local/lib/python2.7/site-packages/pathlib2/__init__.py", line 1575, in is_file
return S_ISREG(self.stat().st_mode)
File "/usr/local/lib/python2.7/site-packages/pathlib2/__init__.py", line 1356, in stat
return self._accessor.stat(self)
File "/usr/local/lib/python2.7/site-packages/pathlib2/__init__.py", line 541, in wrapped
return strfunc(str(pathobj), *args)
OSError: [Errno 13] Permission denied: '/data/users/trberg/deleteme/SynapseWorkflowExample/sample-workflow.cwl'
The permissions in that directory are as follows:
-rw-rw-r--. 1 trberg trberg 2791 Jun 20 14:35 annotateSubmission.cwl
-rw-rw-r--. 1 trberg trberg 2236 Jun 20 14:35 downloadSubmissionFile.cwl
-rw-rw-r--. 1 trberg trberg 838 Jun 20 14:35 job_file_reader_tool_yaml_sample.cwl
-rw-rw-r--. 1 trberg trberg 20 Jun 20 14:35 jobinput.yaml
-rw-rw-r--. 1 trberg trberg 11357 Jun 20 14:35 LICENSE
-rw-rw-r--. 1 trberg trberg 105 Jun 20 14:35 README.md
-rw-rw-r--. 1 trberg trberg 322 Jun 20 14:35 sample-workflow.cwl
-rw-rw-r--. 1 trberg trberg 22 Jun 20 14:35 sample-workflow-job.yaml
-rw-rw-r--. 1 trberg trberg 2978 Jun 20 14:35 uploadToSynapse.cwl
-rw-rw-r--. 1 trberg trberg 2270 Jun 20 14:35 workflow-entrypoint.cwl
I went ahead and just mapped --volume="/var/run/docker.sock:/var/run/docker.sock:rw" due to the headache the /var/run:/var/run mapping caused previously.
@trberg Mounting /var/run is irrelevant for the very simple workflow, since it doesn't try to start a container itself. Here is a stripped-down version of the docker run command that works for me. Does it work in your environment? (You should only have to change the first export command.)
export SHARED=/Users/bhoff/Documents/deleteme
docker run \
--rm -it \
--env="TMPDIR=${SHARED}" \
--env="TEMP=${SHARED}" \
--env="TMP=${SHARED}" \
--env="PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
--volume="${SHARED}:${SHARED}:rw" \
-w "${SHARED}" \
"sagebionetworks/synapseworkflowhook-toil" \
"toil-cwl-runner" "--defaultMemory" "100M" "--retryCount" "0" "--defaultDisk" "1000000" "--workDir" "${SHARED}" "--noLinkImports" "SynapseWorkflowExample/sample-workflow.cwl" "SynapseWorkflowExample/sample-workflow-job.yaml"
This also works for me. Does it for you?
docker run --rm -it -v ${SHARED}:${SHARED} -w ${SHARED} ubuntu cat SynapseWorkflowExample/sample-workflow.cwl
So that first one doesn't work for me
[trberg@con4 ~]$ docker run \
> --rm -it \
> --env="TMPDIR=${SHARED}" \
> --env="TEMP=${SHARED}" \
> --env="TMP=${SHARED}" \
> --env="PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
> --volume="${SHARED}:${SHARED}:rw" \
> -w "${SHARED}" \
> "sagebionetworks/synapseworkflowhook-toil" \
> "toil-cwl-runner" "--defaultMemory" "100M" "--retryCount" "0" "--defaultDisk" "1000000" "--workDir" "${SHARED}" "--noLinkImports" "SynapseWorkflowExample/sample-workflow.cwl" "SynapseWorkflowExample/sample-workflow-job.yaml"
Traceback (most recent call last):
File "/usr/local/bin/toil-cwl-runner", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/site-packages/toil/cwl/cwltoil.py", line 1130, in main
loading_context.fetcher_constructor)
File "/usr/local/lib/python2.7/site-packages/cwltool/load_tool.py", line 88, in resolve_tool_uri
uri = resolver(document_loader, argsworkflow)
File "/usr/local/lib/python2.7/site-packages/cwltool/resolver.py", line 41, in tool_resolver
ret = r(document_loader, uri)
File "/usr/local/lib/python2.7/site-packages/cwltool/resolver.py", line 18, in resolve_local
if pathobj.is_file():
File "/usr/local/lib/python2.7/site-packages/pathlib2/__init__.py", line 1575, in is_file
return S_ISREG(self.stat().st_mode)
File "/usr/local/lib/python2.7/site-packages/pathlib2/__init__.py", line 1356, in stat
return self._accessor.stat(self)
File "/usr/local/lib/python2.7/site-packages/pathlib2/__init__.py", line 541, in wrapped
return strfunc(str(pathobj), *args)
OSError: [Errno 13] Permission denied: '/data/users/trberg/deleteme/SynapseWorkflowExample/sample-workflow.cwl'
Same issue with that second one. Even when I open up all the permissions, like this:
-rwxrwxrwx. 1 trberg trberg 2791 Jun 20 14:35 annotateSubmission.cwl
-rwxrwxrwx. 1 trberg trberg 2236 Jun 20 14:35 downloadSubmissionFile.cwl
-rwxrwxrwx. 1 trberg trberg 838 Jun 20 14:35 job_file_reader_tool_yaml_sample.cwl
-rwxrwxrwx. 1 trberg trberg 20 Jun 20 14:35 jobinput.yaml
-rwxrwxrwx. 1 trberg trberg 11357 Jun 20 14:35 LICENSE
-rwxrwxrwx. 1 trberg trberg 105 Jun 20 14:35 README.md
-rwxrwxrwx. 1 trberg trberg 322 Jun 20 14:35 sample-workflow.cwl
-rwxrwxrwx. 1 trberg trberg 22 Jun 20 14:35 sample-workflow-job.yaml
-rwxrwxrwx. 1 trberg trberg 2978 Jun 20 14:35 uploadToSynapse.cwl
-rwxrwxrwx. 1 trberg trberg 2270 Jun 20 14:35 workflow-entrypoint.cwl
I get the same permission errors. @jprosser any thoughts on this?
I would expect a permission-denied error on the home dir; those are SELinux-protected to be accessible by just the user, basically. If you move over to /data/common the permissions should work out.
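For reference (not something tried in this thread): on SELinux-enforcing hosts, Docker supports a `:z` (shared) or `:Z` (private) suffix on bind mounts that relabels the mounted content so containers are allowed to access it. A hypothetical variant of the failing command:

```shell
# :z relabels the bind-mounted directory with a shared container label
# so the container can read it. Use with care: the relabel applies to the
# host directory itself, not just the container's view of it.
export SHARED=/data/users/trberg/deleteme   # the path that failed above
docker run --rm -it -v "${SHARED}:${SHARED}:z" -w "${SHARED}" ubuntu cat foo.txt
```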
I feel like we can make progress by understanding what's required for the simple case of reading a mounted file, e.g.:
export SHARED=/some/shared/directory # foo.txt is in the directory
docker run --rm -it -v ${SHARED}:${SHARED} -w ${SHARED} ubuntu cat foo.txt
If we can clarify what is needed to make this work we can move on to bigger things.
What seems to be happening is that Toil is creating folders in the Docker container that are later inaccessible. When I change the working directory in the container to something like "/testing", we get the following:
[trberg@con4 dream]$ docker run \
--name=/workflow_job \
--env="TMPDIR=${SHARED}" \
--env="TEMP=${SHARED}" \
--env="TMP=${SHARED}" \
--env="DOCKER_HOST=unix:///var/run/docker.sock" \
--env="PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
--env="LANG=C.UTF-8" \
--env="PYTHONIOENCODING=UTF-8" \
--env="GPG_KEY=C01E1CAD5EA2C4F0B8E3571504C367C218ADD4FF" \
--env="PYTHON_VERSION=2.7.16" \
--env="PYTHON_PIP_VERSION=19.0.3" \
--network "bridge" \
--volume="${SHARED}:/testing:rw" \
--volume="/var/run/docker.sock:/var/run/docker.sock:rw" \
--log-driver="json-file" \
--log-opt max-file="2" \
--log-opt max-size="1g" \
--restart="" \
--detach=true \
-w "/testing" "sagebionetworks/synapseworkflowhook-toil" "toil-cwl-runner" "--defaultMemory" "100M" "--retryCount" "0" "--defaultDisk" "1000000" "--workDir" "/testing" "--noLinkImports" "SynapseWorkflowExample/sample-workflow.cwl" "SynapseWorkflowExample/sample-workflow-job.yaml"
64e51d62baee15065c83b9b42b066027937fe04e0b84f212b63de2782096143d
[trberg@con4 dream]$ docker logs workflow_job
INFO:cwltool:Resolved 'SynapseWorkflowExample/sample-workflow.cwl' to 'file:///testing/SynapseWorkflowExample/sample-workflow.cwl'
WARNING:toil.batchSystems.singleMachine:Limiting maxCores to CPU count of system (2).
WARNING:toil.batchSystems.singleMachine:Limiting maxMemory to physically available memory (1927286784).
Traceback (most recent call last):
File "/usr/local/bin/toil-cwl-runner", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/site-packages/toil/cwl/cwltoil.py", line 1220, in main
outobj = toil.start(wf1)
File "/usr/local/lib/python2.7/site-packages/toil/common.py", line 763, in start
self._batchSystem = self.createBatchSystem(self.config)
File "/usr/local/lib/python2.7/site-packages/toil/common.py", line 913, in createBatchSystem
return batchSystemClass(**kwargs)
File "/usr/local/lib/python2.7/site-packages/toil/batchSystems/singleMachine.py", line 76, in __init__
self.physicalDisk = toil.physicalDisk(config)
File "/usr/local/lib/python2.7/site-packages/toil/__init__.py", line 95, in physicalDisk
toilWorkflowDir = Toil.getWorkflowDir(config.workflowID, config.workDir)
File "/usr/local/lib/python2.7/site-packages/toil/common.py", line 1033, in getWorkflowDir
os.mkdir(workflowDir)
OSError: [Errno 13] Permission denied: '/testing/toil-a1fee1ac-a59c-4344-b00c-bc7bb289a6be-ddfd1d62f2bd08c0533c95651491cbbc'
I feel like we can make progress by understanding what's required for the simple case of reading a mounted file, e.g.:
export SHARED=/some/shared/directory # foo.txt is in the directory
docker run --rm -it -v ${SHARED}:${SHARED} -w ${SHARED} ubuntu cat foo.txt
If we can clarify what is needed to make this work we can move on to bigger things.
This seems to do fine
export SHARED=/data/common/dream/deleteme
[trberg@con4 dream]$ docker run --rm -it -v ${SHARED}:${SHARED} -w ${SHARED} ubuntu cat foo.txt
This is the foo.txt file
OK, what if /data/common/dream/ is replaced with /data/users/trberg/?
So that first one doesn't work for me
OK, what if /data/users/trberg/ is replaced with /data/common/dream/?
I would expect any scenario involving the user's home dir, from that of a container, to fail. User home dirs are strongly protected with SELinux policies and the container is basically a different user coming in.
OK, what if /data/common/dream/ is replaced with /data/users/trberg/?
[trberg@con4 ~]$ export SHARED=/data/users/trberg/deleteme
[trberg@con4 ~]$ nano deleteme/foo.txt
[trberg@con4 ~]$ docker run --rm -it -v ${SHARED}:${SHARED} -w ${SHARED} ubuntu cat foo.txt
cat: foo.txt: Permission denied
So that first one doesn't work for me
OK, what if /data/users/trberg/ is replaced with /data/common/dream/?
When replacing with '/data/common/dream/':
INFO:cwltool:Resolved 'SynapseWorkflowExample/sample-workflow.cwl' to 'file:///testing/SynapseWorkflowExample/sample-workflow.cwl'
WARNING:toil.batchSystems.singleMachine:Limiting maxCores to CPU count of system (2).
WARNING:toil.batchSystems.singleMachine:Limiting maxMemory to physically available memory (1927286784).
Traceback (most recent call last):
File "/usr/local/bin/toil-cwl-runner", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/site-packages/toil/cwl/cwltoil.py", line 1220, in main
outobj = toil.start(wf1)
File "/usr/local/lib/python2.7/site-packages/toil/common.py", line 763, in start
self._batchSystem = self.createBatchSystem(self.config)
File "/usr/local/lib/python2.7/site-packages/toil/common.py", line 913, in createBatchSystem
return batchSystemClass(**kwargs)
File "/usr/local/lib/python2.7/site-packages/toil/batchSystems/singleMachine.py", line 76, in __init__
self.physicalDisk = toil.physicalDisk(config)
File "/usr/local/lib/python2.7/site-packages/toil/__init__.py", line 95, in physicalDisk
toilWorkflowDir = Toil.getWorkflowDir(config.workflowID, config.workDir)
File "/usr/local/lib/python2.7/site-packages/toil/common.py", line 1033, in getWorkflowDir
os.mkdir(workflowDir)
OSError: [Errno 13] Permission denied: '/testing/toil-c07d2988-2381-41e2-83ce-9f015d492d8e-ddfd1d62f2bd08c0533c95651491cbbc'
The issue seems to be in the container
The last example confuses me because I see:
/testing/toil-c07d2988-2381-41e2-83ce-9f015d492d8e-ddfd1d62f2bd08c0533c95651491cbbc
when I expected
/data/common/dream/toil-c07d2988-2381-41e2-83ce-9f015d492d8e-ddfd1d62f2bd08c0533c95651491cbbc
Did you not run the following?
export SHARED=/data/common/dream
docker run \
--rm -it \
--env="TMPDIR=${SHARED}" \
--env="TEMP=${SHARED}" \
--env="TMP=${SHARED}" \
--env="PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
--volume="${SHARED}:${SHARED}:rw" \
-w "${SHARED}" \
"sagebionetworks/synapseworkflowhook-toil" \
"toil-cwl-runner" "--defaultMemory" "100M" "--retryCount" "0" "--defaultDisk" "1000000" "--workDir" "${SHARED}" "--noLinkImports" "SynapseWorkflowExample/sample-workflow.cwl" "SynapseWorkflowExample/sample-workflow-job.yaml"
I just did a quick check in the run history and saw -w "/testing" rather than -w "${SHARED}", so that would explain that confusion.
I changed the code to use a volume to share files between containers rather than a bind-mounted host directory: https://github.com/Sage-Bionetworks/SynapseWorkflowHook/commit/6ee130b3e094e3611d197900e6c38a0b6994f4e5
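The mechanics of the change can be sketched like this (illustrative names; the commit's actual volume name may differ). A named volume lives in Docker's own storage area, so ownership and SELinux labels on arbitrary host paths no longer get in the way:

```shell
# Create a named volume managed by the Docker daemon.
docker volume create shared-scratch

# One container writes into the volume...
docker run --rm -v shared-scratch:/shared ubuntu \
    sh -c 'echo "hello from hook" > /shared/msg.txt'

# ...and another container (e.g. the Toil runner) reads it back.
docker run --rm -v shared-scratch:/shared ubuntu cat /shared/msg.txt

# Clean up the volume when done.
docker volume rm shared-scratch
```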
@brucehoff What is the best practice for mounting data that a participant's model would need? Should my rundocker.py make use of Docker volumes?
@brucehoff I've pulled in the new pipeline. It's throwing a new error and I can't tell if it's a configuration error or something else.
workflow-hook_1 | [INFO] BUILD FAILURE
workflow-hook_1 | [INFO] ------------------------------------------------------------------------
workflow-hook_1 | [INFO] Total time: 6.606 s
workflow-hook_1 | [INFO] Finished at: 2019-07-03T21:20:46Z
workflow-hook_1 | [INFO] ------------------------------------------------------------------------
workflow-hook_1 | [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:java (default-cli) on project WorkflowHook: An exception occured while executing the Java class. null: InvocationTargetException: The following workflow job(s) are running but have no corresponding open Synapse submissions.
workflow-hook_1 | [ERROR] workflow_job
workflow-hook_1 | [ERROR] One way to recover is to delete the workflow job(s).
workflow-hook_1 | [ERROR] -> [Help 1]
workflow-hook_1 | [ERROR]
workflow-hook_1 | [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
workflow-hook_1 | [ERROR] Re-run Maven using the -X switch to enable full debug logging.
workflow-hook_1 | [ERROR]
workflow-hook_1 | [ERROR] For more information about the errors and possible solutions, please read the following articles:
workflow-hook_1 | [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
workflow-hook_1 | [Thread-1] INFO org.sagebionetworks.ShutdownHook - Shut down signal received.
workflow-hook_1 | [Thread-1] INFO org.sagebionetworks.ShutdownHook - Shut down complete.
compose.cli.verbose_proxy.proxy_callable: docker wait <- ('e9bac560cd863d504ff89f33f9b64143257578729081cf20a70be6fb450a903a')
compose.cli.verbose_proxy.proxy_callable: docker inspect_container <- ('e9bac560cd863d504ff89f33f9b64143257578729081cf20a70be6fb450a903a')
urllib3.connectionpool._make_request: http://localhost:None "POST /v1.25/containers/e9bac560cd863d504ff89f33f9b64143257578729081cf20a70be6fb450a903a/wait HTTP/1.1" 200 17
compose.cli.verbose_proxy.proxy_callable: docker wait -> {'StatusCode': 1}
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/containers/e9bac560cd863d504ff89f33f9b64143257578729081cf20a70be6fb450a903a/json HTTP/1.1" 200 None
workflow_workflow-hook_1 exited with code 1
compose.cli.verbose_proxy.proxy_callable: docker inspect_container -> {'AppArmorProfile': '',
'Args': ['/bin/sh',
'-c',
'exec mvn exec:java -DentryPoint=org.sagebionetworks.WorkflowHook'],
'Config': {'ArgsEscaped': True,
'AttachStderr': False,
'AttachStdin': False,
'AttachStdout': False,
'Cmd': ['/bin/sh',
'-c',
...
I tried cleaning up the submissions by removing all received and in-progress submissions but I'm still getting this error.
@trberg The error says there is a container named 'workflow_job', unexpected/unrecognized by the software. What do you see when you type docker ps -a? Is there a stray container to remove?
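If a leftover container is indeed the cause, the cleanup is (container name taken from the error message above):

```shell
# List everything, including exited containers.
docker ps -a

# Remove the stray job container; -f also stops it if it is still running.
docker rm -f workflow_job
```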
@brucehoff
The message is:
com.github.dockerjava.api.exception.NotFoundException: {"message":"get shared: no such volume"}
at com.github.dockerjava.jaxrs.filter.ResponseStatusExceptionFilter.filter(ResponseStatusExceptionFilter.java:47)
at org.glassfish.jersey.client.ClientFilteringStages$ResponseFilterStage.apply(ClientFilteringStages.java:140)
at org.glassfish.jersey.client.ClientFilteringStages$ResponseFilterStage.apply(ClientFilteringStages.java:128)
at org.glassfish.jersey.process.internal.Stages.process(Stages.java:171)
at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:257)
at org.glassfish.jersey.client.JerseyInvocation$2.call(JerseyInvocation.java:701)
at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
at org.glassfish.jersey.internal.Errors.process(Errors.java:228)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:444)
at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:697)
at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:420)
at org.glassfish.jersey.client.JerseyInvocation$Builder.get(JerseyInvocation.java:316)
at com.github.dockerjava.jaxrs.InspectVolumeCmdExec.execute(InspectVolumeCmdExec.java:27)
at com.github.dockerjava.jaxrs.InspectVolumeCmdExec.execute(InspectVolumeCmdExec.java:13)
at com.github.dockerjava.jaxrs.AbstrSyncDockerCmdExec.exec(AbstrSyncDockerCmdExec.java:23)
at com.github.dockerjava.core.command.AbstrDockerCmd.exec(AbstrDockerCmd.java:35)
at com.github.dockerjava.core.command.InspectVolumeCmdImpl.exec(InspectVolumeCmdImpl.java:40)
at com.github.dockerjava.core.command.InspectVolumeCmdImpl.exec(InspectVolumeCmdImpl.java:12)
at org.sagebionetworks.DockerUtils.getVolumeMountPoint(DockerUtils.java:155)
at org.sagebionetworks.WES.createDirInHostMountedSharedDir(WES.java:72)
at org.sagebionetworks.WES.createWorkflowJob(WES.java:175)
at org.sagebionetworks.WorkflowHook.createNewWorkflowJobs(WorkflowHook.java:256)
at org.sagebionetworks.WorkflowHook.execute(WorkflowHook.java:194)
at org.sagebionetworks.WorkflowHook.main(WorkflowHook.java:111)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
at java.base/java.lang.Thread.run(Thread.java:834)
@brucehoff Thanks, that fixed something, but:
Workflow Administrator:
Submission failed.
The message is:
The following workflow job(s) are running but have no corresponding open Synapse submissions.
workflow_job.da1def3e-bbf1-4471-8fb9-d714860e38cd
workflow_job.35d3d4bb-6b6d-4610-aff3-a43aa682f8e6
One way to recover is to delete the workflow job(s).
Sincerely,
Workflow Administration
It is running through the workflows now, but it's having a hard time linking them back to the submissions.
That was a mistake on my part. I just pushed an update but it may take a while to build on DockerHub. https://github.com/Sage-Bionetworks/SynapseWorkflowHook/commit/632c75e8d04e0f47a936af392c29d80467c84d28
What is the best practice for mounting data that a participant's model would need?
That is an excellent question. I would summarize the advice from https://docs.docker.com/storage/ as "volumes are good for sharing files between containers while bind mounts are good for sharing data between the host and a container", but please read the page for yourself and see if you agree. My change was to use a volume to share data between the hook container and the Toil containers it launches, but for mounting challenge data Docker would say you should use the bind mount that you are using today. Having said this, the UW issue regarding permissions may affect accessing challenge data via a bind mount. We either have to (1) just let UW give it a try or (2) recreate their environment so we can investigate more deeply. I suggest we try (1) and see how far we get.
Sounds good, I'll give (1) a try and we'll see where that gets us.
On Thu, Jul 4, 2019, 9:44 AM Bruce Hoff notifications@github.com wrote:
What is the best practice for mounting data that a participants model would need?
That is an excellent question. I would summarize the advice from https://docs.docker.com/storage/ as "volumes are good for sharing files between containers while bind mounts are good for sharing data between the host and a container" but please read the page for yourself and see if you agree. My change was to use a volume to share data between the 'hook and the Toil containers it launches, but for mounting challenge data Docker would say you should use the bind mount that you are using today. Having said this, the UW issue regarding permissions may affect accessing challenge data via a bind mount. We either have to (1) just let UW give it a try or (2) recreate their environment so we can investigate more deeply. I suggest we try (1) and see how far we get.
@brucehoff So we are running into another issue: something appears to be trying to call localhost:None. I can post the whole verbose log if need be.
workflow-hook_1 | [INFO] ------------------------------------------------------------------------
workflow-hook_1 | [INFO] BUILD FAILURE
workflow-hook_1 | [INFO] ------------------------------------------------------------------------
workflow-hook_1 | [INFO] Total time: 6.568 s
workflow-hook_1 | [INFO] Finished at: 2019-07-08T23:41:21Z
workflow-hook_1 | [INFO] ------------------------------------------------------------------------
workflow-hook_1 | [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:java (default-cli) on project WorkflowHook: An exception occured while executing the Java class. null: InvocationTargetException: NullPointerException -> [Help 1]
workflow-hook_1 | [ERROR]
workflow-hook_1 | [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
workflow-hook_1 | [ERROR] Re-run Maven using the -X switch to enable full debug logging.
workflow-hook_1 | [ERROR]
workflow-hook_1 | [ERROR] For more information about the errors and possible solutions, please read the following articles:
workflow-hook_1 | [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
workflow-hook_1 | [Thread-1] INFO org.sagebionetworks.ShutdownHook - Shut down signal received.
workflow-hook_1 | [Thread-1] INFO org.sagebionetworks.ShutdownHook - Shut down complete.
compose.cli.verbose_proxy.proxy_callable: docker inspect_container <- ('544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238')
compose.cli.verbose_proxy.proxy_callable: docker wait <- ('544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/containers/544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238/json HTTP/1.1" 200 None
urllib3.connectionpool._make_request: http://localhost:None "POST /v1.25/containers/544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238/wait HTTP/1.1" 200 17
compose.cli.verbose_proxy.proxy_callable: docker wait -> {'StatusCode': 1}
compose.cli.verbose_proxy.proxy_callable: docker inspect_container -> {'AppArmorProfile': '',
'Args': ['/bin/sh',
'-c',
'exec mvn exec:java -DentryPoint=org.sagebionetworks.WorkflowHook'],
'Config': {'ArgsEscaped': True,
'AttachStderr': False,
'AttachStdin': False,
'AttachStdout': False,
'Cmd': ['/bin/sh',
'-c',
...
workflow_workflow-hook_1 exited with code 1
I can post the whole verbose log if need be.
Yes, please. I think the relevant information appears earlier in the log than the part you included above.
compose.config.config.find: Using configuration files: ./docker-compose.yaml
WARNING: compose.config.environment.__getitem__: The DATA_UNLOCK_SYNAPSE_PRINCIPAL_ID variable is not set. Defaulting to a blank string.
docker.utils.config.find_config_file: Trying paths: ['/data/users/trberg/.docker/config.json', '/data/users/trberg/.dockercfg']
docker.utils.config.find_config_file: No config file found
docker.utils.config.find_config_file: Trying paths: ['/data/users/trberg/.docker/config.json', '/data/users/trberg/.dockercfg']
docker.utils.config.find_config_file: No config file found
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/version HTTP/1.1" 200 300
compose.cli.command.get_client: docker-compose version 1.24.0, build 0aa59064
docker-py version: 3.7.2
CPython version: 3.6.8
OpenSSL version: OpenSSL 1.1.0j 20 Nov 2018
compose.cli.command.get_client: Docker base_url: http+docker://localhost
compose.cli.command.get_client: Docker version: Version=1.13.1, ApiVersion=1.26, MinAPIVersion=1.12, GitCommit=b2f74b2/1.13.1, GoVersion=go1.10.3, Os=linux, Arch=amd64, KernelVersion=3.10.0-957.21.3.el7.x86_64, BuildTime=2019-05-01T14:55:20.271222531+00:00, PkgVersion=docker-1.13.1-96.gitb2f74b2.el7.centos.x86_64
compose.cli.verbose_proxy.proxy_callable: docker inspect_network <- ('workflow_default')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/networks/workflow_default HTTP/1.1" 200 501
compose.cli.verbose_proxy.proxy_callable: docker inspect_network -> {'Attachable': True,
'Containers': {},
'Created': '2019-07-08T10:01:14.576761314-07:00',
'Driver': 'bridge',
'EnableIPv6': False,
'IPAM': {'Config': [{'Gateway': '192.168.32.1', 'Subnet': '192.168.32.0/20'}],
'Driver': 'default',
'Options': None},
'Id': 'f4b7d112f7af82effda2e6169029414931fe0aa230bce1dc707402e6de8c0c70',
'Internal': False,
...
compose.cli.verbose_proxy.proxy_callable: docker inspect_volume <- ('workflow_shared')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/volumes/workflow_shared HTTP/1.1" 200 260
compose.cli.verbose_proxy.proxy_callable: docker inspect_volume -> {'Driver': 'local',
'Labels': {'com.docker.compose.project': 'workflow',
'com.docker.compose.version': '1.24.0',
'com.docker.compose.volume': 'shared'},
'Mountpoint': '/var/lib/docker/volumes/workflow_shared/_data',
'Name': 'workflow_shared',
'Options': {},
'Scope': 'local'}
compose.cli.verbose_proxy.proxy_callable: docker info <- ()
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/info HTTP/1.1" 200 None
compose.cli.verbose_proxy.proxy_callable: docker info -> {'Architecture': 'x86_64',
'BridgeNfIp6tables': True,
'BridgeNfIptables': True,
'CPUSet': True,
'CPUShares': True,
'CgroupDriver': 'systemd',
'ClusterAdvertise': '',
'ClusterStore': '',
'ContainerdCommit': {'Expected': 'aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1',
'ID': ''},
...
compose.cli.verbose_proxy.proxy_callable: docker inspect_network <- ('workflow_default')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/networks/workflow_default HTTP/1.1" 200 501
compose.cli.verbose_proxy.proxy_callable: docker inspect_network -> {'Attachable': True,
'Containers': {},
'Created': '2019-07-08T10:01:14.576761314-07:00',
'Driver': 'bridge',
'EnableIPv6': False,
'IPAM': {'Config': [{'Gateway': '192.168.32.1', 'Subnet': '192.168.32.0/20'}],
'Driver': 'default',
'Options': None},
'Id': 'f4b7d112f7af82effda2e6169029414931fe0aa230bce1dc707402e6de8c0c70',
'Internal': False,
...
compose.cli.verbose_proxy.proxy_callable: docker inspect_volume <- ('workflow_shared')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/volumes/workflow_shared HTTP/1.1" 200 260
compose.cli.verbose_proxy.proxy_callable: docker inspect_volume -> {'Driver': 'local',
'Labels': {'com.docker.compose.project': 'workflow',
'com.docker.compose.version': '1.24.0',
'com.docker.compose.volume': 'shared'},
'Mountpoint': '/var/lib/docker/volumes/workflow_shared/_data',
'Name': 'workflow_shared',
'Options': {},
'Scope': 'local'}
compose.cli.verbose_proxy.proxy_callable: docker inspect_volume <- ('workflow_shared')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/volumes/workflow_shared HTTP/1.1" 200 260
compose.cli.verbose_proxy.proxy_callable: docker inspect_volume -> {'Driver': 'local',
'Labels': {'com.docker.compose.project': 'workflow',
'com.docker.compose.version': '1.24.0',
'com.docker.compose.volume': 'shared'},
'Mountpoint': '/var/lib/docker/volumes/workflow_shared/_data',
'Name': 'workflow_shared',
'Options': {},
'Scope': 'local'}
compose.cli.verbose_proxy.proxy_callable: docker containers <- (all=False, filters={'label': ['com.docker.compose.project=workflow', 'com.docker.compose.oneoff=False']})
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/containers/json?limit=-1&all=0&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Dworkflow%22%2C+%22com.docker.compose.oneoff%3DFalse%22%5D%7D HTTP/1.1" 200 3
compose.cli.verbose_proxy.proxy_callable: docker containers -> (list with 0 items)
compose.cli.verbose_proxy.proxy_callable: docker containers <- (all=False, filters={'label': ['com.docker.compose.project=workflow', 'com.docker.compose.oneoff=False']})
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/containers/json?limit=-1&all=0&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Dworkflow%22%2C+%22com.docker.compose.oneoff%3DFalse%22%5D%7D HTTP/1.1" 200 3
compose.cli.verbose_proxy.proxy_callable: docker containers -> (list with 0 items)
compose.cli.verbose_proxy.proxy_callable: docker containers <- (all=True, filters={'label': ['com.docker.compose.project=workflow', 'com.docker.compose.service=workflow-hook', 'com.docker.compose.oneoff=False']})
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/containers/json?limit=-1&all=1&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Dworkflow%22%2C+%22com.docker.compose.service%3Dworkflow-hook%22%2C+%22com.docker.compose.oneoff%3DFalse%22%5D%7D HTTP/1.1" 200 3
compose.cli.verbose_proxy.proxy_callable: docker containers -> (list with 0 items)
compose.cli.verbose_proxy.proxy_callable: docker containers <- (all=True, filters={'label': ['com.docker.compose.project=workflow', 'com.docker.compose.service=workflow-hook', 'com.docker.compose.oneoff=False']})
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/containers/json?limit=-1&all=1&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Dworkflow%22%2C+%22com.docker.compose.service%3Dworkflow-hook%22%2C+%22com.docker.compose.oneoff%3DFalse%22%5D%7D HTTP/1.1" 200 3
compose.cli.verbose_proxy.proxy_callable: docker containers -> (list with 0 items)
compose.cli.verbose_proxy.proxy_callable: docker inspect_image <- ('sagebionetworks/synapseworkflowhook')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/images/sagebionetworks/synapseworkflowhook/json HTTP/1.1" 404 65
compose.service.pull: Pulling workflow-hook (sagebionetworks/synapseworkflowhook:)...
compose.cli.verbose_proxy.proxy_callable: docker pull <- ('sagebionetworks/synapseworkflowhook', tag='latest', stream=True, platform=None)
docker.auth.get_config_header: Looking for auth config
docker.auth.get_config_header: No auth config in memory - loading from filesystem
docker.utils.config.find_config_file: Trying paths: ['/data/users/trberg/.docker/config.json', '/data/users/trberg/.dockercfg']
docker.utils.config.find_config_file: No config file found
docker.auth.resolve_authconfig: Looking for auth entry for 'docker.io'
docker.auth.resolve_authconfig: No entry found
docker.auth.get_config_header: No auth config found
urllib3.connectionpool._make_request: http://localhost:None "POST /v1.25/images/create?tag=latest&fromImage=sagebionetworks%2Fsynapseworkflowhook HTTP/1.1" 200 None
compose.cli.verbose_proxy.proxy_callable: docker pull -> <generator object APIClient._stream_helper at 0x7f61d493b620>
Trying to pull repository docker.io/sagebionetworks/synapseworkflowhook ...
latest: Pulling from docker.io/sagebionetworks/synapseworkflowhook
6f2f362378c5: Pull complete
494c27a8a6b8: Pull complete
7596bb83081b: Pull complete
372744b62d49: Pull complete
fb72767f9beb: Pull complete
c34cefad6b24: Pull complete
e2b76c865b34: Pull complete
b24f4b34fb99: Pull complete
ef9e02d7f15c: Pull complete
90e566caa129: Pull complete
b91afd5fc2ef: Pull complete
7efaf569ba53: Pull complete
db36b199e15e: Pull complete
Digest: sha256:a6118850c197c664a824fc0a0c63f678199c8d2c637d7ae18cf17ab5258bd2ca
Status: Downloaded newer image for docker.io/sagebionetworks/synapseworkflowhook:latest
compose.cli.verbose_proxy.proxy_callable: docker containers <- (all=True, filters={'label': ['com.docker.compose.project=workflow', 'com.docker.compose.service=workflow-hook', 'com.docker.compose.oneoff=False']})
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/containers/json?limit=-1&all=1&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Dworkflow%22%2C+%22com.docker.compose.service%3Dworkflow-hook%22%2C+%22com.docker.compose.oneoff%3DFalse%22%5D%7D HTTP/1.1" 200 3
compose.cli.verbose_proxy.proxy_callable: docker containers -> (list with 0 items)
compose.cli.verbose_proxy.proxy_callable: docker containers <- (all=True, filters={'label': ['com.docker.compose.project=workflow', 'com.docker.compose.service=workflow-hook', 'com.docker.compose.oneoff=False']})
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/containers/json?limit=-1&all=1&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Dworkflow%22%2C+%22com.docker.compose.service%3Dworkflow-hook%22%2C+%22com.docker.compose.oneoff%3DFalse%22%5D%7D HTTP/1.1" 200 3
compose.cli.verbose_proxy.proxy_callable: docker containers -> (list with 0 items)
compose.parallel.feed_queue: Pending: {<Service: workflow-hook>}
compose.parallel.feed_queue: Starting producer thread for <Service: workflow-hook>
compose.cli.verbose_proxy.proxy_callable: docker containers <- (all=True, filters={'label': ['com.docker.compose.project=workflow', 'com.docker.compose.service=workflow-hook', 'com.docker.compose.oneoff=False']})
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/containers/json?limit=-1&all=1&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Dworkflow%22%2C+%22com.docker.compose.service%3Dworkflow-hook%22%2C+%22com.docker.compose.oneoff%3DFalse%22%5D%7D HTTP/1.1" 200 3
compose.cli.verbose_proxy.proxy_callable: docker containers -> (list with 0 items)
compose.cli.verbose_proxy.proxy_callable: docker containers <- (all=True, filters={'label': ['com.docker.compose.project=workflow', 'com.docker.compose.service=workflow-hook', 'com.docker.compose.oneoff=False']})
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/containers/json?limit=-1&all=1&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Dworkflow%22%2C+%22com.docker.compose.service%3Dworkflow-hook%22%2C+%22com.docker.compose.oneoff%3DFalse%22%5D%7D HTTP/1.1" 200 3
compose.cli.verbose_proxy.proxy_callable: docker containers -> (list with 0 items)
Creating workflow_workflow-hook_1 ...
compose.parallel.feed_queue: Pending: {ServiceName(project='workflow', service='workflow-hook', number=1)}
compose.parallel.feed_queue: Starting producer thread for ServiceName(project='workflow', service='workflow-hook', number=1)
compose.cli.verbose_proxy.proxy_callable: docker inspect_image <- ('sagebionetworks/synapseworkflowhook')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/images/sagebionetworks/synapseworkflowhook/json HTTP/1.1" 200 None
compose.cli.verbose_proxy.proxy_callable: docker inspect_image -> {'Architecture': 'amd64',
'Author': '',
'Comment': '',
'Config': {'ArgsEscaped': True,
'AttachStderr': False,
'AttachStdin': False,
'AttachStdout': False,
'Cmd': ['/bin/sh',
'-c',
'exec mvn exec:java '
...
compose.cli.verbose_proxy.proxy_callable: docker inspect_image <- ('sagebionetworks/synapseworkflowhook')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/images/sagebionetworks/synapseworkflowhook/json HTTP/1.1" 200 None
compose.cli.verbose_proxy.proxy_callable: docker inspect_image -> {'Architecture': 'amd64',
'Author': '',
'Comment': '',
'Config': {'ArgsEscaped': True,
'AttachStderr': False,
'AttachStdin': False,
'AttachStdout': False,
'Cmd': ['/bin/sh',
'-c',
'exec mvn exec:java '
...
compose.service.build_container_labels: Added config hash: ceec00858a4677a7d239352e7fb000e1cdd08d341d12ef335dbc7d5c6a3b0748
compose.cli.verbose_proxy.proxy_callable: docker create_host_config <- (links=[], port_bindings={}, binds=['/data/common/dream/data/scratch:/tempDir:rw', '/var/run/docker.sock:/var/run/docker.sock:rw', 'workflow_shared:/shared:rw'], volumes_from=[], privileged=True, network_mode='workflow_default', devices=None, dns=None, dns_opt=None, dns_search=None, restart_policy=None, runtime=None, cap_add=None, cap_drop=None, mem_limit=None, mem_reservation=None, memswap_limit=None, ulimits=None, log_config={'Type': '', 'Config': {}}, extra_hosts=None, read_only=None, pid_mode=None, security_opt=None, ipc_mode=None, cgroup_parent=None, cpu_quota=None, shm_size=None, sysctls=None, pids_limit=None, tmpfs=[], oom_kill_disable=None, oom_score_adj=None, mem_swappiness=None, group_add=None, userns_mode=None, init=None, init_path=None, isolation=None, cpu_count=None, cpu_percent=None, nano_cpus=None, volume_driver=None, cpuset_cpus=None, cpu_shares=None, storage_opt=None, blkio_weight=None, blkio_weight_device=None, device_read_bps=None, device_read_iops=None, device_write_bps=None, device_write_iops=None, mounts=None, device_cgroup_rules=None, cpu_period=None, cpu_rt_period=None, cpu_rt_runtime=None)
compose.cli.verbose_proxy.proxy_callable: docker create_host_config -> {'Binds': ['/data/common/dream/data/scratch:/tempDir:rw',
'/var/run/docker.sock:/var/run/docker.sock:rw',
'workflow_shared:/shared:rw'],
'Links': [],
'LogConfig': {'Config': {}, 'Type': ''},
'NetworkMode': 'workflow_default',
'PortBindings': {},
'Privileged': True,
'VolumesFrom': []}
compose.cli.verbose_proxy.proxy_callable: docker create_container <- (environment=['DOCKER_ENGINE_URL=unix:///var/run/docker.sock', 'SYNAPSE_USERNAME=trberg', 'SYNAPSE_PASSWORD=TheBigCheese', 'WORKFLOW_OUTPUT_ROOT_ENTITY_ID=syn18420542', 'EVALUATION_TEMPLATES={"9614227":"syn18838336"}', 'NOTIFICATION_PRINCIPAL_ID=3386536', 'SHARE_RESULTS_IMMEDIATELY=false', 'DATA_UNLOCK_SYNAPSE_PRINCIPAL_ID=', 'TOIL_CLI_OPTIONS=--defaultMemory 10G --retryCount 0 --defaultDisk 50G', 'MAX_CONCURRENT_WORKFLOWS=1', 'SUBMITTER_NOTIFICATION_MASK=31'], image='sagebionetworks/synapseworkflowhook', volumes={'/tempDir': {}, '/var/run/docker.sock': {}, '/shared': {}}, name='workflow_workflow-hook_1', detach=True, labels={'com.docker.compose.project': 'workflow', 'com.docker.compose.service': 'workflow-hook', 'com.docker.compose.oneoff': 'False', 'com.docker.compose.container-number': '1', 'com.docker.compose.version': '1.24.0', 'com.docker.compose.config-hash': 'ceec00858a4677a7d239352e7fb000e1cdd08d341d12ef335dbc7d5c6a3b0748'}, host_config={'Privileged': True, 'NetworkMode': 'workflow_default', 'VolumesFrom': [], 'Binds': ['/data/common/dream/data/scratch:/tempDir:rw', '/var/run/docker.sock:/var/run/docker.sock:rw', 'workflow_shared:/shared:rw'], 'PortBindings': {}, 'Links': [], 'LogConfig': {'Type': '', 'Config': {}}}, networking_config={'EndpointsConfig': {'workflow_default': {'Aliases': ['workflow-hook'], 'IPAMConfig': {}}}})
compose.parallel.feed_queue: Pending: set()
compose.parallel.feed_queue: Pending: set()
compose.parallel.feed_queue: Pending: set()
compose.parallel.feed_queue: Pending: set()
urllib3.connectionpool._make_request: http://localhost:None "POST /v1.25/containers/create?name=workflow_workflow-hook_1 HTTP/1.1" 201 90
compose.cli.verbose_proxy.proxy_callable: docker create_container -> {'Id': '544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238',
'Warnings': None}
compose.cli.verbose_proxy.proxy_callable: docker inspect_container <- ('544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/containers/544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238/json HTTP/1.1" 200 None
compose.cli.verbose_proxy.proxy_callable: docker inspect_container -> {'AppArmorProfile': '',
'Args': ['/bin/sh',
'-c',
'exec mvn exec:java -DentryPoint=org.sagebionetworks.WorkflowHook'],
'Config': {'ArgsEscaped': True,
'AttachStderr': False,
'AttachStdin': False,
'AttachStdout': False,
'Cmd': ['/bin/sh',
'-c',
...
compose.cli.verbose_proxy.proxy_callable: docker attach <- ('544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238', stdout=True, stderr=True, stream=True)
urllib3.connectionpool._make_request: http://localhost:None "POST /v1.25/containers/544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238/attach?logs=0&stdout=1&stderr=1&stream=1 HTTP/1.1" 101 0
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/containers/544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238/json HTTP/1.1" 200 None
compose.cli.verbose_proxy.proxy_callable: docker attach -> <docker.types.daemon.CancellableStream object at 0x7f61d4906748>
compose.cli.verbose_proxy.proxy_callable: docker disconnect_container_from_network <- ('544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238', 'workflow_default')
urllib3.connectionpool._make_request: http://localhost:None "POST /v1.25/networks/workflow_default/disconnect HTTP/1.1" 200 0
compose.cli.verbose_proxy.proxy_callable: docker disconnect_container_from_network -> None
compose.cli.verbose_proxy.proxy_callable: docker connect_container_to_network <- ('544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238', 'workflow_default', aliases=['544462918fcf', 'workflow-hook'], ipv4_address=None, ipv6_address=None, links=[], link_local_ips=None)
compose.parallel.feed_queue: Pending: set()
urllib3.connectionpool._make_request: http://localhost:None "POST /v1.25/networks/workflow_default/connect HTTP/1.1" 200 0
compose.cli.verbose_proxy.proxy_callable: docker connect_container_to_network -> None
compose.cli.verbose_proxy.proxy_callable: docker start <- ('544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238')
compose.parallel.feed_queue: Pending: set()
compose.parallel.feed_queue: Pending: set()
compose.parallel.feed_queue: Pending: set()
compose.parallel.feed_queue: Pending: set()
compose.parallel.feed_queue: Pending: set()
compose.parallel.feed_queue: Pending: set()
compose.parallel.feed_queue: Pending: set()
compose.parallel.feed_queue: Pending: set()
compose.parallel.feed_queue: Pending: set()
compose.parallel.feed_queue: Pending: set()
compose.parallel.feed_queue: Pending: set()
urllib3.connectionpool._make_request: http://localhost:None "POST /v1.25/containers/544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238/start HTTP/1.1" 204 0
compose.cli.verbose_proxy.proxy_callable: docker start -> None
Creating workflow_workflow-hook_1 ... done
compose.parallel.feed_queue: Pending: set()
compose.parallel.parallel_execute_iter: Finished processing: <Service: workflow-hook>
compose.parallel.feed_queue: Pending: set()
Attaching to workflow_workflow-hook_1
compose.cli.verbose_proxy.proxy_callable: docker events <- (filters={'label': ['com.docker.compose.project=workflow', 'com.docker.compose.oneoff=False']}, decode=True)
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/events?filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Dworkflow%22%2C+%22com.docker.compose.oneoff%3DFalse%22%5D%7D HTTP/1.1" 200 None
compose.cli.verbose_proxy.proxy_callable: docker events -> <docker.types.daemon.CancellableStream object at 0x7f61d4906668>
workflow-hook_1 | [INFO] Scanning for projects...
workflow-hook_1 | Downloading from central: http://repo1.maven.org/maven2/org/codehaus/mojo/exec-maven-plugin/1.2.1/exec-maven-plugin-1.2.1.pom
Downloaded from central: http://repo1.maven.org/maven2/org/codehaus/mojo/exec-maven-plugin/1.2.1/exec-maven-plugin-1.2.1.pom (7.7 kB at 59 kB/s)
workflow-hook_1 | Downloading from central: http://repo1.maven.org/maven2/org/codehaus/mojo/mojo-parent/28/mojo-parent-28.pom
Downloaded from central: http://repo1.maven.org/maven2/org/codehaus/mojo/mojo-parent/28/mojo-parent-28.pom (26 kB at 1.1 MB/s)
workflow-hook_1 | Downloading from central: http://repo1.maven.org/maven2/org/codehaus/codehaus-parent/3/codehaus-parent-3.pom
Downloaded from central: http://repo1.maven.org/maven2/org/codehaus/codehaus-parent/3/codehaus-parent-3.pom (4.1 kB at 216 kB/s)
workflow-hook_1 | Downloading from central: http://repo1.maven.org/maven2/org/codehaus/mojo/exec-maven-plugin/1.2.1/exec-maven-plugin-1.2.1.jar
Downloaded from central: http://repo1.maven.org/maven2/org/codehaus/mojo/exec-maven-plugin/1.2.1/exec-maven-plugin-1.2.1.jar (38 kB at 1.9 MB/s)
workflow-hook_1 | [INFO]
workflow-hook_1 | [INFO] ------------------< org.sagebionetworks:WorkflowHook >------------------
workflow-hook_1 | [INFO] Building WorkflowHook 1.0-SNAPSHOT
workflow-hook_1 | [INFO] --------------------------------[ jar ]---------------------------------
workflow-hook_1 | [INFO]
workflow-hook_1 | [INFO] >>> exec-maven-plugin:1.2.1:java (default-cli) > validate @ WorkflowHook >>>
workflow-hook_1 | [INFO]
workflow-hook_1 | [INFO] <<< exec-maven-plugin:1.2.1:java (default-cli) < validate @ WorkflowHook <<<
workflow-hook_1 | [INFO]
workflow-hook_1 | [INFO]
workflow-hook_1 | [INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ WorkflowHook ---
workflow-hook_1 | Downloading from central: http://repo1.maven.org/maven2/org/apache/maven/maven-plugin-api/2.0/maven-plugin-api-2.0.pom
Downloaded from central: http://repo1.maven.org/maven2/org/apache/maven/maven-plugin-api/2.0/maven-plugin-api-2.0.pom (601 B at 55 kB/s)
workflow-hook_1 | Downloading from central: http://repo1.maven.org/maven2/org/apache/maven/maven/2.0/maven-2.0.pom
Downloaded from central: http://repo1.maven.org/maven2/org/apache/maven/maven/2.0/maven-2.0.pom (8.8 kB at 674 kB/s)
workflow-hook_1 | Downloading from central: http://repo1.maven.org/maven2/org/apache/commons/commons-exec/1.1/commons-exec-1.1.pom
Downloaded from central: http://repo1.maven.org/maven2/org/apache/commons/commons-exec/1.1/commons-exec-1.1.pom (11 kB at 801 kB/s)
workflow-hook_1 | Downloading from central: http://repo1.maven.org/maven2/org/codehaus/plexus/plexus-container-default/1.0-alpha-9/plexus-container-default-1.0-alpha-9.jar
workflow-hook_1 | Downloading from central: http://repo1.maven.org/maven2/org/apache/commons/commons-exec/1.1/commons-exec-1.1.jar
Downloaded from central: http://repo1.maven.org/maven2/org/apache/commons/commons-exec/1.1/commons-exec-1.1.jar (53 kB at 761 kB/s)
workflow-hook_1 | Downloaded from central: http://repo1.maven.org/maven2/org/codehaus/plexus/plexus-container-default/1.0-alpha-9/plexus-container-default-1.0-alpha-9.jar (195 kB at 2.3 MB/s)
workflow-hook_1 | ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
workflow-hook_1 | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WorkflowHook - Precheck completed successfully.
workflow-hook_1 | [WARNING]
workflow-hook_1 | java.lang.reflect.InvocationTargetException
workflow-hook_1 | at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
workflow-hook_1 | at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
workflow-hook_1 | at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
workflow-hook_1 | at java.lang.reflect.Method.invoke (Method.java:566)
workflow-hook_1 | at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:297)
workflow-hook_1 | at java.lang.Thread.run (Thread.java:834)
workflow-hook_1 | Caused by: java.lang.NullPointerException
workflow-hook_1 | at org.sagebionetworks.WorkflowHook.getWorkflowURLAndEntrypoint (WorkflowHook.java:170)
workflow-hook_1 | at org.sagebionetworks.WorkflowHook.execute (WorkflowHook.java:188)
workflow-hook_1 | at org.sagebionetworks.WorkflowHook.main (WorkflowHook.java:111)
workflow-hook_1 | at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
workflow-hook_1 | at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
workflow-hook_1 | at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
workflow-hook_1 | at java.lang.reflect.Method.invoke (Method.java:566)
workflow-hook_1 | at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:297)
workflow-hook_1 | at java.lang.Thread.run (Thread.java:834)
workflow-hook_1 | [INFO] ------------------------------------------------------------------------
workflow-hook_1 | [INFO] BUILD FAILURE
workflow-hook_1 | [INFO] ------------------------------------------------------------------------
workflow-hook_1 | [INFO] Total time: 6.568 s
workflow-hook_1 | [INFO] Finished at: 2019-07-08T23:41:21Z
workflow-hook_1 | [INFO] ------------------------------------------------------------------------
workflow-hook_1 | [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:java (default-cli) on project WorkflowHook: An exception occured while executing the Java class. null: InvocationTargetException: NullPointerException -> [Help 1]
workflow-hook_1 | [ERROR]
workflow-hook_1 | [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
workflow-hook_1 | [ERROR] Re-run Maven using the -X switch to enable full debug logging.
workflow-hook_1 | [ERROR]
workflow-hook_1 | [ERROR] For more information about the errors and possible solutions, please read the following articles:
workflow-hook_1 | [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
workflow-hook_1 | [Thread-1] INFO org.sagebionetworks.ShutdownHook - Shut down signal received.
workflow-hook_1 | [Thread-1] INFO org.sagebionetworks.ShutdownHook - Shut down complete.
compose.cli.verbose_proxy.proxy_callable: docker inspect_container <- ('544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238')
compose.cli.verbose_proxy.proxy_callable: docker wait <- ('544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/containers/544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238/json HTTP/1.1" 200 None
urllib3.connectionpool._make_request: http://localhost:None "POST /v1.25/containers/544462918fcfb7449dd50b51cd5fee3719dc4af72cc58a24b4d1babd83b26238/wait HTTP/1.1" 200 17
compose.cli.verbose_proxy.proxy_callable: docker wait -> {'StatusCode': 1}
compose.cli.verbose_proxy.proxy_callable: docker inspect_container -> {'AppArmorProfile': '',
'Args': ['/bin/sh',
'-c',
'exec mvn exec:java -DentryPoint=org.sagebionetworks.WorkflowHook'],
'Config': {'ArgsEscaped': True,
'AttachStderr': False,
'AttachStdin': False,
'AttachStdout': False,
'Cmd': ['/bin/sh',
'-c',
...
workflow_workflow-hook_1 exited with code 1
@trberg The error reporting is poor (and I will correct that), but the problem is with your configuration. Will you please tell me what's on the line in your .env file starting with EVALUATION_TEMPLATES=?
DOCKER_ENGINE_URL=unix:///var/run/docker.sock
WORKFLOW_OUTPUT_ROOT_ENTITY_ID=syn18420542
EVALUATION_TEMPLATES={"9614227":"syn18838336"}
TOIL_CLI_OPTIONS=--defaultMemory 10G --retryCount 0 --defaultDisk 50G
SHARE_RESULTS_IMMEDIATELY=false
MAX_CONCURRENT_WORKFLOWS=1
NOTIFICATION_PRINCIPAL_ID=3386536
SUBMITTER_NOTIFICATION_MASK=31
WORKFLOW_TEMPDIR=/data/common/dream/data/scratch
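As a general sanity check (my suggestion, not a feature of the hook), the JSON value of EVALUATION_TEMPLATES can be validated before bringing the hook up, since a malformed value is one plausible way to fail during startup. A minimal sketch; check_evaluation_templates is a hypothetical helper, not part of the hook:

```python
import json

def check_evaluation_templates(env_text):
    """Find the EVALUATION_TEMPLATES line in .env text and parse its
    value as JSON, raising ValueError if the value is malformed."""
    for line in env_text.splitlines():
        if line.startswith("EVALUATION_TEMPLATES="):
            value = line.split("=", 1)[1]
            return json.loads(value)  # raises ValueError on bad JSON
    raise KeyError("EVALUATION_TEMPLATES not set")

print(check_evaluation_templates('EVALUATION_TEMPLATES={"9614227":"syn18838336"}'))
# → {'9614227': 'syn18838336'}
```

Running this against the .env contents above confirms the queue-to-template mapping parses cleanly, which helps rule out the configuration file itself before digging into the hook's own startup.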
Go to https://www.synapse.org/#!Synapse:syn18838336 and then:
1) Select File Tools > Annotations (at right)
2) Click the Edit button (at left)
3) Under "Key" enter "ROOT_TEMPLATE"
4) Under "Value" enter the relative path (within the linked .zip file) to the root .cwl file for your workflow. (Looks like it should be "EHR-challenge-master/docker_agent_workflow.cwl")
5) Save your changes and try again.
Please let us know if this resolves the problem.
I added the Key/Value from above. The pipeline is no longer breaking and is entering the top-level loop, but when I submit a Docker image, it gets pulled in and then fails. My email gives this error:
ool_uri
STDERR: 2019-07-09T18:19:30.633214539Z raise ValidationException("Not found: '%s'" % argsworkflow)
STDERR: 2019-07-09T18:19:30.633277279Z schema_salad.validate.ValidationException: Not found: 'EHR-challenge-master/docker_agent_workflow.cwl'
The reported admin logs from the pipeline include the following.
STDERR: 2019-07-09T18:40:50.148114543Z Traceback (most recent call last):
STDERR: 2019-07-09T18:40:50.148186078Z File "/usr/local/bin/toil-cwl-runner", line 10, in <module>
STDERR: 2019-07-09T18:40:50.148197971Z sys.exit(main())
STDERR: 2019-07-09T18:40:50.148205189Z File "/usr/local/lib/python2.7/site-packages/toil/cwl/cwltoil.py", line 1200, in main
STDERR: 2019-07-09T18:40:50.149048354Z loading_context.fetcher_constructor)
STDERR: 2019-07-09T18:40:50.149086558Z File "/usr/local/lib/python2.7/site-packages/cwltool/load_tool.py", line 89, in resolve_tool_uri
STDERR: 2019-07-09T18:40:50.149339991Z raise ValidationException("Not found: '%s'" % argsworkflow)
STDERR: 2019-07-09T18:40:50.149354784Z schema_salad.validate.ValidationException: Not found: 'EHR-challenge-master/docker_agent_workflow.cwl'
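[Editor's note] The "Not found" error comes out of cwltool's local resolver, which (per the traceback) simply checks whether the given workflow path names an existing regular file; a relative path like EHR-challenge-master/docker_agent_workflow.cwl resolves against Toil's current working directory. A minimal sketch of that check (my reconstruction from the traceback, not cwltool's actual code):

```python
from pathlib import Path
from typing import Optional


def resolve_local(workflow_path: str) -> Optional[str]:
    """Roughly what cwltool's resolve_local does: accept the path only if
    it is an existing regular file; relative paths resolve against the
    current working directory. is_file() is the call that raised the
    OSError in the traceback above."""
    p = Path(workflow_path)
    if p.is_file():
        return p.resolve().as_uri()
    return None  # the caller then raises "Not found: '%s'"


# Unless this exact relative path exists under the current working
# directory, the resolver returns None, i.e. "Not found":
print(resolve_local("EHR-challenge-master/docker_agent_workflow.cwl"))
```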
Trying to reproduce the problem:
1) I set up a submission queue with UW's workflow:
docker run --rm -it -e SYNAPSE_USERNAME=RClientTestUser -e SYNAPSE_PASSWORD=xxxxxxxx \
-e WORKFLOW_TEMPLATE_URL=https://github.com/Sage-Bionetworks/EHR-challenge/archive/master.zip \
-e ROOT_TEMPLATE=EHR-challenge-master/docker_agent_workflow.cwl sagebionetworks/synapseworkflowhook /set_up.sh
2) I updated my .env file:
DOCKER_ENGINE_URL=unix:///var/run/docker.sock
SYNAPSE_USERNAME=RClientTestUser
SYNAPSE_PASSWORD=xxxxxxxx
WORKFLOW_OUTPUT_ROOT_ENTITY_ID=syn20459151
TOIL_CLI_OPTIONS=--defaultMemory 100M --retryCount 0 --defaultDisk 1000000
EVALUATION_TEMPLATES={"9614270":"syn20459152"}
MAX_CONCURRENT_WORKFLOWS=2
SUBMITTER_NOTIFICATION_MASK=28
3) docker-compose up
4) Submit something to the queue. (I used a random text file and I expect the workflow to fail on it.)
Result:
Here's the Toil command from the Docker Compose logs:
...
workflow-hook_1 | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WES - toil cmd: [toil-cwl-runner, --defaultMemory, 100M, --retryCount, 0, --defaultDisk, 1000000, --workDir, /var/lib/docker/volumes/synapseworkflowhook_shared/_data/a3c50826-861f-4c04-aac9-c2414dc46a97, --noLinkImports, EHR-challenge-master/docker_agent_workflow.cwl, /var/lib/docker/volumes/synapseworkflowhook_shared/_data/a3c50826-861f-4c04-aac9-c2414dc46a97/TMP10272894412962073903.yaml]
...
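[Editor's note] As that log line shows, the hook assembles the toil-cwl-runner command from the user's TOIL_CLI_OPTIONS plus its own --workDir and --noLinkImports flags, then appends the root .cwl (note: a relative path) and the generated YAML inputs file. A sketch of that assembly, reconstructed from the log line rather than taken from the hook's source (the truncated volume path is hypothetical):

```python
import shlex


def build_toil_cmd(toil_cli_options, workdir, root_template, job_yaml):
    """Reconstruct the command seen in the Docker Compose log: the
    user's options first, then the hook's fixed flags, then the root
    workflow file and the generated inputs file."""
    return (
        ["toil-cwl-runner"]
        + shlex.split(toil_cli_options)
        + ["--workDir", workdir, "--noLinkImports", root_template, job_yaml]
    )


cmd = build_toil_cmd(
    "--defaultMemory 100M --retryCount 0 --defaultDisk 1000000",
    "/var/lib/docker/volumes/synapseworkflowhook_shared/_data/a3c50826-...",
    "EHR-challenge-master/docker_agent_workflow.cwl",
    "TMP10272894412962073903.yaml",
)
# The workflow argument is relative, so Toil can only find it if the
# container's working directory contains EHR-challenge-master/:
print(cmd[0], cmd[-2])
```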
I received this error by email:
Your workflow job, (submission ID 9686794), has failed to complete. The message is:
-packages/toil/leader.py", line 246, in run
STDERR: 2019-07-09T19:18:19.508910600Z raise FailedJobsException(self.config.jobStore, self.toilState.totalFailedJobs, self.jobStore)
STDERR: 2019-07-09T19:18:19.508972300Z toil.leader.FailedJobsException
I looked at the logs and my pipeline seemed to fail when trying to download the 'gold standard':
STDERR: 2019-07-09T19:18:12.954042000Z WARNING:toil.leader:The job seems to have left a log file, indicating failure: 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.3/download_from_synapse.cwl' python w/b/jobshzUPL
STDERR: 2019-07-09T19:18:12.954132600Z WARNING:toil.leader:w/b/jobshzUPL INFO:toil.worker:---TOIL WORKER OUTPUT LOG---
STDERR: 2019-07-09T19:18:12.954159200Z WARNING:toil.leader:w/b/jobshzUPL INFO:toil:Running Toil version 3.20.0-cf34ca3416697f2abc816b2538f20ee29ba16932.
STDERR: 2019-07-09T19:18:12.954410800Z WARNING:toil.leader:w/b/jobshzUPL [job download_from_synapse.cwl] /var/lib/docker/volumes/synapseworkflowhook_shared/_data/a3c50826-861f-4c04-aac9-c2414dc46a97/tmpNbrL_7/f/c/out_tmpdirBiM5d7$ python \
STDERR: 2019-07-09T19:18:12.954552900Z WARNING:toil.leader:w/b/jobshzUPL download_synapse_file.py \
STDERR: 2019-07-09T19:18:12.954579600Z WARNING:toil.leader:w/b/jobshzUPL -s \
STDERR: 2019-07-09T19:18:12.954663400Z WARNING:toil.leader:w/b/jobshzUPL syn19005683 \
STDERR: 2019-07-09T19:18:12.954898900Z WARNING:toil.leader:w/b/jobshzUPL -c \
I think I got past the point where things stopped for @trberg. In his case Toil didn't even find the root .cwl file. We need to figure out why the 'Toil' container fails to see files written to a shared volume by the SynapseWorkflowHook container.
@trberg Can you see the downloaded workflow files still sitting on the shared volume, like this?
$ docker volume ls
DRIVER VOLUME NAME
local synapseworkflowhook_shared
$ docker run -it --rm -v synapseworkflowhook_shared:/foo ubuntu
root@73bb63241e9b:/# ls -atl /foo
total 24
drwxr-xr-x 1 root root 4096 Jul 9 19:35 ..
drwxr-xr-x 5 root root 4096 Jul 9 19:18 a3c50826-861f-4c04-aac9-c2414dc46a97
drwxr-xr-x 6 root root 4096 Jul 9 19:17 .
drwxr-xr-x 5 root root 4096 Jul 5 00:33 92163388-86e9-42a3-8bc1-704d6cff4d5b
drwxr-xr-x 5 root root 4096 Jul 5 00:33 c03389fe-0b36-4a5b-bac4-9817b18bc40c
drwxr-xr-x 3 root root 4096 Jul 5 00:30 30e3e971-b02c-4369-a113-109d916dfbba
root@73bb63241e9b:/# tail /foo/a3c50826-861f-4c04-aac9-c2414dc46a97/EHR-challenge-master/docker_agent_workflow.cwl
- id: annotation_values
source: "#scoring/results"
- id: to_public
valueFrom: "true"
- id: force_change_annotation_acl
valueFrom: "true"
- id: synapse_config
source: "#synapseConfig"
out: []
root@73bb63241e9b:/# exit
So EHR-challenge-master doesn't seem to be downloaded at all:
[trberg@con4 dream]$ docker volume ls
DRIVER VOLUME NAME
local synapseworkflowhook_shared
local workflow_shared
local workflow_shared_workspace
[trberg@con4 dream]$ docker run -it --rm -v synapseworkflowhook_shared:/foo ubuntu
Unable to find image 'ubuntu:latest' locally
Trying to pull repository docker.io/library/ubuntu ...
latest: Pulling from docker.io/library/ubuntu
5b7339215d1d: Pull complete
14ca88e9f672: Pull complete
a31c3b1caad4: Pull complete
b054a26005b7: Pull complete
Digest: sha256:9b1702dcfe32c873a770a32cfd306dd7fc1c4fd134adfb783db68defc8894b3c
Status: Downloaded newer image for docker.io/ubuntu:latest
root@0e9b5e314a21:/# ls -atl /foo
total 44
drwxr-xr-x. 22 root root 253 Jul 9 19:43 ..
drwxr-xr-x. 12 root root 4096 Jul 9 19:39 .
drwxr-xr-x. 2 root root 4096 Jul 9 19:39 3208a5b3-a608-452e-b677-b9b0d2634a4f
drwxr-xr-x. 2 root root 4096 Jul 9 18:56 297a1224-d814-4268-9ce8-8d5a7862ff80
drwxr-xr-x. 2 root root 4096 Jul 9 18:48 cf9937a2-fa9c-4774-bf7d-5c9c25765638
drwxr-xr-x. 2 root root 4096 Jul 9 18:40 cc5a1ff1-9382-47d8-8d35-3739297a9a30
drwxr-xr-x. 2 root root 4096 Jul 9 18:33 9ca0099b-e45c-4df9-a66e-e5466f563488
drwxr-xr-x. 2 root root 4096 Jul 9 18:24 21085891-c762-427e-a2e0-e484ae5ee7d0
drwxr-xr-x. 2 root root 4096 Jul 9 18:19 9e053d76-f829-429f-b291-00443880e69c
drwxr-xr-x. 2 root root 4096 Jul 9 17:53 9c5278f8-0eb0-40cf-8275-de21baaafe98
drwxr-xr-x. 2 root root 4096 Jul 8 22:44 7e7daf9c-a046-4048-ad31-84a30ae87a4e
drwxr-xr-x. 2 root root 4096 Jul 8 22:26 6877ac42-ff77-42f7-972e-da54fb0e0631
root@0e9b5e314a21:/# ls /foo/3208a5b3-a608-452e-b677-b9b0d2634a4f/
root@0e9b5e314a21:/# ls /foo/297a1224-d814-4268-9ce8-8d5a7862ff80/
root@0e9b5e314a21:/# ls /foo/cf9937a2-fa9c-4774-bf7d-5c9c25765638/
root@0e9b5e314a21:/# ls /foo/6877ac42-ff77-42f7-972e-da54fb0e0631/
I tried listing the contents of a couple of the folders and they seem to be empty
However, the volume workflow_shared seems to have it:
[trberg@con4 dream]$ docker run -it --rm -v workflow_shared:/foo ubuntu
root@15a01d5d6688:/# ls /foo/
21085891-c762-427e-a2e0-e484ae5ee7d0 6877ac42-ff77-42f7-972e-da54fb0e0631 9ca0099b-e45c-4df9-a66e-e5466f563488 cf9937a2-fa9c-4774-bf7d-5c9c25765638
297a1224-d814-4268-9ce8-8d5a7862ff80 7e7daf9c-a046-4048-ad31-84a30ae87a4e 9e053d76-f829-429f-b291-00443880e69c
3208a5b3-a608-452e-b677-b9b0d2634a4f 9c5278f8-0eb0-40cf-8275-de21baaafe98 cc5a1ff1-9382-47d8-8d35-3739297a9a30
root@15a01d5d6688:/# ls /foo/21085891-c762-427e-a2e0-e484ae5ee7d0/
EHR-challenge-master TMP6681266748677776204.yaml
root@15a01d5d6688:/# ls /foo/6877ac42-ff77-42f7-972e-da54fb0e0631/
EHR-challenge-master TMP3276369643428517427.yaml
root@15a01d5d6688:/# tail /foo/21085891-c762-427e-a2e0-e484ae5ee7d0/EHR-challenge-master/docker_agent_workflow.cwl
- id: annotation_values
source: "#scoring/results"
- id: to_public
valueFrom: "true"
- id: force_change_annotation_acl
valueFrom: "true"
- id: synapse_config
source: "#synapseConfig"
out: []
I just did a clean restart and pruned the containers, images, and volumes. Now the pipeline is looking for synapseworkflowhook_shared but doesn't seem to be creating it:
workflow-hook_1 | Caused by: com.github.dockerjava.api.exception.NotFoundException: {"message":"get synapseworkflowhook_shared: no such volume"}
Upon further inspection, there is no volume synapseworkflowhook_shared
[trberg@con4 workflow]$ docker volume ls
DRIVER VOLUME NAME
local workflow_shared
Before I ran docker-compose up, all volumes had been pruned.
Ahh, Docker Compose is behaving differently for you than for me. Note that docker-compose.yaml creates a volume called shared. In my environment it appears as synapseworkflowhook_shared while for you it's workflow_shared.
... and here's the explanation: https://forums.docker.com/t/docker-compose-prepends-directory-name-to-named-volumes/32835
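[Editor's note] In short, Compose prefixes named volumes with the project name, which defaults to the basename of the directory containing docker-compose.yaml unless overridden with -p (or COMPOSE_PROJECT_NAME). A sketch of the naming rule as an illustration (the directory paths are hypothetical, and real Compose also normalizes the project name to lowercase alphanumerics):

```python
import os


def compose_volume_name(compose_dir, volume, project=None):
    """Compose names a volume '<project>_<volume>'; the project defaults
    to the basename of the directory holding docker-compose.yaml."""
    project = project or os.path.basename(os.path.normpath(compose_dir))
    return f"{project}_{volume}"


# Run from a checkout directory named "workflow":
print(compose_volume_name("/home/trberg/workflow", "shared"))
# -> workflow_shared

# With `docker-compose -p synapseworkflowhook up`:
print(compose_volume_name("/home/trberg/workflow", "shared",
                          project="synapseworkflowhook"))
# -> synapseworkflowhook_shared
```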
@trberg as a workaround, please try:
docker-compose down
docker-compose -p synapseworkflowhook up
Before I saw this, I just renamed the workflow directory to synapseworkflowhook, and that seems to have worked to get us to the next step.
Now we are back to where we started, with permission errors:
STDERR: 2019-07-09T21:31:28.332647765Z Traceback (most recent call last):
STDERR: 2019-07-09T21:31:28.332740532Z File "/usr/local/bin/toil-cwl-runner", line 10, in <module>
STDERR: 2019-07-09T21:31:28.332748998Z sys.exit(main())
STDERR: 2019-07-09T21:31:28.332753123Z File "/usr/local/lib/python2.7/site-packages/toil/cwl/cwltoil.py", line 1200, in main
STDERR: 2019-07-09T21:31:28.333421847Z loading_context.fetcher_constructor)
STDERR: 2019-07-09T21:31:28.333438550Z File "/usr/local/lib/python2.7/site-packages/cwltool/load_tool.py", line 86, in resolve_tool_uri
STDERR: 2019-07-09T21:31:28.333728076Z uri = resolver(document_loader, argsworkflow)
STDERR: 2019-07-09T21:31:28.333739660Z File "/usr/local/lib/python2.7/site-packages/cwltool/resolver.py", line 44, in tool_resolver
STDERR: 2019-07-09T21:31:28.333988311Z ret = r(document_loader, uri)
STDERR: 2019-07-09T21:31:28.333997782Z File "/usr/local/lib/python2.7/site-packages/cwltool/resolver.py", line 21, in resolve_local
STDERR: 2019-07-09T21:31:28.334002003Z if pathobj.is_file():
STDERR: 2019-07-09T21:31:28.334005855Z File "/usr/local/lib/python2.7/site-packages/pathlib2/__init__.py", line 1575, in is_file
STDERR: 2019-07-09T21:31:28.334651425Z return S_ISREG(self.stat().st_mode)
STDERR: 2019-07-09T21:31:28.334679331Z File "/usr/local/lib/python2.7/site-packages/pathlib2/__init__.py", line 1356, in stat
STDERR: 2019-07-09T21:31:28.334798841Z return self._accessor.stat(self)
STDERR: 2019-07-09T21:31:28.334813339Z File "/usr/local/lib/python2.7/site-packages/pathlib2/__init__.py", line 541, in wrapped
STDERR: 2019-07-09T21:31:28.334895654Z return strfunc(str(pathobj), *args)
STDERR: 2019-07-09T21:31:28.334908559Z OSError: [Errno 13] Permission denied: '/var/lib/docker/volumes/synapseworkflowhook_shared/_data/182337f7-8533-4f42-8ecc-e4a5e3a3b3cc/EHR-challenge-master/docker_agent_workflow.cwl'
What seems to be happening is that the workflow is creating directories in the container that it later cannot access. It may have something to do with us running the workflow in a privileged state. When I remove the privileged state, the container can't access the docker socket.
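[Editor's note] For what it's worth, the failing call in the traceback is just a stat() on the .cwl file: pathlib's is_file() is S_ISREG over st_mode. A tiny sketch of the same check (my own illustration), which could be run inside the Toil container against the shared-volume path to reproduce the EACCES directly:

```python
import os
import stat
import tempfile


def is_regular_file(path):
    """Equivalent of pathlib's is_file(): stat() the path and test the
    regular-file bit. On the broken shared volume, the os.stat() call
    raises OSError: [Errno 13] Permission denied, as in the traceback."""
    return stat.S_ISREG(os.stat(path).st_mode)


# Demonstration on a freshly created temporary file:
with tempfile.NamedTemporaryFile() as f:
    print(is_regular_file(f.name))  # True
```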
@trberg Your idea sounds right, but something doesn't quite make sense: when you tried this experiment, https://github.com/Sage-Bionetworks/SynapseWorkflowHook/issues/45#issuecomment-509783750 , weren't you mounting the volume to a non-privileged ubuntu container (which was able to access the files)? Was the volume in that experiment (workflow_shared) created by running Docker Compose in a privileged or non-privileged state?
So we are running into an issue where the command "docker-compose --verbose up" fails with a permissions error, even when run with sudo.
We find we can bypass this error by running docker-compose in a privileged state. However, we then run into another permission error further down the CWL pipeline when trying to pull in Docker containers.
We are using Red Hat (which doesn't support docker-compose) for our OS and are running Docker version 1.13.1.
Our reference evaluation pipeline is located here: https://github.com/Sage-Bionetworks/EHR-challenge and is correctly being pulled into the running pipeline.
We had this pipeline up and running at one point but had to restart the VM, and now it's broken. The restart updated the OS and Docker version but didn't radically change anything.
Any insight would be helpful to troubleshoot this issue.
Thank you