Open trberg opened 5 years ago
The workflow_shared volume was created in a privileged state and yet the non-privileged ubuntu container was able to access it, that is correct.
I'm not sure what to do with that....
There's a bit of confusion here on the words too, when you say "workflow is creating directories in the container" - does that mean creating files in the mounted volume, from within a running container?
If you're creating files via the user running compose (trberg), that is going to run into problems.
Is there any way, in your environment, to do the following:
? Or is that simply not possible?
If it's not possible then we can stop trying to get it to work and start pursuing other alternatives.
Volumes should make that possible, I would expect running containers to be able to share volumes.
Putting aside the challenge and the workflow hook, can you create a working demo with two generic containers and a volume through which they share data? Again, one container must be able to access the Docker Engine.
Ok, I'm not sure of the best way to show this, but I just ran through a simple test using a volume between a number of containers, and it worked just fine. Maybe I'll just toss in my terminal output here. Let me know if I should expand on this.
@jprosser I read through term.txt but did not see your demonstration that the container was able to access the Docker engine. Could you please explain how you demonstrated this?
Here's a run with a privileged container testing, and then also with a user set to "bob" to show that scenario. User root is id=0, which is the same everywhere, but if you add a user, it will have some other id. To share files, the numeric ids must match unless the user is root (id=0), and the permissions also need to allow access. Hope this helps! priv.txt
I'm sure you'll note, but just to call it out: there are --privileged flags tossed in there too in the various runs. I hurried through looking for anything unexpected as I went, but did not notice anything amiss; everything went as I expected based on unix permissions, with no selinux denials since I didn't use my home dir for anything but building an image.
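The numeric-id point above can be sketched without Docker at all; a minimal demo (paths hypothetical) using plain unix permissions:

```shell
# Minimal sketch of the uid/permission rule: a file created with owner-only
# permissions is invisible to any other numeric uid, which is exactly what
# happens across containers sharing a volume when the uids differ.
umask 077                                     # new files: owner read/write only
mkdir -p /tmp/shared_demo
echo data > /tmp/shared_demo/result.txt
stat -c '%u %a' /tmp/shared_demo/result.txt   # prints "<owner uid> 600"
# A process running as any other uid (e.g. "bob" in a second container)
# would get "Permission denied" on this file unless it runs as root (id=0).
```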
@jprosser Thanks for doing this suite of tests. Do the results suggest what you can change when running the workflow hook to allow file sharing between containers to work? If not, how shall we proceed to investigate the issue?
Yes, so first of all, using root as the user within the container will make things simpler when using one volume with multiple containers. The downside is that the user/dev (my term for the developer's login here) who runs docker is not root (in our environment) and doesn't have direct access to the volume(s), but can build images and run containers that do have access; so this is something for the user/dev to be aware of.
A Dockerfile that copies data from the user/dev homedir|cwd|somepath into the image is fine during docker build, but that is of course read only being an image in the end. Adding a volume gives a persistent area to write to and share between containers provided that all the activity is user root (id=0). Otherwise user and permission management must be handled directly.
For the user/dev to copy data into that volume, the "docker cp" command can be used, but as far as I know this requires a running container with that volume mounted to make the operation possible.
Also, I haven't actually looked at this project yet and have just been helping Tim out. I hope to check it out and perhaps offer some suggestions or PRs if time allows and you have interest.
@jprosser could we try and figure out a way to run the docker-compose in a non-privileged state? I believe that would involve expanding the permissions on docker.sock but I'm not sure.
If there's a need for a container to orchestrate, then I believe that privileged would be required, basically running docker in docker.
If we're just hung-up on getting data from the dev/user file system space into the container world, that could be solved with a Dockerfile that creates a data container which carries the data directly copied in during docker build. Or perhaps better is using that container as the copy tool into and out of a volume shared among various containers here.
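A data-container Dockerfile of the kind described might look like this (names and paths hypothetical; a sketch of the idea, not this project's actual file):

```dockerfile
# Hypothetical data image: bakes the dev's files in at build time.
FROM alpine:3
# COPY runs as root during `docker build`, so the files land owned by id=0
# regardless of which host user/dev performed the build.
COPY ./challenge-data /data
# The container can then act as the copy tool: run it with the shared volume
# mounted and copy the baked-in data into the volume, e.g.
#   docker run --rm -v myproject_shared:/shared <image> cp -r /data /shared/
CMD ["true"]
```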
running docker in docker
Commonly the phrase "running docker in docker" means running the docker Engine in a docker container, a practice that's advised against. Here we are merely running the Docker client in a container (the Docker Engine runs on the host), which is generally acceptable.
If we're just hung-up on getting data from the dev/user file system space into the container world ...
The error we are addressing is not related to moving data files into a container volume but rather sharing files between containers via a volume.
using root as the user within the container will make things simpler when using one volume with multiple containers
Yes, that is clear from the results of your recent experiments. It's not clear to me what you did when running the Synapse Workflow Hook (when the second container failed to access a file written by the first one, as shown here https://github.com/Sage-Bionetworks/SynapseWorkflowHook/issues/45#issuecomment-509819602).
STDERR: 2019-07-09T21:31:28.334908559Z OSError: [Errno 13] Permission denied: '/var/lib/docker/volumes/synapseworkflowhook_shared/_data/182337f7-8533-4f42-8ecc-e4a5e3a3b3cc/EHR-challenge-master/docker_agent_workflow.cwl'
Were you not using 'root' as the user in the containers?
Were you not using 'root' as the user in the containers?
I went and checked on this. When we spin up the workflow container, here is the "top" results.
UID PID PPID C STIME TTY TIME CMD
root 5166 5147 1 11:55 ? 00:00:18 /usr/local/openjdk-11/bin/java -classpath /usr/share/maven/boot/plexus-classworlds-2.6.0.jar -Dclassworlds.conf=/usr/share/maven/bin/m2.conf -Dmaven.home=/usr/share/maven -Dlibrary.jansi.path=/usr/share/maven/lib/jansi-native -Dmaven.multiModuleProjectDirectory=/ org.codehaus.plexus.classworlds.launcher.Launcher exec:java -DentryPoint=org.sagebionetworks.WorkflowHook
It seems we are running in the container as root, at least here.
That pid is running right now, is running on the host, and is not in a container. I don't know offhand how that is possible, though I can imagine that having access to docker permissions would give an avenue.
Ok, sorry, that wasn't the case, I just got in a bit of a hurry there.
Here's what I see right on that root process in the process tree of the host:
├─dockerd-current─┬─docker-containe─┬─docker-containe─┬─java───19*[{java}]
│ │ │ └─9*[{docker-containe}]
│ │ └─12*[{docker-containe}]
│ └─12*[{dockerd-current}]
@brucehoff I notice that at the end of the Dockerfile.Toil file we have the following:
WORKDIR /workdir
Since in the docker-compose.yaml file we are setting the volumes as such:
volumes:
- shared:/shared:rw
- /var/run/docker.sock:/var/run/docker.sock
Could this be causing an issue? Should that WORKDIR be /shared?
Could this be causing an issue?
No: "The WORKDIR instruction sets the working directory for any RUN, CMD, ENTRYPOINT, COPY and ADD instructions that follow it in the Dockerfile." from: https://docs.docker.com/engine/reference/builder/#workdir
Since the WORKDIR line is the last line in Dockerfile.Toil it has no effect. We will remove it to avoid future confusion. Additionally, when we run Toil (using this container image) we include the workdir option, which indeed points inside the shared volume and overrides the WORKDIR in the Dockerfile: https://docs.docker.com/engine/reference/run/#workdir
You can verify my claim, if you have a Toil container remaining on your system from a previous run (even if it's stopped), by running docker inspect on the container and looking at the setting for the working directory.
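For illustration, the precedence being described (image contents hypothetical):

```dockerfile
# Hypothetical Dockerfile ending in WORKDIR, as Dockerfile.Toil does.
FROM ubuntu
WORKDIR /workdir   # only a default; no RUN/CMD/COPY follows, so inert here
# At run time, `docker run --workdir /shared <image> pwd` prints /shared:
# the run-time option overrides the image's WORKDIR, and
#   docker inspect --format '{{.Config.WorkingDir}}' <container>
# shows what a given container was actually configured with.
```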
At this point it's not clear to me what the difference is between the manual experiment showing two containers sharing a file through a volume and the Workflow Hook failing to do the same thing in your environment. Is it clear to you what the next step is or do we need to put our heads together to decide what to do next?
Yeah, let's get together and brainstorm; we're out of ideas on our end.
Note: To run the workflow hook without Docker Compose:
export DOCKER_ENGINE_URL=unix:///var/run/docker.sock
export SYNAPSE_USERNAME=xxxxx
export SYNAPSE_PASSWORD=xxxxx
export WORKFLOW_OUTPUT_ROOT_ENTITY_ID=synXXXXX
export TOIL_CLI_OPTIONS="--defaultMemory 100M --retryCount 0 --defaultDisk 1000000"
export EVALUATION_TEMPLATES={"xxxxx":"synXXXXX"}
export MAX_CONCURRENT_WORKFLOWS=2
export SUBMITTER_NOTIFICATION_MASK=28
export COMPOSE_PROJECT_NAME=workflow_orchestrator
docker volume create ${COMPOSE_PROJECT_NAME}_shared
docker pull sagebionetworks/synapseworkflowhook
docker run -v ${COMPOSE_PROJECT_NAME}_shared:/shared:rw -v /var/run/docker.sock:/var/run/docker.sock:rw \
-e DOCKER_ENGINE_URL=${DOCKER_ENGINE_URL} \
-e SYNAPSE_USERNAME=${SYNAPSE_USERNAME} \
-e SYNAPSE_PASSWORD=${SYNAPSE_PASSWORD} \
-e WORKFLOW_OUTPUT_ROOT_ENTITY_ID=${WORKFLOW_OUTPUT_ROOT_ENTITY_ID} \
-e EVALUATION_TEMPLATES=${EVALUATION_TEMPLATES} \
-e NOTIFICATION_PRINCIPAL_ID=${NOTIFICATION_PRINCIPAL_ID} \
-e SHARE_RESULTS_IMMEDIATELY=${SHARE_RESULTS_IMMEDIATELY} \
-e DATA_UNLOCK_SYNAPSE_PRINCIPAL_ID=${DATA_UNLOCK_SYNAPSE_PRINCIPAL_ID} \
-e TOIL_CLI_OPTIONS="${TOIL_CLI_OPTIONS}" \
-e MAX_CONCURRENT_WORKFLOWS=${MAX_CONCURRENT_WORKFLOWS} \
-e SUBMITTER_NOTIFICATION_MASK=${SUBMITTER_NOTIFICATION_MASK} \
-e COMPOSE_PROJECT_NAME=${COMPOSE_PROJECT_NAME} \
--privileged \
sagebionetworks/synapseworkflowhook
Running the above (and submitting a job) I am able to replicate what the UW folks encountered, as shown below. The workflow hook runs, downloads the workflow and starts the Toil container, but Toil is not able to see the workflow it needs to run:
STDERR: 2019-07-18T14:26:18.134200065Z Traceback (most recent call last):
STDERR: 2019-07-18T14:26:18.134273484Z File "/usr/local/bin/toil-cwl-runner", line 10, in <module>
STDERR: 2019-07-18T14:26:18.134286088Z sys.exit(main())
STDERR: 2019-07-18T14:26:18.134295174Z File "/usr/local/lib/python2.7/site-packages/toil/cwl/cwltoil.py", line 1200, in main
STDERR: 2019-07-18T14:26:18.135198206Z loading_context.fetcher_constructor)
STDERR: 2019-07-18T14:26:18.135248397Z File "/usr/local/lib/python2.7/site-packages/cwltool/load_tool.py", line 86, in resolve_tool_uri
STDERR: 2019-07-18T14:26:18.135769949Z uri = resolver(document_loader, argsworkflow)
STDERR: 2019-07-18T14:26:18.135818886Z File "/usr/local/lib/python2.7/site-packages/cwltool/resolver.py", line 44, in tool_resolver
STDERR: 2019-07-18T14:26:18.136356452Z ret = r(document_loader, uri)
STDERR: 2019-07-18T14:26:18.136411953Z File "/usr/local/lib/python2.7/site-packages/cwltool/resolver.py", line 21, in resolve_local
STDERR: 2019-07-18T14:26:18.136456183Z if pathobj.is_file():
STDERR: 2019-07-18T14:26:18.136474956Z File "/usr/local/lib/python2.7/site-packages/pathlib2/__init__.py", line 1575, in is_file
STDERR: 2019-07-18T14:26:18.137362524Z return S_ISREG(self.stat().st_mode)
STDERR: 2019-07-18T14:26:18.137392060Z File "/usr/local/lib/python2.7/site-packages/pathlib2/__init__.py", line 1356, in stat
STDERR: 2019-07-18T14:26:18.137570073Z return self._accessor.stat(self)
STDERR: 2019-07-18T14:26:18.137583994Z File "/usr/local/lib/python2.7/site-packages/pathlib2/__init__.py", line 541, in wrapped
STDERR: 2019-07-18T14:26:18.137684245Z return strfunc(str(pathobj), *args)
STDERR: 2019-07-18T14:26:18.137697512Z OSError: [Errno 13] Permission denied: '/var/lib/docker/volumes/workflow_orchestrator_shared/_data/215e5e3f-7768-490c-a00b-419944dfa066/SynapseWorkflowExample-master/workflow-entrypoint.cwl'
Here's an interesting finding: I am able to see the mounted file in another container. After the failure, the shared volume remains and the downloaded workflow is still there. I ran a simple 'ubuntu' container mounting the shared volume and can see the workflow. This tells me there is nothing inherent in UW's environment that precludes sharing files between containers:
docker run -it --rm -v workflow_orchestrator_shared:/shared ubuntu bash
root@d2ac7be48c26:/# cat /shared/215e5e3f-7768-490c-a00b-419944dfa066/SynapseWorkflowExample-master/workflow-entrypoint.cwl
#!/usr/bin/env cwl-runner
#
# Sample workflow
# Inputs:
# submissionId: ID of the Synapse submission to process
# adminUploadSynId: ID of a folder accessible only to the submission queue administrator
# submitterUploadSynId: ID of a folder accessible to the submitter
# workflowSynapseId: ID of the Synapse entity containing a reference to the workflow file(s)
# synapseConfig: configuration file for Synapse client, including login credentials
#
cwlVersion: v1.0
class: Workflow
...
Why can the 'ubuntu' container see the file but the Toil container cannot? To investigate, let's see how the Toil container is started up:
docker inspect --format "$(<run.tpl)" workflow_job.1b4617b6-23dc-4727-b904-dc904da47aa8
docker run \
--name=/workflow_job.1b4617b6-23dc-4727-b904-dc904da47aa8 \
--env="TMPDIR=/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108" \
--env="TEMP=/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108" \
--env="TMP=/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108" \
--env="DOCKER_HOST=unix:///var/run/docker.sock" \
--env="PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
--env="LANG=C.UTF-8" \
--env="PYTHONIOENCODING=UTF-8" \
--env="GPG_KEY=C01E1CAD5EA2C4F0B8E3571504C367C218ADD4FF" \
--env="PYTHON_VERSION=2.7.16" \
--env="PYTHON_PIP_VERSION=19.1.1" \
--network "bridge" \
\
--volume="/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108:/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108:rw" \
--volume="/var/run/docker.sock:/var/run/docker.sock:rw" \
--log-driver="json-file" \
--log-opt max-file="2" \
--log-opt max-size="1g" \
--restart="" \
--detach=true \
"sagebionetworks/synapseworkflowhook-toil" \
"toil-cwl-runner" "--defaultMemory" "100M" "--retryCount" "0" "--defaultDisk" "1000000" "--workDir" "/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108" "--noLinkImports" "SynapseWorkflowExample-master/workflow-entrypoint.cwl" "/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108/TMP2051815944166861985.yaml"
We can clean this up a lot, to leave:
docker run -it --rm \
--volume="/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108:/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108:rw" \
sagebionetworks/synapseworkflowhook-toil bash
(We add in '-it' so we can use it interactively and '--rm' to clean it up when we're done.) Result:
docker run -it --rm --volume="/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108:/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108:rw" sagebionetworks/synapseworkflowhook-toil bash
root@eeba063cb894:/# more /var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108/SynapseWorkflowExample-master/workflow-entrypoint.cwl
#!/usr/bin/env cwl-runner
#
# Sample workflow
# Inputs:
# submissionId: ID of the Synapse submission to process
# adminUploadSynId: ID of a folder accessible only to the submission queue administrator
# submitterUploadSynId: ID of a folder accessible to the submitter
# workflowSynapseId: ID of the Synapse entity containing a reference to the workflow file(s)
# synapseConfig: configuration file for Synapse client, including login credentials
#
...
Why does it work!?!?! Perhaps my clean-up of the docker run command omitted some key element. Restoring as much as possible of the original command did not change anything.
docker run \
> --name=/mytest \
> --env="TMPDIR=/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108" \
> --env="TEMP=/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108" \
> --env="TMP=/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108" \
> --env="DOCKER_HOST=unix:///var/run/docker.sock" \
> --env="PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
> --env="LANG=C.UTF-8" \
> --env="PYTHONIOENCODING=UTF-8" \
> --env="GPG_KEY=C01E1CAD5EA2C4F0B8E3571504C367C218ADD4FF" \
> --env="PYTHON_VERSION=2.7.16" \
> --env="PYTHON_PIP_VERSION=19.1.1" \
> --network "bridge" \
> --volume="/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108:/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108:rw" \
> --volume="/var/run/docker.sock:/var/run/docker.sock:rw" \
> --log-driver="json-file" \
> --log-opt max-file="2" \
> --log-opt max-size="1g" \
> --restart="" \
> -it --rm \
> "sagebionetworks/synapseworkflowhook-toil" bash
root@1b7f68c37403:/# more /var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108/SynapseWorkflowExample-master/workflow-entrypoint.cwl
#!/usr/bin/env cwl-runner
#
# Sample workflow
# Inputs:
# submissionId: ID of the Synapse submission to process
# adminUploadSynId: ID of a folder accessible only to the submission queue administrator
# submitterUploadSynId: ID of a folder accessible to the submitter
# workflowSynapseId: ID of the Synapse entity containing a reference to the workflow file(s)
# synapseConfig: configuration file for Synapse client, including login credentials
#
cwlVersion: v1.0
...
OK, then, let's try running the workflow itself:
docker run \
> --name=/workflow_job.MANUAL \
> --env="TMPDIR=/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108" \
> --env="TEMP=/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108" \
> --env="TMP=/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108" \
> --env="DOCKER_HOST=unix:///var/run/docker.sock" \
> --env="PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
> --env="LANG=C.UTF-8" \
> --env="PYTHONIOENCODING=UTF-8" \
> --env="GPG_KEY=C01E1CAD5EA2C4F0B8E3571504C367C218ADD4FF" \
> --env="PYTHON_VERSION=2.7.16" \
> --env="PYTHON_PIP_VERSION=19.1.1" \
> --network "bridge" \
> --volume="/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108:/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108:rw" \
> --volume="/var/run/docker.sock:/var/run/docker.sock:rw" \
> --log-driver="json-file" \
> --log-opt max-file="2" \
> --log-opt max-size="1g" \
> --restart="" \
> --detach=true \
> "sagebionetworks/synapseworkflowhook-toil" \
> "toil-cwl-runner" "--defaultMemory" "100M" "--retryCount" "0" "--defaultDisk" "1000000" "--workDir" "/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108" "--noLinkImports" "SynapseWorkflowExample-master/workflow-entrypoint.cwl" "/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108/TMP2051815944166861985.yaml"
It kicks off, no problem. Let's look at the logs:
docker logs workflow_job.MANUAL
Traceback (most recent call last):
File "/usr/local/bin/toil-cwl-runner", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/site-packages/toil/cwl/cwltoil.py", line 1200, in main
loading_context.fetcher_constructor)
File "/usr/local/lib/python2.7/site-packages/cwltool/load_tool.py", line 89, in resolve_tool_uri
raise ValidationException("Not found: '%s'" % argsworkflow)
schema_salad.validate.ValidationException: Not found: 'SynapseWorkflowExample-master/workflow-entrypoint.cwl'
As when run from the workflow hook it cannot find the workflow file(s). The odd thing is that the error is different: Instead of "access denied" we get 'not found'.
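The 'not found' is consistent with the relative path being resolved against the container's current working directory rather than the mounted data. A Docker-free sketch of the same failure mode (paths hypothetical):

```shell
# The workflow file exists, but only under an absolute path; resolving the
# relative form depends entirely on the current working directory.
mkdir -p /tmp/wd_demo/SynapseWorkflowExample-master
echo 'cwlVersion: v1.0' > /tmp/wd_demo/SynapseWorkflowExample-master/workflow-entrypoint.cwl
cd /                 # a container's default working directory is often "/"
cat SynapseWorkflowExample-master/workflow-entrypoint.cwl 2>/dev/null \
  || echo 'Not found'                             # relative: fails from "/"
cat /tmp/wd_demo/SynapseWorkflowExample-master/workflow-entrypoint.cwl   # absolute: works
```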
The host file paths are being used here, whereas in the case of volumes I would expect them to be named and referenced for use in the container's file system, mounted at the appropriate spot.
-Justin
I modified the previous command to make the path to the .cwl file absolute, not relative. The workflow appears to run:
docker run \
--name=/workflow_job.MANUAL \
> --env="TMPDIR=/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108" \
> --env="TEMP=/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108" \
> --env="TMP=/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108" \
> --env="DOCKER_HOST=unix:///var/run/docker.sock" \
> --env="PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
> --env="LANG=C.UTF-8" \
> --env="PYTHONIOENCODING=UTF-8" \
> --env="GPG_KEY=C01E1CAD5EA2C4F0B8E3571504C367C218ADD4FF" \
> --env="PYTHON_VERSION=2.7.16" \
> --env="PYTHON_PIP_VERSION=19.1.1" \
> --network "bridge" \
> --volume="workflow_orchestrator_shared:/var/lib/docker/volumes/workflow_orchestrator_shared/_data:rw" \
> --volume="/var/run/docker.sock:/var/run/docker.sock:rw" \
> --log-driver="json-file" \
> --log-opt max-file="2" \
> --log-opt max-size="1g" \
> --restart="" \
> --detach=true \
> "sagebionetworks/synapseworkflowhook-toil" \
> "toil-cwl-runner" "--defaultMemory" "100M" "--retryCount" "0" "--defaultDisk" "1000000" "--workDir" "/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108" "--noLinkImports" "/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108/SynapseWorkflowExample-master/workflow-entrypoint.cwl" "/var/lib/docker/volumes/workflow_orchestrator_shared/_data/fd3eb6a1-395d-4815-82c2-7a8b37aff108/TMP2051815944166861985.yaml"
I can even see the result uploaded to Synapse: https://www.synapse.org/#!Synapse:syn20540092
So we are not able to replicate the problem seen with the workflow hook by running containers manually.
The host file paths are being used here where I would expect in the case of volumes to be rather named and referenced for use in a container's file system, mounted at the appropriate spot.
There's a reason for that which I am happy to explain, but I'm 99.9% sure it's irrelevant to the problem we are sleuthing.
Perhaps the issue is somehow related to the use of a relative path to the workflow entry point. I have made it absolute and will try rerunning the workflow with this change to the Hook: https://github.com/Sage-Bionetworks/SynapseWorkflowHook/commit/e96857224f41ec100d6247ce205ebb0c654a7f5a
Result: It worked!
Thanks for the sleuthing and fix @brucehoff. I will run a workflow tomorrow that will take in a docker submission to see if it works.
To summarize, after making a small change to the workflow hook and rerunning it as described above I was able to submit to Synapse and have the workflow run in a Toil container.
@thomasyu888 , @jprosser , and @trberg , as a next step would you like to try running the updated hook? Please note that in the UW environment I don't expect the Hook to be able to run workflows which themselves run containers because the Toil container would have to be run in privileged mode. If this is a requirement we can add a parameter to the Hook to run Toil in privileged mode. Let me know.
@thomasyu888 I think you will immediately hit the issue of Toil not being in privileged mode so I added the necessary parameter. Please see https://github.com/Sage-Bionetworks/SynapseWorkflowHook/commit/b291a761ed06d5bca0d14b5d23c328d7195f374f
The updated instructions for non-compose execution:
export DOCKER_ENGINE_URL=unix:///var/run/docker.sock
export SYNAPSE_USERNAME=xxxxx
export SYNAPSE_PASSWORD=xxxxx
export WORKFLOW_OUTPUT_ROOT_ENTITY_ID=synXXXXX
export TOIL_CLI_OPTIONS="--defaultMemory 100M --retryCount 0 --defaultDisk 1000000"
export EVALUATION_TEMPLATES={"xxxxx":"synXXXXX"}
export MAX_CONCURRENT_WORKFLOWS=2
export SUBMITTER_NOTIFICATION_MASK=28
export COMPOSE_PROJECT_NAME=workflow_orchestrator
export RUN_WORKFLOW_CONTAINER_IN_PRIVILEGED_MODE=true
docker volume create ${COMPOSE_PROJECT_NAME}_shared
docker pull sagebionetworks/synapseworkflowhook
docker run -v ${COMPOSE_PROJECT_NAME}_shared:/shared:rw -v /var/run/docker.sock:/var/run/docker.sock:rw \
-e DOCKER_ENGINE_URL=${DOCKER_ENGINE_URL} \
-e SYNAPSE_USERNAME=${SYNAPSE_USERNAME} \
-e SYNAPSE_PASSWORD=${SYNAPSE_PASSWORD} \
-e WORKFLOW_OUTPUT_ROOT_ENTITY_ID=${WORKFLOW_OUTPUT_ROOT_ENTITY_ID} \
-e EVALUATION_TEMPLATES=${EVALUATION_TEMPLATES} \
-e NOTIFICATION_PRINCIPAL_ID=${NOTIFICATION_PRINCIPAL_ID} \
-e SHARE_RESULTS_IMMEDIATELY=${SHARE_RESULTS_IMMEDIATELY} \
-e DATA_UNLOCK_SYNAPSE_PRINCIPAL_ID=${DATA_UNLOCK_SYNAPSE_PRINCIPAL_ID} \
-e TOIL_CLI_OPTIONS="${TOIL_CLI_OPTIONS}" \
-e MAX_CONCURRENT_WORKFLOWS=${MAX_CONCURRENT_WORKFLOWS} \
-e SUBMITTER_NOTIFICATION_MASK=${SUBMITTER_NOTIFICATION_MASK} \
-e COMPOSE_PROJECT_NAME=${COMPOSE_PROJECT_NAME} \
-e RUN_WORKFLOW_CONTAINER_IN_PRIVILEGED_MODE=${RUN_WORKFLOW_CONTAINER_IN_PRIVILEGED_MODE} \
--privileged \
sagebionetworks/synapseworkflowhook
Alright! This seems to have solved the issues. I ran the above command and am now getting issues related to the submitted python script rather than the toil workflow.
Thank you @brucehoff and @thomasyu888 for your help with this!
@brucehoff The fix you provided did not resolve the issue.
@thomasyu888, what are the symptoms?
Verified that the host has the latest images:
[bruce.hoffSAGE@con6 ~]$ docker pull sagebionetworks/synapseworkflowhook
Using default tag: latest
Trying to pull repository docker.io/sagebionetworks/synapseworkflowhook ...
latest: Pulling from docker.io/sagebionetworks/synapseworkflowhook
Digest: sha256:5485c7f30fb44d1242eec50d4a2036489c5125e9566fde42255faea2a8559efb
Status: Image is up to date for docker.io/sagebionetworks/synapseworkflowhook:latest
[bruce.hoffSAGE@con6 ~]$ docker pull sagebionetworks/synapseworkflowhook-toil
Using default tag: latest
Trying to pull repository docker.io/sagebionetworks/synapseworkflowhook-toil ...
latest: Pulling from docker.io/sagebionetworks/synapseworkflowhook-toil
Digest: sha256:8b6c0c13de69a8d599adbc7c923eb5fdda0ee914cd2911bca23b4f5f310baae4
Status: Image is up to date for docker.io/sagebionetworks/synapseworkflowhook-toil:latest
@thomasyu888 a more specific question: Do you have an example of a workflow working directory of the form:
/var/lib/docker/volumes/workflow_orchestrator_shared/_data/<uuid>/
that was created with the latest version of the workflow hook? Can we see the permissions on the folder (as well as the subfolder(s) created by Toil) to see if the 'umask' command produced the intended effect?
@brucehoff. Actually there is something I would like to try. Yesterday we worked out that providing the 'z' in the docker run volume mount allowed for bind mounts to work. So I wonder if the same z would work. To be specific:
docker run -v /path/to/volume/:/output:z ....
Please be careful with that option; it will auto-create labels and could really mess up the system. This is likely the cause of our problems before, where the whole /var/run got relabeled on the host, which basically trashed the host, leading us to just recreate it rather than try to recover from that event.
Thanks @jprosser. I see this on the docker site: https://docs.docker.com/storage/bind-mounts/. Should I use the z or the Z?
If you use selinux you can add the z or Z options to modify the selinux label of the host file or directory being mounted into the container. This affects the file or directory on the host machine itself and can have consequences outside of the scope of Docker.
The z option indicates that the bind mount content is shared among multiple containers. The Z option indicates that the bind mount content is private and unshared. Use extreme caution with these options. Bind-mounting a system directory such as /home or /usr with the Z option renders your host machine inoperable and you may need to relabel the host machine files by hand.
Important: When using bind mounts with services, selinux labels (:Z and :z), as well as :ro are ignored. See moby/moby #32579 for details.
It is not possible to modify the selinux label using the --mount flag.
This example sets the z option to specify that multiple containers can share the bind mount’s contents:
$ docker run -d \
-it \
--name devtest \
-v "$(pwd)"/target:/app:z \
nginx:latest
Also @brucehoff, specifying the z or Z option on the output bind allows us to write to /output. See https://github.com/Sage-Bionetworks/ChallengeWorkflowTemplates/blob/temp/run_docker.cwl#L91-L92.
If we decide that the way we use z or Z is secure, you probably can revert the changes you made with umask?
So I've gotten my debug docker submission to run all the way through the pipeline using z and z,ro to mount the volumes. However, I only used these options when we mounted volumes in the training and inference scripts, and not with the docker.sock.
The debug docker submission includes reading and writing data to volumes and includes writing a predictions.csv file to output.
One other solution we discussed for when we bind the training data is to create a docker volume with the hosted data. So:
$ ls test
wowow
$ docker volume create --name testing -o device=/data/users/thomas.yuSAGE/test -o o=bind
$ docker run -ti -v testing:/input ubuntu bash
root@a9ce47dca371:/# ls input/
wowow
An interesting discovery: after I create this volume, I also no longer experience the permission error if I bind-mount the directory directly.
$ docker run -ti -v /data/users/thomas.yuSAGE/test:/input ubuntu bash
root@14aff9b181bc:/# ls input/
wowow
But... if I create a new directory and don't create a docker volume:
$ mkdir wow
$ touch wow/see
$ docker run -ti -v /data/users/thomas.yuSAGE/wow:/input ubuntu bash
root@aee71b11983e:/# ls input/
ls: cannot open directory 'input/': Permission denied
Does docker "relabel" the directory when a volume is explicitly created?
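One way to look into the relabeling question is to compare the SELinux contexts of the two host directories. This is a sketch assuming GNU coreutils; on an SELinux host, a directory Docker has relabeled typically shows a container-related type (e.g. container_file_t or svirt_sandbox_file_t), while on a non-SELinux host ls simply prints '?':

```shell
#!/bin/sh
# Compare security contexts of the two host directories from the example.
mkdir -p test wow
ls -dZ test   # the directory that was exposed through 'docker volume create'
ls -dZ wow    # the directory that was only bind-mounted directly
```

If the contexts differ, that would support the theory that creating the volume triggered a relabel.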
you probably can revert the changes you made with umask?
Once you have determined that everything works, let me know and we can revert the change and then test again (to make sure the reversion doesn't break anything).
I would like your (@jprosser, @brucehoff, @trberg) opinions on the Z and z mount, as the extent of my knowledge is what I have read. The options currently are:
1. z or Z (not sure which is the correct one). This does indeed allow us to read and write into /output and read data from /train.
2. umask / chmod 777 so that the mounted volumes will have the correct permissions. (Haven't gotten this working completely, but we confirmed that changing permissions on the folder itself does allow the docker container to write to the /output directory.)
Thanks for all the sleuthing.
Since there were concerns about using z, I suggest continuing to pursue the approach of changing the sharing permissions on the mounted directory (choice 2 above). To do so, please start by answering my earlier question: https://github.com/Sage-Bionetworks/SynapseWorkflowHook/issues/45#issuecomment-514266402
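For reference, the effect the umask change in choice 2 is after can be demonstrated with plain shell (a generic illustration, not the hook's actual code): the mode of a new directory is the requested mode (0777 for mkdir) masked by the process's umask.

```shell
#!/bin/sh
# Directory modes are 0777 & ~umask; subshells keep the umask change local.
cd "$(mktemp -d)"

(umask 0022 && mkdir standard)   # 0777 & ~0022 -> 755: group/other cannot write
(umask 0000 && mkdir shared)     # 0777 & ~0000 -> 777: anyone can write

stat -c '%a %n' standard shared
# prints:
# 755 standard
# 777 shared
```

This is why a umask of 0000 in the process creating the workflow directories would let a container running as a different uid write into them.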
Ideally volumes would be used here to keep everything within the container world, at least anything the containers touch. If there's a need to cross that container/host barrier, then permissions should be intentionally managed. If not, we see things like auto-labeling and permissions of 777 which drop that barrier in the most open possible way. I'd guess Docker bind mounts are probably the best way to punch through the container/host barrier but you're still going to need permission management in the end.
-Justin
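As one sketch of "intentionally managed" permissions that stops short of chmod 777: a setgid directory shared through group membership, so files created inside inherit the directory's group. The group itself would have to exist on the host and match a gid used in the container; that mapping is the part that needs managing.

```shell
#!/bin/sh
# Group-writable setgid directory: more deliberate than chmod 777.
cd "$(mktemp -d)"
mkdir workdir
chmod 2770 workdir     # rwx for owner and group, setgid bit, nothing for others
stat -c '%a' workdir   # prints 2770
# With setgid set, files created under workdir inherit workdir's group,
# so a container user belonging to that same group can read and write them
# while other host users remain locked out.
```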
@thomasyu888 a more specific question: Do you have an example of a workflow working directory of the form:
/var/lib/docker/volumes/workflow_orchestrator_shared/_data/<uuid>/
that was created with the latest version of the workflow hook? Can we see the permissions on the folder (as well as the subfolder(s) created by Toil) to see if the 'umask' command produced the intended effect?
In our non-root user scenario, this location is not accessible via any user login. A user with Docker permission can certainly affect this location through containers, but cannot access it directly.
We are running into a permission error when running "docker-compose --verbose up", even as sudo:
We find we can bypass this error by running docker-compose in a privileged state. However, we then run into another permission error further down the CWL pipeline when trying to pull in docker containers.
We are using Red Hat (which doesn't support docker-compose) as our OS and are running Docker version 1.13.1.
Our reference evaluation pipeline is located here: https://github.com/Sage-Bionetworks/EHR-challenge and is correctly being pulled into the running pipeline.
We had this pipeline up and running at one point but had to restart the VM and now it's broken. The restart updated the OS and docker version but didn't radically change anything.
Any insight would be helpful to troubleshoot this issue.
Thank you