Open trberg opened 5 years ago
Hmmm. I seem to be incorrect in my comment... Apologies! However, as z does "relabelling", I'm now unsure if the z option had an impact on the /var/lib/docker/volumes/workflow_orchestrator_shared/_data/ directory which allows for a normal :rw mount now. As the documentation said, using z does in fact change things on the host machine. Prior to using z, I was getting the same error (Permission denied).
/run.sh: 2: /run.sh: cannot create /output/predictions.csv: Permission denied
@brucehoff I unfortunately do not have a directory I could give you. It is also entirely possible that umask fixed the issue...
So @jprosser, based on your comment, you are suggesting #3 in my comment https://github.com/Sage-Bionetworks/SynapseWorkflowHook/issues/45#issuecomment-514325012, which does indeed work for the input training data.
However, I'm not sure how we would create the docker volume for the /output on the fly (the issue here is that CWL needs to be able to link to the specified output). So the options for that are the umask or the Z option.
@thomasyu888 yep, I would prefer that.
So in our UW systems (rhel based), we have a label of "container_file_t" which can allow container processes to write to that location (as policy exists for such activity on that label). Users are basically unaffected by this in our environment (though they could also be constrained, we don't go that far) but are still subject to unix permissions of course.
So if a container knew the uid of the user controlling all this, user root in that container could change the owner of some file/dir to that external uid, and then the user on the outside would own that file/dir. Since we generally operate within /data with users and data, we could do this to, say, /data/common, and create a way for non-root host users to interact with containers that use root and would otherwise generate uid=0-only files and directories (a 777 mode would achieve the same, but this really is a bad idea, especially with that execute bit set).
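The ownership hand-off described above could look roughly like the sketch below. It is not taken from the workflow hook; `chown_tree` is a hypothetical helper, and it assumes the container is told the external uid/gid (for example via an environment variable) and runs as root when it is called.

```python
import os


def chown_tree(root, uid, gid):
    """Give uid:gid ownership of root and everything under it.

    Hypothetical helper: the container would call this as root, passing
    the uid/gid of the host user. A gid of -1 leaves the group unchanged.
    Symlinks are not followed. Returns the number of paths processed.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        paths = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in paths:
            os.chown(path, uid, gid, follow_symlinks=False)
            count += 1
    return count
```

Called at the end of a run on a directory such as /output, this would leave the results owned by the host user rather than by uid 0, without resorting to a 777 mode.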
@thomasyu888, you say:
It is also entirely possible that umask fixed the issue...
Can you explain? Earlier you said that the umask approach failed to fix the issue. Do you have evidence that the change allowed things to work?
@brucehoff, let me walk you through what I did:
1. [...] umask approach, so I ran the workflow hook and received a permission denied issue. It's possible that I wasn't using the newest version or something else happened...
2. [...] z in mounting the /output as well (since it worked with mounting the /train data), so I did that after Step 1 failed and I found that the workflow hook ran and there were no longer permission issues.
3. [...] z option to test if the umask approach does work, BUT... I read that z actually changes the host, so I wonder if step 2 actually changed something that allows the hook to run with or without the umask approach. Does that make sense? One way to test it is to remove the umask approach to see if the hook will run into permission issues.
I read that z actually changes the host
Fascinating: I find that I am now unable to reproduce the original error. Could it be that you have modified the host somehow? Here's what I now see:
# create a volume
[bruce.hoffSAGE@con6 ~]$ docker volume create hoff_test1
hoff_test1
# mount the volume to a container running in privileged mode and create a subfolder
[bruce.hoffSAGE@con6 ~]$ docker run -it --rm -v hoff_test1:/test --privileged ubuntu bash
root@09b7da83d9ca:/# mkdir /test/privileged_subdir
root@09b7da83d9ca:/# exit
exit
# mount the volume to a container NOT running in privileged mode and write to the subfolder
[bruce.hoffSAGE@con6 ~]$ docker run -it --rm -v hoff_test1:/test ubuntu bash
root@9ee8462bec15:/# touch /test/privileged_subdir/somefile.txt
root@9ee8462bec15:/# ls -l /test/privileged_subdir
total 0
-rw-r--r--. 1 root root 0 Jul 23 20:38 somefile.txt
root@9ee8462bec15:/# exit
Could you revert the change you made to the host so we are back in the original situation?
One way to test it is to remove the umask approach to see if the hook will run into permission issues.
The test above does that: When I create the folder it has the normal permissions, not '777':
[bruce.hoffSAGE@con6 ~]$ docker run -it --rm -v hoff_test1:/test --privileged ubuntu bash
root@86572f347d21:/# cd /test
root@86572f347d21:/test# ls -al
total 8
drwxr-xr-x. 3 root root 4096 Jul 23 20:38 .
drwxr-xr-x. 22 root root 254 Jul 23 20:42 ..
drwxr-xr-x. 2 root root 4096 Jul 23 20:38 privileged_subdir
As you can see it has 755 permissions.
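For reference, the relationship between the umask and the 755 versus 777 modes discussed here can be sketched in Python. This is only an illustration of how a umask filters the requested mode; `mkdir_mode_under_umask` is a made-up name and is not part of the workflow hook.

```python
import os


def mkdir_mode_under_umask(parent, name, umask_value):
    """Create parent/name under a temporary umask and return its rwx bits."""
    previous = os.umask(umask_value)
    try:
        path = os.path.join(parent, name)
        # os.mkdir requests mode 0o777 by default; the umask clears bits from it.
        os.mkdir(path)
        return os.stat(path).st_mode & 0o777
    finally:
        # Always restore the caller's umask.
        os.umask(previous)
```

With a umask of 0o022 the directory comes out as 0o755, which matches the listing above; a umask of 0o000 would give 0o777.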
@brucehoff The changes that Docker makes with SELinux are not persistent, so if the file system is relabeled for whatever reason, you will be back to the original state. Let me know if you'd like that done.
@thomasyu888 Please see @jprosser 's offer, above. I would like to restore the host to its original state. Do you agree?
I agree. Please restore back to original state.
I stopped dockerd, relabeled the home directories, then started dockerd just now. I didn't touch the root file system (so /var/lib/docker/volumes) in this relabeling process. Let me know if you'd also like to reset /var/lib/docker/volumes which by default is no write (I don't know offhand whether docker will recover from that, but it certainly should).
I still cannot repro' the original problem:
[bruce.hoffSAGE@con6 ~]$ docker volume rm hoff_test1
hoff_test1
[bruce.hoffSAGE@con6 ~]$ docker volume create hoff_test1
hoff_test1
[bruce.hoffSAGE@con6 ~]$ docker run -it --rm -v hoff_test1:/test --privileged ubuntu bash
root@45e6e1b286aa:/# mkdir /test/privileged_subdir
root@45e6e1b286aa:/# exit
exit
[bruce.hoffSAGE@con6 ~]$ docker run -it --rm -v hoff_test1:/test ubuntu bash
root@7caf15f364bf:/# touch /test/privileged_subdir/somefile.txt
root@7caf15f364bf:/# ls -l /test/privileged_subdir
total 0
-rw-r--r--. 1 root root 0 Jul 23 21:27 somefile.txt
root@7caf15f364bf:/# exit
I didn't touch the root file system (so /var/lib/docker/volumes)
Maybe that's why I see no change.
Let me know if you'd also like to reset /var/lib/docker/volumes which by default is no write
I don't understand. What does "no write" mean in the context of a collection of writeable volumes?
Now I'm getting errors again
# No docker volume
[thomas.yuSAGE@con6 ~]$ docker run -ti -v /data/users/thomas.yuSAGE/temp/:/train ubuntu bash
root@966ad861a057:/# ls train/
ls: cannot open directory 'train/': Permission denied
root@966ad861a057:/# exit
# create docker volume
[thomas.yuSAGE@con6 ~]$ docker volume create --name tom_testing -o device=/data/users/thomas.yuSAGE/temp -o o=bind
tom_testing
[thomas.yuSAGE@con6 ~]$ docker volume inspect tom_testing
[
{
"Driver": "local",
"Labels": {},
"Mountpoint": "/var/lib/docker/volumes/tom_testing/_data",
"Name": "tom_testing",
"Options": {
"device": "/data/users/thomas.yuSAGE/temp",
"o": "bind"
},
"Scope": "local"
}
]
# run the same command as above, this time with the named volume
[thomas.yuSAGE@con6 ~]$ docker run -ti -v tom_testing:/train ubuntu bash
root@966ad861a057:/# ls train/
ls: cannot open directory 'train/': Permission denied
root@966ad861a057:/# exit
Looks like the z and Z option really did make a big difference
EDIT: The volume I created here already existed before, which caused it to not work.
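One way to catch a stale, pre-existing volume like this is to compare the output of docker volume inspect against the bind options you expect. The following is a minimal sketch under that assumption; `volume_matches` is a hypothetical helper that only parses the JSON shown above.

```python
import json


def volume_matches(inspect_json, expected_device):
    """Return True if `docker volume inspect` output describes a bind
    mount of expected_device. Hypothetical helper for illustration."""
    entries = json.loads(inspect_json)
    if not entries:
        return False
    # "Options" can be null for volumes created without -o flags.
    options = entries[0].get("Options") or {}
    return options.get("device") == expected_device and options.get("o") == "bind"
```

If the check fails, the volume was presumably created earlier with different options, and removing it with docker volume rm before recreating it would be the next thing to try.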
So I know @jprosser isn't a huge fan of the z option, but if we only apply them to the data folders as z,ro and to the /model folder, the /scratch folder, and /output folder as z, will that cause a lot of problems? Previously, we had applied z to /var/run and that really messed us up. But we're currently getting around that with privileged. So can we proceed with the z flag very carefully?
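A careful use of the z flag along these lines might be sketched as follows. This is an assumption about how one could guard the mounts, not the hook's actual code: `build_volume_args` and the `PROTECTED` list are hypothetical, and the list of paths to refuse is only an example.

```python
import os

# Hypothetical deny list: host paths that should never be relabelled.
PROTECTED = {"/", "/var", "/var/run", "/etc", "/usr"}


def build_volume_args(data_mounts, writable_mounts):
    """Build `docker run` -v arguments: data folders as z,ro and
    writable folders (e.g. /model, /scratch, /output) as z."""
    args = []
    for mounts, mode in ((data_mounts, "z,ro"), (writable_mounts, "z")):
        for source, target in mounts.items():
            # Named volumes (no leading slash) pass through unchanged.
            if source.startswith("/") and os.path.normpath(source) in PROTECTED:
                raise ValueError(f"refusing to relabel {source}")
            args += ["-v", f"{source}:{target}:{mode}"]
    return args
```

The resulting list could be spliced into a docker run command line. The guard does not make z safe in general; it only rejects the obvious system paths mentioned in this thread.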
My understanding is that it changes not only the directory we mount, but possibly also the directories in which the mounted directory lives. That would explain how I was able to do what I did here: https://github.com/Sage-Bionetworks/SynapseWorkflowHook/issues/45#issuecomment-514395466 without getting an error in the end.
We're all set, I believe, with volumes, as @thomasyu888 has recently found: they create the right kinds of permissions automatically, though user root within the container will create root-owned files in the user's home dir (or shared area), which they can't readily access.
@brucehoff to your question on /var/lib/docker/volumes, this has a generic label that prevents container writes. As dockerd doesn't make permanent changes to SELinux policy, a relabeling here would mean dockerd on start would need to fix labels as appropriate (adding container_file_t back) to enable containers to write to these locations (which it probably would do).
That z,Z option is scary since it knocks out what the system is doing, and so if someone wanted to break the host they could just do /var:/var:z and wedge it up. Not entirely different than as root doing something like rm -rf /. Both are a pretty good denial of service.
@thomasyu888 It looks to me that when you create a volume, that does labeling in a one time fashion such that if we reset the labels as we did, dockerd doesn't come back around to reset as it did on creation.
Right. Thanks @jprosser .
We have resolved the first issue of binding the training data. Workflow here:
# Create temp directory and temp files
[thomas.yuSAGE@con6 ~]$ mkdir temp
[thomas.yuSAGE@con6 ~]$ touch temp/foo temp/roo
# Show volumes
[thomas.yuSAGE@con6 ~]$ docker volume ls
DRIVER VOLUME NAME
local hoff_test1
local workflow_orchestrator_shared
[thomas.yuSAGE@con6 ~]$ ls temp/
foo roo
# Create new volume mounting device
[thomas.yuSAGE@con6 ~]$ docker volume create --name tom_testing -o device=/data/users/thomas.yuSAGE/temp -o o=bind
tom_testing
[thomas.yuSAGE@con6 ~]$ docker inspect tom_testing
[
{
"Driver": "local",
"Labels": {},
"Mountpoint": "/var/lib/docker/volumes/tom_testing/_data",
"Name": "tom_testing",
"Options": {
"device": "/data/users/thomas.yuSAGE/temp",
"o": "bind"
},
"Scope": "local"
}
]
# Use volume name
[thomas.yuSAGE@con6 ~]$ docker run -ti -v tom_testing:/train ubuntu bash
root@ed289c26565e:/# ls train/
foo roo
root@ed289c26565e:/# exit
exit
So I've tried to replicate this on the challenge server. When I create a volume and mount a data folder that has not previously had the z flag, I still run into permission errors when I later use that volume in the pipeline.
Did I miss a step?
docker volume create --name uw_train -o device=/data/common/dream/data/UW_OMOP/train -o o=bind
[trberg@con4 dream]$ docker volume inspect uw_train
[
{
"Driver": "local",
"Labels": {},
"Mountpoint": "/var/lib/docker/volumes/uw_train/_data",
"Name": "uw_train",
"Options": {
"device": "/data/common/dream/data/UW_OMOP/train",
"o": "bind"
},
"Scope": "local"
}
]
Then later in the run_training_docker.cwl:
input_dir="uw_train"
mounted_volumes = {scratch_dir:'/scratch:z',
input_dir:'/train:ro',
model_dir:'/model:z'}
The resulting error is such:
all files in /train
Traceback (most recent call last):
File "/app/train.py", line 22, in <module>
for i in os.listdir("/train"):
PermissionError: [Errno 13] Permission denied: '/train'
However, running the ubuntu test seems to work:
[trberg@con4 dream]$ docker run -it --rm -v uw_train:/data:ro ubuntu bash
root@e26acaeeb1b8:/# ls data
condition_occurrence.csv death.csv drug_exposure.csv person.csv visit_occurrence.csv
root@e26acaeeb1b8:/# touch data/death.csv
touch: cannot touch 'data/death.csv': Read-only file system
Did you update your inputdir to be uw_train? Hmm.... I wouldn't think that Ubuntu has anything to do with it.
Yep, I replaced the absolute path
So it's very strange... When I do:
docker run -ti -v tom_testing:/train docker.synapse.org/syn18405992/debug:v1 bash
root@c0ab9356ed2b:/app# bash /app/train.sh
current working directory: /app
all files in /app
train.py
infer.py
infer.sh
train.sh
/train exists: True
/train/visit_occurrence.csv exists: False
/train and file permission mask: 775
all files in /train
roo
foo
/model exists: False
/scratch exists: False
It might be possible that something is happening when the submission is run with the docker socket that is mounted into the toil container.
Yeah, I get the same
[trberg@con4 dream]$ docker run -ti -v uw_train:/train docker.synapse.org/syn18405992/debug:v1 bash
root@9a04ae8947ce:/app# ls
infer.py infer.sh train.py train.sh
root@9a04ae8947ce:/app# bash train.sh
current working directory: /app
all files in /app
train.py
infer.py
infer.sh
train.sh
/train exists: True
/train/visit_occurrence.csv exists: True
/train and file permission mask: 775
all files in /train
visit_occurrence.csv
person.csv
drug_exposure.csv
condition_occurrence.csv
death.csv
/model exists: False
/scratch exists: False
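The debug image's train.sh is not shown in this thread, but a check that produces this kind of report could be sketched as below. `describe_path` is a hypothetical helper. Note that the mode bits can read 775 while listing still fails, because SELinux label checks are separate from unix permissions.

```python
import os
import stat


def describe_path(path):
    """Report whether path exists, its permission bits, and whether it can be listed."""
    info = {"path": path, "exists": os.path.exists(path),
            "mode": None, "entries": None, "error": None}
    if not info["exists"]:
        return info
    # Permission bits as an octal string, e.g. "775".
    info["mode"] = oct(stat.S_IMODE(os.stat(path).st_mode))[2:]
    try:
        info["entries"] = sorted(os.listdir(path))
    except OSError as exc:
        # Includes PermissionError, as seen in the traceback for /train.
        info["error"] = str(exc)
    return info
```

Running it on /train, /model and /scratch inside the submission container (rather than in an interactive docker run) would show whether the failure is specific to how the pipeline launches the container.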
@trberg I see you had the following with the :z option:
mounted_volumes = {scratch_dir:'/scratch:z',
input_dir:'/train:ro',
model_dir:'/model:z'}
But looking at the labeling on those files and directories, I see the label container_var_lib_t, which is what Docker has for non-container accessed locations, whereas normally that'd be container_file_t for container accessible files, so it seems the :z prevented container access.
Indeed, after now firing up a container to just look at it with -v workflow_orchestrator_shared:/data, now the labeling is container_file_t, so this is a bit of a moving target.
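Since the labels keep changing, it may help to check them programmatically before each run. The sketch below assumes a Linux host where the label is stored in the security.selinux extended attribute; the helper names are hypothetical, and the full context strings in the comments are illustrative examples, not values taken from this host.

```python
import os


def selinux_type(context):
    """Extract the type field from an SELinux context (user:role:type:level)."""
    parts = context.rstrip("\x00").split(":")
    if len(parts) < 3:
        raise ValueError(f"not an SELinux context: {context!r}")
    return parts[2]


def read_selinux_type(path):
    """Read the SELinux type of path from its extended attribute (Linux only)."""
    raw = os.getxattr(path, "security.selinux")
    return selinux_type(raw.decode())


def has_container_file_label(context):
    """True if the context carries container_file_t, the label this thread
    describes as container accessible."""
    # Example input (assumed): "system_u:object_r:container_file_t:s0"
    return selinux_type(context) == "container_file_t"
```

Calling read_selinux_type on a volume's _data directory before and after starting a container would make the relabelling visible. Other types may also permit container access depending on local policy, so a False result is a hint rather than proof.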
So we are running into an issue where the command "docker-compose --verbose up" runs into a permissions issue, even when running as sudo:
We find we can bypass this error by running the docker-compose in a privileged state. However, we then run into another permission error further down the CWL pipeline when trying to pull in docker containers.
We are using Redhat (which doesn't support docker-compose) for our OS and are running docker version 1.13.1.
Our reference evaluation pipeline is located here: https://github.com/Sage-Bionetworks/EHR-challenge and is correctly being pulled into the running pipeline.
We had this pipeline up and running at one point but had to restart the VM and now it's broken. The restart updated the OS and docker version but didn't radically change anything.
Any insight would be helpful to troubleshoot this issue.
Thank you