broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
1k stars 360 forks source link

run docker with --user $EUID by default #2658

Open Redmar-van-den-Berg opened 7 years ago

Redmar-van-den-Berg commented 7 years ago

This issue was prompted by the following thread on the forum: https://gatkforums.broadinstitute.org/wdl/discussion/10347/localization-via-hard-link-has-failed

By default, cromwell does not specify a user when running docker, which leads to two issues:

Firstly, it will run each tasks as whatever user the docker image specifies, typically 'root' as this is the default. This means that depending on how the docker image was build, a completely random user is suddenly the owner of your output files. For example, if the user inside the docker container has UID 1042, this could map to a completely unrelated user on the host system, who is suddenly the owner of your output files.

Related to this issue is the fact that all output files have to be world-readable by default, or else the cromwell process will not have read access to the files it has just created (which are now owned by UID 1042). This is not desireable when cromwell is run on a system with many users, some of which should not have read access to eachothers data (for example when working with patient data).

As a solution, I propose to change the default docker invocation to run the analysis as $EUID by default. This works even when the EUID is not mapped to a valid user within the docker image, and ensures that the cromwell user is the owner of all the files generated by docker.

From man bash: "EUID Expands to the effective user ID of the current user, initialized at shell startup. This variable is readonly." , so it should be available on every system.

This solution makes cromwell act more secure by default, and will also solve the issue of copying over data files as discussed on the forum and in #2620

kshakir commented 7 years ago

At this time, changing the-user-id-running-inside-the-docker-container from root/0 to the-user-id-running-outside-the-container would be a breaking change. There are probably hundreds of docker images running on Cromwell, and some unknown fraction of them may be expecting to run as docker's default root. Without more discussion with our wider user base we cannot make this change to the default user id just yet.

However, as a workaround for those that would like to harden their environments, one can change the default docker user from root to $EUID by changing their config:

backend.providers.Local.config {
  runtime-attributes = """
  String? docker
  String docker_user = "$EUID"
  """
}

(tested on 29-242b111)

Redmar-van-den-Berg commented 7 years ago

I have changed it in my local config, thanks. I should add that this appears to work even for images that do not have user $EUID configured inside the docker, but there could be images that expect the user to be root to function.

geoffjentry commented 7 years ago

Thanks redmar. I'll admit that this is coming from a stance of abundance of caution, but we're intrigued by the idea.

Let's leave this open a bit, I'll try to investigate some more over the weekend

geoffjentry commented 7 years ago

Hi @Redmar-van-den-Berg - I'm falling back to the "abundance of caution" stance but am still intrigued. I'm going to close the PR for now but leave this issue open. For posterity's sake it's PR #2659 ... we can open it back up once we verify what impact this does(n't) have.

rhpvorderman commented 5 years ago

For people who also spent some time with this there is also an option to do this on a per workflow basis:

  1. Put it in a options.json:
    {
    "default_runtime_attributes": {
    "docker_user": "$EUID"
    }
    }
  2. Do it on the command line: java -Dbackend.providers.Local.config.runtime-attributes='String? docker String? docker_user="$EUID"' -jar <cromwell_jar> run mypipeline.wdl

@geoffjentry Do you have some preference where we should put this in the documentation? Or not, of course. I posted it here so people searching for a solution would find it.