broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Cromwell Service Account Roles #4304

Open multimeric opened 5 years ago

multimeric commented 5 years ago

The documentation has a nice section on how to use Service Accounts with Cromwell, with the Google Cloud backend. However, what it doesn't do is explain the roles/permissions that such an account needs. It would be appreciated if we had a list of permissions we could apply to our Service Accounts to know that we had the absolute minimum required for Cromwell to control jobs (probably separate lists for filesystem access and job management).

Currently, the roles I've applied to my Service Account are:

This works, but I know that these roles are quite permissive. Ideally I'd be able to lock it down to permissions that stop it from deleting buckets etc.

geoffjentry commented 5 years ago

Hi @TMiguelT - We should document this better but the short answer is you should go to this GCP quickstart. In particular this link sets permissions for a project (not a specific service account)

multimeric commented 5 years ago

Doesn't that link just enable the Genomics API? I'm not talking about that, I'm talking about giving permissions to a service account that is possibly running outside of Google Cloud, and so needs specific permissions.

dinvlad commented 5 years ago

I think the minimum requirements are:

1) Project-wide Genomics Pipelines Runner and Compute Instance Admin (v1) roles for the service account used by Cromwell itself.

2) Service Account User permission on the Compute Engine default service account for the Cromwell service account.

3) Storage Object Admin permission on the Cromwell execution and data buckets, as well as Storage Object Viewer on the GCR bucket (if used).

freeseek commented 4 years ago

When using the PAPIv2 backend, I have noticed that the previous set of roles is not sufficient to run the pipelines. Instead, after a long and tedious amount of work, I have figured out that the following set of roles is sufficient to run a pipeline on Google Cloud through a service account:

  1. Cloud Life Sciences Workflows Runner (lifesciences.workflowsRunner)
  2. Service Account User (iam.serviceAccountUser)
  3. Firebase Develop Admin (firebase.developAdmin)

I suppose that lifesciences.workflowsRunner is a replacement for genomics.pipelinesRunner, but I have no idea why firebase.developAdmin is required (or what else should be required in its place). For the life of me, I could not find this information anywhere in the Cromwell documentation, nor infer it from the Cromwell error messages themselves (nor understand what the firebase.developAdmin role actually allows).

dinvlad commented 4 years ago

@freeseek firebase.developAdmin is a pretty wide role (with 204 permissions), so it's not surprising that it gives some permissions that are needed here. What would be helpful is if Google showed the exact permissions in their error messages, though from what it seems, that's not always the case. Then if you have a list of permissions, you can find minimal role(s) that encompass those permissions, rather than through a blind hunt (please correct me if it wasn't entirely blind here..)

Btw @freeseek, from my limited experience, GitHub issues here are not often-looked-through, it might be better to create an internal JIRA ticket instead ;)

freeseek commented 4 years ago

@dinvlad it was indeed a blind hunt! So in that sense, 204 permissions is not that much ... it's a pretty refined subset ;-) Previously I was running Cromwell with the editor role set, which likely has even more than 204 permissions. Without the firebase.developAdmin role, the only error I get is that the tasks start running, then they fail immediately, and the only thing you find in the logs is: yyyy/mm/dd hh:mm:ss Starting container setup.

In any case, I wanted to give an answer here to provide publicly available information to other users.

dinvlad commented 4 years ago

To make it less of a blind hunt, it's also possible to look into your Stackdriver Audit logs - they should list all GCP API calls in your project that failed with 403. This way you can get a better sense of which ones Cromwell is actually using. I've been meaning to write a tool to simplify this kind of analysis, but you can do it with the logs even now.
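A minimal sketch of such a query, assuming the Cloud Logging filter syntax and assuming that denied calls are recorded with protoPayload.status.code=7 (PERMISSION_DENIED, which surfaces as HTTP 403). The helper name denied_calls_filter and the example service account are hypothetical, and Data Access audit logs may need to be enabled for some services before anything shows up:

```python
def denied_calls_filter(principal_email: str, since: str) -> str:
    """Build a Cloud Logging filter for audit entries denied to one principal."""
    clauses = [
        # Restrict the search to Cloud Audit Logs.
        'logName:"cloudaudit.googleapis.com"',
        # Only calls made by the account Cromwell runs as (hypothetical value).
        f'protoPayload.authenticationInfo.principalEmail="{principal_email}"',
        # Assumption: status code 7 is PERMISSION_DENIED in audit entries.
        "protoPayload.status.code=7",
        f'timestamp>="{since}"',
    ]
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(denied_calls_filter(
        "cromwell@MY-GOOGLE-PROJECT.iam.gserviceaccount.com",
        "2020-09-01T00:00:00Z",
    ))
```

The resulting string can be pasted into the Cloud Logging Console or passed as the filter argument of gcloud logging read, which keeps the download far smaller than fetching the whole log.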

freeseek commented 4 years ago

So here is a final update. I have tried running Cromwell with the following roles:

  1. Cloud Life Sciences Workflows Runner (lifesciences.workflowsRunner)
  2. Service Account User (iam.serviceAccountUser)
  3. Storage Object Creator (roles/storage.objectCreator)
  4. Storage Object Viewer (roles/storage.objectViewer)

And I have got the following error from Cromwell:

java.lang.Exception: Task xxx.xxxNA:1 failed. Job exited without an error, exit code 0. PAPI error code 9. Please check the log file for more details: xxx

And the log just contains this cryptic message:

yyyy/mm/dd hh:mm:ss Starting container setup.

I have then tried to run Cromwell with the following roles:

  1. Cloud Life Sciences Workflows Runner (lifesciences.workflowsRunner)
  2. Service Account User (iam.serviceAccountUser)
  3. Storage Object Admin (storage.objectAdmin)

And the workflow succeeded. To give a full explanation of the set of roles and permissions needed, I wrote a little python script roles.py that collects this information from Google:

#!/usr/bin/env python3
import subprocess
import sys

import pandas as pd
import requests

# Reuse the access token of the active gcloud account.
token = subprocess.check_output(["gcloud", "auth", "print-access-token"]).decode("utf8").strip()
headers = {"accept": "application/json", "Authorization": "Bearer " + token}
params = {"pageSize": 1000, "view": "FULL"}

# The role listing is paginated, so follow nextPageToken until it is absent.
rows = []
while True:
    response = requests.get("https://iam.googleapis.com/v1/roles", headers=headers, params=params)
    response.raise_for_status()
    page = response.json()
    for role in page.get("roles", []):
        for permission in role.get("includedPermissions", []):
            rows.append((role["name"], permission))
    if "nextPageToken" not in page:
        break
    params["pageToken"] = page["nextPageToken"]

df = pd.DataFrame(rows, columns=["roles", "permissions"])
df.to_csv(sys.stdout, sep="\t", header=False, index=False)

When running this script, I get:

$ ./roles.py | grep "lifesciences.workflowsRunner\|iam.serviceAccountUser\|storage.objectAdmin\|storage.objectCreator\|storage.objectViewer" | column -t
roles/iam.serviceAccountUser        iam.serviceAccounts.actAs
roles/iam.serviceAccountUser        iam.serviceAccounts.get
roles/iam.serviceAccountUser        iam.serviceAccounts.list
roles/iam.serviceAccountUser        resourcemanager.projects.get
roles/iam.serviceAccountUser        resourcemanager.projects.list
roles/lifesciences.workflowsRunner  lifesciences.operations.cancel
roles/lifesciences.workflowsRunner  lifesciences.operations.get
roles/lifesciences.workflowsRunner  lifesciences.operations.list
roles/lifesciences.workflowsRunner  lifesciences.workflows.run
roles/storage.objectAdmin           resourcemanager.projects.get
roles/storage.objectAdmin           resourcemanager.projects.list
roles/storage.objectAdmin           storage.objects.create
roles/storage.objectAdmin           storage.objects.delete
roles/storage.objectAdmin           storage.objects.get
roles/storage.objectAdmin           storage.objects.getIamPolicy
roles/storage.objectAdmin           storage.objects.list
roles/storage.objectAdmin           storage.objects.setIamPolicy
roles/storage.objectAdmin           storage.objects.update
roles/storage.objectCreator         resourcemanager.projects.get
roles/storage.objectCreator         resourcemanager.projects.list
roles/storage.objectCreator         storage.objects.create
roles/storage.objectViewer          resourcemanager.projects.get
roles/storage.objectViewer          resourcemanager.projects.list
roles/storage.objectViewer          storage.objects.get
roles/storage.objectViewer          storage.objects.list

Somehow the tutorial suggests adding the roles storage.objectCreator and storage.objectViewer, but these do not include any of the four permissions storage.objects.delete, storage.objects.getIamPolicy, storage.objects.setIamPolicy, and storage.objects.update that the role storage.objectAdmin adds, and at least one of these must be needed by Cromwell.

Other than by trial and error, I still do not understand how users are supposed to figure this out.
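For what it's worth, the comparison can be reproduced with plain set arithmetic. This is only a sketch: the permission sets are copied by hand from the roles.py output above, and the helper name missing_permissions is made up for illustration:

```python
# Permission sets copied from the roles.py output for the three storage roles.
OBJECT_ADMIN = {
    "resourcemanager.projects.get",
    "resourcemanager.projects.list",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.getIamPolicy",
    "storage.objects.list",
    "storage.objects.setIamPolicy",
    "storage.objects.update",
}
OBJECT_CREATOR = {
    "resourcemanager.projects.get",
    "resourcemanager.projects.list",
    "storage.objects.create",
}
OBJECT_VIEWER = {
    "resourcemanager.projects.get",
    "resourcemanager.projects.list",
    "storage.objects.get",
    "storage.objects.list",
}


def missing_permissions(granted_roles, target_role):
    """Return permissions in target_role that none of granted_roles provide."""
    granted = set().union(*granted_roles)
    return sorted(target_role - granted)


if __name__ == "__main__":
    for permission in missing_permissions([OBJECT_CREATOR, OBJECT_VIEWER], OBJECT_ADMIN):
        print(permission)
```

This prints the four permissions named above, which narrows the search but does not say which of them Cromwell actually uses.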

dinvlad commented 4 years ago

Nice work, could you also try that Stackdriver suggestion? It should be pretty easy to compose a query that will give you all the permissions required, based on 403. And then you can enter those permissions in the roles tab in Cloud Console, and it will give you the matching role(s). As an ultimate solution, you can create a custom role with those permissions only, so that it follows the least privileges principle.
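The role lookup can also be done offline. Here is a rough sketch, assuming a tab-separated role/permission table like the one roles.py prints, saved to a hypothetical file roles.tsv; the function name roles_covering is likewise made up. It lists every role that contains all of the required permissions, smallest role first:

```python
from collections import defaultdict


def roles_covering(tsv_lines, required):
    """Return (role, permission_count) pairs for roles granting all of required."""
    perms_by_role = defaultdict(set)
    for line in tsv_lines:
        line = line.strip()
        if not line:
            continue
        role, permission = line.split("\t")
        perms_by_role[role].add(permission)
    needed = set(required)
    matches = [
        (role, len(perms))
        for role, perms in perms_by_role.items()
        if needed <= perms
    ]
    # Smallest role first, since it is the closest to least privilege.
    return sorted(matches, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    with open("roles.tsv") as handle:  # hypothetical file: ./roles.py > roles.tsv
        wanted = ["storage.objects.create", "storage.objects.delete"]
        for role, size in roles_covering(handle, wanted):
            print(role, size)
```

If no single role is small enough, the same permission list is the natural input for a custom role.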

I'd be curious what you find - please post back here if you do ;)

freeseek commented 4 years ago

I am curious ... how do I look into my Stackdriver Audit logs?

dinvlad commented 4 years ago

Please take a look here https://cloud.google.com/logging/docs/audit

freeseek commented 4 years ago

Not sure what I should be doing. I have tried the following command:

gcloud logging read 'timestamp>="2020-09-01T00:00:00Z"' > logs

And then:

$ cat logs | grep 30148356615-compute@developer.gserviceaccount.com -A10 | grep -i permission | cut -d: -f2 | sort | uniq -c
     14  lifesciences.operations.cancel
    425  lifesciences.workflows.run
     12  storage.buckets.get
  30629  storage.objects.create
  30985  storage.objects.delete
  12819  storage.objects.get
    157  storage.objects.getIamPolicy
   6859  storage.objects.list

It does seem to be the case that storage.objects.delete is requested many times, so that is definitely an issue when you only have roles storage.objectCreator and storage.objectViewer but not storage.objectAdmin. I did not observe any permission from role iam.serviceAccountUser but that role is indeed needed. And I observe some requests for permission storage.buckets.get that do end in ERROR, but it does not seem to affect the pipeline.
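The grep -A10 approach depends on how the text output happens to be laid out, so here is a hedged alternative sketch that counts permissions per entry from structured output. It assumes the entries were exported as a JSON array (for example with the --format=json option of gcloud logging read) into a hypothetical file logs.json, and that each entry carries protoPayload.authenticationInfo and protoPayload.authorizationInfo fields. The function name count_permissions is made up:

```python
import json
from collections import Counter


def count_permissions(entries, principal_email):
    """Count (permission, granted) pairs requested by one principal."""
    counts = Counter()
    for entry in entries:
        payload = entry.get("protoPayload", {})
        caller = payload.get("authenticationInfo", {}).get("principalEmail")
        if caller != principal_email:
            continue
        for info in payload.get("authorizationInfo", []):
            permission = info.get("permission")
            if permission:
                # A missing "granted" field is treated as not granted.
                counts[(permission, bool(info.get("granted", False)))] += 1
    return counts


if __name__ == "__main__":
    with open("logs.json") as handle:  # hypothetical export file
        entries = json.load(handle)
    email = "MY-NUMBER-compute@developer.gserviceaccount.com"
    for (permission, granted), n in sorted(count_permissions(entries, email).items()):
        print(n, permission, "granted" if granted else "denied")
```

Splitting the counts by granted or denied makes it easier to spot requests such as storage.buckets.get that fail without breaking the pipeline.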

dinvlad commented 4 years ago

Typically, I do that through the Cloud Logging Console, instead of fetching the entire log (which could be huge, and expensive) ;) There, you can set up filters to narrow down on particular log entries.

iam.serviceAccountUser is mostly about granting one iam.serviceAccounts.actAs permission on a service account. Not sure why it doesn't show up here, but this permission is required for the Cromwell server to be able to run a pipeline with a Compute SA.

BTW iam.serviceAccountUser should be granted on a per-service-account level, not at the project level (not sure if you've set it up this way, just wanted to confirm). First make sure you don't have that permission granted at the project level, and then if you remove it from the service-account level, it should be able to be seen in the logs.
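One way to check for a project-level grant is to inspect the project IAM policy directly. A small sketch, assuming the policy was dumped as JSON (for example with gcloud projects get-iam-policy MY-GOOGLE-PROJECT --format=json) and has the usual bindings list of role/members pairs; the function name project_level_roles is hypothetical:

```python
def project_level_roles(policy, member):
    """Return the roles that a project IAM policy binds directly to member."""
    return sorted(
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    )


if __name__ == "__main__":
    # Hand-written example policy, not real output.
    policy = {
        "bindings": [
            {
                "role": "roles/iam.serviceAccountUser",
                "members": ["serviceAccount:MY-NUMBER-compute@developer.gserviceaccount.com"],
            },
            {"role": "roles/viewer", "members": ["user:someone@example.com"]},
        ]
    }
    member = "serviceAccount:MY-NUMBER-compute@developer.gserviceaccount.com"
    print(project_level_roles(policy, member))
```

If roles/iam.serviceAccountUser shows up in that list, the account can act as any service account in the project rather than one specific SA.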

freeseek commented 4 years ago

Hmmm, I don't even know how I would grant it at the project level. I pretty much used this:

for role in lifesciences.workflowsRunner iam.serviceAccountUser storage.objectAdmin; do
  gcloud projects add-iam-policy-binding MY-GOOGLE-PROJECT --member serviceAccount:MY-NUMBER-compute@developer.gserviceaccount.com --role roles/$role
done

Maybe if iam.serviceAccounts.actAs is granted only once I might have missed it as I was not able to download the whole log file. Do you know why occasionally storage.buckets.get is requested and what actually happens to Cromwell if it is not granted to the service account?

dinvlad commented 4 years ago

Yes, that's granting it at the project level (gcloud projects add-iam-policy-binding). In this case, granting at the SA level would probably be:

gcloud iam service-accounts add-iam-policy-binding \
  MY-NUMBER-compute@developer.gserviceaccount.com \
  --member serviceAccount:MY-NUMBER-compute@developer.gserviceaccount.com \
  --role roles/iam.serviceAccountUser

Notice that here we grant MY-NUMBER-compute SA iam.serviceAccountUser role on itself! This is probably not the best practice, as you should use a separate SA for Cromwell VM from the one that is used by Cromwell jobs. Still, this is better than granting it at the project level, as otherwise any machine started with the default MY-NUMBER-compute SA can act as any other SA in that project. Additionally, it's not good to use the default SA at all, ideally you should create a dedicated SA for Cromwell itself and also another dedicated SA for the Cromwell jobs.

That being said, if you're running this in an isolated project that doesn't have any access to anything else, this may be fine. But that's why it takes quite a bit of effort/know-how to set up Cromwell properly. I agree this is not an easy task, and should be documented a bit more ;)

freeseek commented 4 years ago

Wait, this seems very interesting, but I think you have lost me here. When I set up my Cromwell server, I used the following google stanza:

google {
  application-name = "cromwell"
  auths = [
    {
      name = "service-account"
      scheme = "service_account"
      json-file = "MY-GOOGLE-PROJECT-############.json"
    }
  ]
}

What does it mean to "use a separate SA for Cromwell VM"? The way I run the Cromwell server is I log in to my Google VM with:

gcloud compute ssh INSTANCE-ID -- -L 8000:localhost:8000

And then I run:

(java -Xmx3500m -Dconfig.file=PAPIv2.conf -jar cromwell-XY.jar server &)

freeseek commented 4 years ago

I did another run last night, and I have found a few entries like this including iam.serviceAccounts.* permissions:

insertId: 1mk6qq6ej6zkd
logName: projects/mccarroll-mocha/logs/cloudaudit.googleapis.com%2Fdata_access
protoPayload:
  '@type': type.googleapis.com/google.cloud.audit.AuditLog
  authenticationInfo:
    principalEmail: giulio@broadinstitute.org
    principalSubject: user:giulio@broadinstitute.org
  authorizationInfo:
  - granted: true
    permission: iam.serviceAccounts.list
    resource: projects/mccarroll-mocha
    resourceAttributes: {}
  methodName: google.iam.admin.v1.ListServiceAccounts
  request:
    '@type': type.googleapis.com/google.iam.admin.v1.ListServiceAccountsRequest
    name: projects/mccarroll-mocha
    page_size: 100
  requestMetadata:
    callerIp: 64.112.179.105
    callerSuppliedUserAgent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101
      Firefox/80.0,gzip(gfe)
    destinationAttributes: {}
    requestAttributes:
      auth: {}
      time: '2020-09-03T03:28:37.843325531Z'
  resourceName: projects/mccarroll-mocha
  serviceName: iam.googleapis.com
  status: {}
receiveTimestamp: '2020-09-03T03:28:38.742413691Z'
resource:
  labels:
    location: global
    method: google.iam.admin.v1.ListServiceAccounts
    project_id: mccarroll-mocha
    service: iam.googleapis.com
    version: v1
  type: api
severity: INFO
timestamp: '2020-09-03T03:28:37.734190692Z'

Sometimes like this instead:

insertId: 1mk6qq6ek68fs
logName: projects/mccarroll-mocha/logs/cloudaudit.googleapis.com%2Fdata_access
protoPayload:
  '@type': type.googleapis.com/google.cloud.audit.AuditLog
  authenticationInfo:
    principalEmail: google@broadinstitute.com
    principalSubject: user:google@broadinstitute.com
  authorizationInfo:
  - granted: true
    permission: iam.serviceAccounts.list
    resource: projects/mccarroll-mocha
    resourceAttributes: {}
  methodName: google.iam.admin.v1.ListServiceAccounts
  request:
    '@type': type.googleapis.com/google.iam.admin.v1.ListServiceAccountsRequest
    name: projects/mccarroll-mocha
  requestMetadata:
    callerIp: 69.173.70.180
    callerSuppliedUserAgent: (gzip),gzip(gfe)
    destinationAttributes: {}
    requestAttributes:
      auth: {}
      time: '2020-09-03T11:58:49.543410910Z'
  resourceName: projects/mccarroll-mocha
  serviceName: iam.googleapis.com
  status: {}
receiveTimestamp: '2020-09-03T11:58:49.691467944Z'
resource:
  labels:
    location: global
    method: google.iam.admin.v1.ListServiceAccounts
    project_id: mccarroll-mocha
    service: iam.googleapis.com
    version: v1
  type: api
severity: INFO
timestamp: '2020-09-03T11:58:49.452628092Z'

The principalEmail sometimes is giulio@broadinstitute.org and sometimes is google@broadinstitute.com so I am not sure what these requests are for.

dinvlad commented 4 years ago

Those requests are probably a red herring, but I suggest reaching out to us (DSP AppSec) on Slack for those ;)

Re the dedicated SA, there're a couple issues with your config:

1) We typically don't recommend downloading a SA key to a GCP VM, since all GCP VMs normally have a SA associated with them (when you start them). Cromwell will just pick them up automatically "from the environment". So please don't download a SA key to it and instead use this as the recommended option, per Cromwell docs:

    {
      name = "application-default"
      scheme = "application_default"
    },

I can provide more details from the config I used previously, if this doesn't work.

2) Which SA is MY-GOOGLE-PROJECT-############.json for? From your earlier gcloud projects add-iam-policy-binding command, it seems like that was for MY-NUMBER-compute@developer.gserviceaccount.com, which is the so-called "Default Compute Service Account" in your project. Using it is not recommended, since it has pretty wide permissions from the get-go. So I'd recommend creating a separate SA and granting it those roles instead, and then assigning that SA to the Cromwell VM before you start it.
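To see which SA a VM is actually running as, and therefore which identity Cromwell would pick up "from the environment", you can ask the Compute Engine metadata server. A minimal sketch, assuming it is run on the VM itself; the function names are made up for illustration:

```python
import urllib.request

# Metadata server path for the email of the VM's attached service account.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/email"
)


def build_request():
    """Build the metadata request; the Metadata-Flavor header is mandatory."""
    return urllib.request.Request(METADATA_URL, headers={"Metadata-Flavor": "Google"})


def attached_service_account(timeout=2.0):
    """Return the email of the SA attached to this VM (only works on GCP)."""
    with urllib.request.urlopen(build_request(), timeout=timeout) as response:
        return response.read().decode("utf8").strip()


if __name__ == "__main__":
    print(attached_service_account())
```

If this prints MY-NUMBER-compute@developer.gserviceaccount.com, the VM is still using the default Compute SA rather than a dedicated one.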

dinvlad commented 4 years ago

Also, I think it might be better use of our time to have a meeting together to help you set up everything properly, and then you could suggest a summary improvement to the docs in a PR here ;)