Open multimeric opened 5 years ago
Hi @TMiguelT - We should document this better but the short answer is you should go to this GCP quickstart. In particular this link sets permissions for a project (not a specific service account)
Doesn't that link just enable the Genomics API? I'm not talking about that, I'm talking about giving permissions to a service account that is possibly running outside of Google Cloud, and so needs specific permissions.
I think the minimum requirements are:
1) Project-wide Genomics Pipelines Runner and Compute Instance Admin (v1) roles for the service account used by Cromwell itself.
2) Service Account User permission on the Compute Engine default service account for the Cromwell service account.
3) Storage Object Admin permission on the Cromwell execution and data buckets, as well as Storage Object Viewer on the GCR bucket (if used).
When using the PAPIv2 backend, I have noticed that the previous set of roles is not sufficient to run the pipelines. Instead, after a long and tedious amount of work, I have figured out that the following set of roles:
1) Cloud Life Sciences Workflows Runner (lifesciences.workflowsRunner)
2) Service Account User (iam.serviceAccountUser)
3) Firebase Develop Admin (firebase.developAdmin)
are sufficient to run a pipeline on Google Cloud through a service account. I suppose that lifesciences.workflowsRunner is a replacement for genomics.pipelinesRunner, but I have no idea why firebase.developAdmin is required (or what else should be required in its place). For the life of me, I could not find this information anywhere in the Cromwell documentation, nor infer it from the Cromwell error messages themselves (nor understand what the firebase.developAdmin role actually allows).
@freeseek firebase.developAdmin is a pretty wide role (with 204 permissions), so it's not surprising that it gives some permissions that are needed here. What would be helpful is if Google showed the exact permissions in their error messages, though from what it seems, that's not always the case. Then if you have a list of permissions, you can find minimal role(s) that encompass those permissions, rather than through a blind hunt (please correct me if it wasn't entirely blind here..)
Btw @freeseek, from my limited experience, GitHub issues here are not often-looked-through, it might be better to create an internal JIRA ticket instead ;)
@dinvlad it was indeed a blind hunt! So in that sense, 204 permissions is not that much ... it's a pretty refined subset ;-) Previously I was running Cromwell with the editor role set, which likely has even more than 204 permissions. Without the firebase.developAdmin role, the tasks start running and then fail immediately, and the only thing you find in the logs is:
yyyy/mm/dd hh:mm:ss Starting container setup.
In any case, I wanted to give an answer here to provide publicly available information to other users.
To make it less of a blind hunt, it's also possible to look into your Stackdriver Audit logs - they should list all GCP API calls in your project that failed with 403. This way you can get a better sense of which ones Cromwell is actually using. I've been meaning to write a tool to simplify this kind of analysis, but you can do it with the logs even now.
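A minimal sketch of that audit-log approach, assuming the relevant audit logs are enabled in the project and that denied calls are recorded with status code 7 (PERMISSION_DENIED in google.rpc.Code). The function names, the 1000-entry limit, and the example service account are illustrative assumptions, not part of Cromwell or of the thread above.

```python
import subprocess


def denied_audit_filter(principal_email: str, since: str) -> str:
    """Build a Cloud Logging filter for audit entries that were denied.

    status.code 7 is PERMISSION_DENIED in google.rpc.Code.
    """
    clauses = [
        'logName:"cloudaudit.googleapis.com"',
        f'protoPayload.authenticationInfo.principalEmail="{principal_email}"',
        "protoPayload.status.code=7",
        f'timestamp>="{since}"',
    ]
    return " AND ".join(clauses)


def read_denied_entries(project: str, principal_email: str, since: str) -> str:
    """Run `gcloud logging read` with the filter and return the raw JSON text.

    The limit keeps the download small; raise it if entries seem to be missing.
    """
    cmd = [
        "gcloud", "logging", "read",
        denied_audit_filter(principal_email, since),
        "--project", project,
        "--format", "json",
        "--limit", "1000",
    ]
    return subprocess.check_output(cmd).decode("utf8")


# Hypothetical service account name, used only to show the resulting filter.
print(denied_audit_filter(
    "cromwell@MY-GOOGLE-PROJECT.iam.gserviceaccount.com",
    "2020-09-01T00:00:00Z",
))
```

The same filter string can be pasted into the Cloud Logging console instead of being passed to gcloud, which avoids downloading anything.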
So here is a final update. I have tried running Cromwell with the following roles:
And I have got the following error from Cromwell:
java.lang.Exception: Task xxx.xxxNA:1 failed. Job exited without an error, exit code 0. PAPI error code 9. Please check the log file for more details: xxx
And the log just contains this cryptic message:
yyyy/mm/dd hh:mm:ss Starting container setup.
I have then tried to run Cromwell with the following roles:
And the workflow succeeded. To give a full explanation of the set of roles and permissions needed, I wrote a little python script roles.py that collects this information from Google:
#!/usr/bin/env python3
import subprocess
import sys
import pandas as pd
import requests
# Get an OAuth access token from the active gcloud credentials.
token = subprocess.check_output(["gcloud", "auth", "print-access-token"]).decode("utf8").strip()
headers = {"accept": "application/json", "Authorization": "Bearer " + token}
params = {"pageSize": 1000, "view": "FULL"}
# List all predefined roles, following nextPageToken in case they span more than one page.
roles_json = []
while True:
    page = requests.get("https://iam.googleapis.com/v1/roles", headers=headers, params=params).json()
    roles_json.extend(page.get("roles", []))
    if "nextPageToken" not in page:
        break
    params["pageToken"] = page["nextPageToken"]
# Emit one tab-separated (role, permission) row per included permission.
rows = [(role["name"], perm) for role in roles_json for perm in role.get("includedPermissions", [])]
df = pd.DataFrame(rows, columns=["roles", "permissions"])
df.to_csv(sys.stdout, sep="\t", header=False, index=False)
When running this script, I get:
$ ./roles.py | grep "lifesciences.workflowsRunner\|iam.serviceAccountUser\|storage.objectAdmin\|storage.objectCreator\|storage.objectViewer" | column -t
roles/iam.serviceAccountUser iam.serviceAccounts.actAs
roles/iam.serviceAccountUser iam.serviceAccounts.get
roles/iam.serviceAccountUser iam.serviceAccounts.list
roles/iam.serviceAccountUser resourcemanager.projects.get
roles/iam.serviceAccountUser resourcemanager.projects.list
roles/lifesciences.workflowsRunner lifesciences.operations.cancel
roles/lifesciences.workflowsRunner lifesciences.operations.get
roles/lifesciences.workflowsRunner lifesciences.operations.list
roles/lifesciences.workflowsRunner lifesciences.workflows.run
roles/storage.objectAdmin resourcemanager.projects.get
roles/storage.objectAdmin resourcemanager.projects.list
roles/storage.objectAdmin storage.objects.create
roles/storage.objectAdmin storage.objects.delete
roles/storage.objectAdmin storage.objects.get
roles/storage.objectAdmin storage.objects.getIamPolicy
roles/storage.objectAdmin storage.objects.list
roles/storage.objectAdmin storage.objects.setIamPolicy
roles/storage.objectAdmin storage.objects.update
roles/storage.objectCreator resourcemanager.projects.get
roles/storage.objectCreator resourcemanager.projects.list
roles/storage.objectCreator storage.objects.create
roles/storage.objectViewer resourcemanager.projects.get
roles/storage.objectViewer resourcemanager.projects.list
roles/storage.objectViewer storage.objects.get
roles/storage.objectViewer storage.objects.list
Somehow the tutorial suggests adding the roles storage.objectCreator and storage.objectViewer, but these do not include any of the four permissions storage.objects.delete, storage.objects.getIamPolicy, storage.objects.setIamPolicy, or storage.objects.update that come with the role storage.objectAdmin, and at least one of these must be needed by Cromwell.
Other than by trial and error, I still do not understand how users are supposed to figure this out.
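The comparison between the storage roles can be done mechanically. The sketch below parses role/permission rows in the two-column format that roles.py prints and reports what storage.objectAdmin grants beyond storage.objectCreator plus storage.objectViewer. The helper names are made up for illustration, and the sample rows are copied from the listing above.

```python
from collections import defaultdict

# Rows copied from the roles.py output shown earlier in the thread.
SAMPLE = """\
roles/storage.objectAdmin resourcemanager.projects.get
roles/storage.objectAdmin resourcemanager.projects.list
roles/storage.objectAdmin storage.objects.create
roles/storage.objectAdmin storage.objects.delete
roles/storage.objectAdmin storage.objects.get
roles/storage.objectAdmin storage.objects.getIamPolicy
roles/storage.objectAdmin storage.objects.list
roles/storage.objectAdmin storage.objects.setIamPolicy
roles/storage.objectAdmin storage.objects.update
roles/storage.objectCreator resourcemanager.projects.get
roles/storage.objectCreator resourcemanager.projects.list
roles/storage.objectCreator storage.objects.create
roles/storage.objectViewer resourcemanager.projects.get
roles/storage.objectViewer resourcemanager.projects.list
roles/storage.objectViewer storage.objects.get
roles/storage.objectViewer storage.objects.list
"""


def permissions_by_role(text):
    """Map each role name to the set of permissions listed for it."""
    table = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            role, permission = fields
            table[role].add(permission)
    return table


def missing_permissions(table, have_roles, reference_role):
    """Permissions in reference_role that none of have_roles provides."""
    have = set().union(*(table[role] for role in have_roles))
    return sorted(table[reference_role] - have)


table = permissions_by_role(SAMPLE)
extra = missing_permissions(
    table,
    ["roles/storage.objectCreator", "roles/storage.objectViewer"],
    "roles/storage.objectAdmin",
)
print(extra)
# ['storage.objects.delete', 'storage.objects.getIamPolicy',
#  'storage.objects.setIamPolicy', 'storage.objects.update']
```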
Nice work, could you also try that Stackdriver suggestion? It should be pretty easy to compose a query that will give you all the permissions required, based on 403. And then you can enter those permissions in the roles tab in Cloud Console, and it will give you the matching role(s). As an ultimate solution, you can create a custom role with those permissions only, so that it follows the least privileges principle.
I'd be curious what you find - please post back here if you do ;)
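Once a list of required permissions is known, matching it to roles can also be sketched offline against the role/permission table. The function below is a greedy set cover, so it returns a small set of roles but not necessarily the smallest possible one; it is not how the Cloud Console does its matching. The function name and the example permission list (taken from the permissions discussed in this thread) are illustrative assumptions.

```python
def covering_roles(required, role_permissions):
    """Greedily pick roles until every required permission is covered.

    At each step the role covering the most still-missing permissions wins;
    ties go to the role with fewer total permissions (the narrower role).
    """
    remaining = set(required)
    chosen = []
    while remaining:
        best = max(
            role_permissions,
            key=lambda r: (len(role_permissions[r] & remaining),
                           -len(role_permissions[r])),
        )
        gained = role_permissions[best] & remaining
        if not gained:
            raise ValueError(f"no listed role grants: {sorted(remaining)}")
        chosen.append(best)
        remaining -= gained
    return chosen


ROLES = {
    "roles/lifesciences.workflowsRunner": {
        "lifesciences.operations.cancel", "lifesciences.operations.get",
        "lifesciences.operations.list", "lifesciences.workflows.run",
    },
    "roles/storage.objectViewer": {
        "resourcemanager.projects.get", "resourcemanager.projects.list",
        "storage.objects.get", "storage.objects.list",
    },
    "roles/storage.objectCreator": {
        "resourcemanager.projects.get", "resourcemanager.projects.list",
        "storage.objects.create",
    },
    "roles/storage.objectAdmin": {
        "resourcemanager.projects.get", "resourcemanager.projects.list",
        "storage.objects.create", "storage.objects.delete",
        "storage.objects.get", "storage.objects.getIamPolicy",
        "storage.objects.list", "storage.objects.setIamPolicy",
        "storage.objects.update",
    },
}

needed = [
    "lifesciences.operations.cancel", "lifesciences.workflows.run",
    "storage.objects.create", "storage.objects.delete",
    "storage.objects.get", "storage.objects.getIamPolicy",
    "storage.objects.list",
]
print(covering_roles(needed, ROLES))
# ['roles/storage.objectAdmin', 'roles/lifesciences.workflowsRunner']
```

For a strict least-privilege setup, the `needed` list itself is what would go into a custom role, instead of the predefined roles this returns.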
I am curious ... how do I look into my Stackdriver Audit logs?
Please take a look here https://cloud.google.com/logging/docs/audit
Not sure what I should be doing. I have tried the following command:
gcloud logging read 'timestamp>="2020-09-01T00:00:00Z"' > logs
And then:
$ cat logs | grep 30148356615-compute@developer.gserviceaccount.com -A10 | grep -i permission | cut -d: -f2 | sort | uniq -c
14 lifesciences.operations.cancel
425 lifesciences.workflows.run
12 storage.buckets.get
30629 storage.objects.create
30985 storage.objects.delete
12819 storage.objects.get
157 storage.objects.getIamPolicy
6859 storage.objects.list
It does seem to be the case that storage.objects.delete is requested many times, so that is definitely an issue when you only have roles storage.objectCreator and storage.objectViewer but not storage.objectAdmin. I did not observe any permission from role iam.serviceAccountUser, but that role is indeed needed. And I observe some requests for permission storage.buckets.get that do end in ERROR, but it does not seem to affect the pipeline.
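The grep pipeline above counts every permission line near the service account name, without separating granted checks from denied ones. A sketch of a stricter tally is below, assuming the log was fetched with `gcloud logging read ... --format=json` so that each entry is a dict with a protoPayload carrying authenticationInfo and authorizationInfo. An absent `granted` field is treated as denied, since false values are typically omitted from the JSON output. The function name and the sample entries are made up for illustration.

```python
import json
from collections import Counter


def count_permissions(entries, principal_email):
    """Tally (permission, outcome) pairs for one principal's audit entries."""
    counts = Counter()
    for entry in entries:
        payload = entry.get("protoPayload", {})
        who = payload.get("authenticationInfo", {}).get("principalEmail")
        if who != principal_email:
            continue
        for info in payload.get("authorizationInfo", []):
            outcome = "granted" if info.get("granted") else "denied"
            counts[(info.get("permission"), outcome)] += 1
    return counts


# Hand-written sample entries; real input would come from a file, e.g.
# entries = json.load(open("logs.json"))
sample = json.loads("""
[
  {"protoPayload": {
     "authenticationInfo": {"principalEmail": "sa@example.iam.gserviceaccount.com"},
     "authorizationInfo": [{"permission": "storage.objects.delete", "granted": true}]}},
  {"protoPayload": {
     "authenticationInfo": {"principalEmail": "sa@example.iam.gserviceaccount.com"},
     "authorizationInfo": [{"permission": "storage.buckets.get"}]}},
  {"protoPayload": {
     "authenticationInfo": {"principalEmail": "someone@example.org"},
     "authorizationInfo": [{"permission": "iam.serviceAccounts.list", "granted": true}]}}
]
""")

tally = count_permissions(sample, "sa@example.iam.gserviceaccount.com")
for (permission, outcome), n in sorted(tally.items()):
    print(n, permission, outcome)
```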
Typically, I do that through the Cloud Logging Console, instead of fetching the entire log (which could be huge, and expensive) ;) There, you can set up filters to narrow down on particular log entries.
iam.serviceAccountUser is mostly about granting one iam.serviceAccounts.actAs permission on a service account. Not sure why it doesn't show up here, but this permission is required for the Cromwell server to be able to run a pipeline with a Compute SA.
BTW iam.serviceAccountUser should be granted on a per-service-account level, not at the project level (not sure if you've set it up this way, just wanted to confirm). First make sure you don't have that permission granted at the project level, and then if you remove it from the service-account level, it should show up in the logs.
Hmmm, I don't even know how I would grant it at the project level. I pretty much used this:
for role in lifesciences.workflowsRunner iam.serviceAccountUser storage.objectAdmin; do
  gcloud projects add-iam-policy-binding MY-GOOGLE-PROJECT --member serviceAccount:MY-NUMBER-compute@developer.gserviceaccount.com --role roles/$role
done
Maybe if iam.serviceAccounts.actAs is requested only once I might have missed it, as I was not able to download the whole log file. Do you know why storage.buckets.get is occasionally requested, and what actually happens to Cromwell if it is not granted to the service account?
Yes, that's granting it at the project level (gcloud projects add-iam-policy-binding).
Granting at the SA level would probably be, in this case:
gcloud iam service-accounts add-iam-policy-binding \
  MY-NUMBER-compute@developer.gserviceaccount.com \
  --member serviceAccount:MY-NUMBER-compute@developer.gserviceaccount.com \
  --role roles/iam.serviceAccountUser
Notice that here we grant the MY-NUMBER-compute SA the iam.serviceAccountUser role on itself! This is probably not best practice, as you should use a separate SA for the Cromwell VM from the one that is used by Cromwell jobs.
Still, this is better than granting it at the project level, as otherwise any machine started with the default MY-NUMBER-compute SA can act as any other SA in that project. Additionally, it's not good to use the default SA at all; ideally you should create a dedicated SA for Cromwell itself and another dedicated SA for the Cromwell jobs.
That being said, if you're running this in an isolated project that doesn't have any access to anything else, this may be fine. But that's why it takes quite a bit of effort/know-how to set up Cromwell properly. I agree this is not an easy task, and should be documented a bit more ;)
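To make the two-SA layout concrete, here is a sketch that only builds and prints the gcloud commands (it does not run them). It gives a Cromwell server SA the lifesciences.workflowsRunner role at the project level, and iam.serviceAccountUser only on the jobs SA. The SA names cromwell-server and cromwell-jobs are hypothetical, the helper names are made up, and bucket-level storage grants are left out.

```python
import shlex


def project_binding_cmd(project, member_sa, role):
    """gcloud argv that grants `role` to member_sa on the whole project."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project,
        "--member", f"serviceAccount:{member_sa}",
        "--role", f"roles/{role}",
    ]


def act_as_binding_cmd(target_sa, member_sa):
    """gcloud argv that lets member_sa act as target_sa, and only target_sa."""
    return [
        "gcloud", "iam", "service-accounts", "add-iam-policy-binding", target_sa,
        "--member", f"serviceAccount:{member_sa}",
        "--role", "roles/iam.serviceAccountUser",
    ]


def setup_commands(project, server_sa, jobs_sa):
    """Commands for a server SA that submits jobs which run as jobs_sa."""
    return [
        project_binding_cmd(project, server_sa, "lifesciences.workflowsRunner"),
        act_as_binding_cmd(jobs_sa, server_sa),
    ]


# Hypothetical dedicated service accounts in MY-GOOGLE-PROJECT.
commands = setup_commands(
    "MY-GOOGLE-PROJECT",
    "cromwell-server@MY-GOOGLE-PROJECT.iam.gserviceaccount.com",
    "cromwell-jobs@MY-GOOGLE-PROJECT.iam.gserviceaccount.com",
)
for argv in commands:
    print(shlex.join(argv))
```

Printing instead of executing keeps the sketch safe to run anywhere; the printed lines can be reviewed and pasted into a shell.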
Wait, this seems very interesting, but I think you have lost me here. When I set up my Cromwell server, I used the following google stanza:
google {
  application-name = "cromwell"
  auths = [
    {
      name = "service-account"
      scheme = "service_account"
      json-file = "MY-GOOGLE-PROJECT-############.json"
    }
  ]
}
What does it mean to "use a separate SA for Cromwell VM"? The way I run the Cromwell server is I log in to my Google VM with:
gcloud compute ssh INSTANCE-ID -- -L 8000:localhost:8000
And then I run:
(java -Xmx3500m -Dconfig.file=PAPIv2.conf -jar cromwell-XY.jar server &)
I did another run last night, and I have found a few entries like this including iam.serviceAccounts.* permissions:
insertId: 1mk6qq6ej6zkd
logName: projects/mccarroll-mocha/logs/cloudaudit.googleapis.com%2Fdata_access
protoPayload:
  '@type': type.googleapis.com/google.cloud.audit.AuditLog
  authenticationInfo:
    principalEmail: giulio@broadinstitute.org
    principalSubject: user:giulio@broadinstitute.org
  authorizationInfo:
  - granted: true
    permission: iam.serviceAccounts.list
    resource: projects/mccarroll-mocha
    resourceAttributes: {}
  methodName: google.iam.admin.v1.ListServiceAccounts
  request:
    '@type': type.googleapis.com/google.iam.admin.v1.ListServiceAccountsRequest
    name: projects/mccarroll-mocha
    page_size: 100
  requestMetadata:
    callerIp: 64.112.179.105
    callerSuppliedUserAgent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101
      Firefox/80.0,gzip(gfe)
    destinationAttributes: {}
    requestAttributes:
      auth: {}
      time: '2020-09-03T03:28:37.843325531Z'
  resourceName: projects/mccarroll-mocha
  serviceName: iam.googleapis.com
  status: {}
receiveTimestamp: '2020-09-03T03:28:38.742413691Z'
resource:
  labels:
    location: global
    method: google.iam.admin.v1.ListServiceAccounts
    project_id: mccarroll-mocha
    service: iam.googleapis.com
    version: v1
  type: api
severity: INFO
timestamp: '2020-09-03T03:28:37.734190692Z'
Sometimes like this instead:
insertId: 1mk6qq6ek68fs
logName: projects/mccarroll-mocha/logs/cloudaudit.googleapis.com%2Fdata_access
protoPayload:
  '@type': type.googleapis.com/google.cloud.audit.AuditLog
  authenticationInfo:
    principalEmail: google@broadinstitute.com
    principalSubject: user:google@broadinstitute.com
  authorizationInfo:
  - granted: true
    permission: iam.serviceAccounts.list
    resource: projects/mccarroll-mocha
    resourceAttributes: {}
  methodName: google.iam.admin.v1.ListServiceAccounts
  request:
    '@type': type.googleapis.com/google.iam.admin.v1.ListServiceAccountsRequest
    name: projects/mccarroll-mocha
  requestMetadata:
    callerIp: 69.173.70.180
    callerSuppliedUserAgent: (gzip),gzip(gfe)
    destinationAttributes: {}
    requestAttributes:
      auth: {}
      time: '2020-09-03T11:58:49.543410910Z'
  resourceName: projects/mccarroll-mocha
  serviceName: iam.googleapis.com
  status: {}
receiveTimestamp: '2020-09-03T11:58:49.691467944Z'
resource:
  labels:
    location: global
    method: google.iam.admin.v1.ListServiceAccounts
    project_id: mccarroll-mocha
    service: iam.googleapis.com
    version: v1
  type: api
severity: INFO
timestamp: '2020-09-03T11:58:49.452628092Z'
The principalEmail is sometimes giulio@broadinstitute.org and sometimes google@broadinstitute.com, so I am not sure what these requests are for.
Those requests are probably a red herring, but I suggest reaching out to us (DSP AppSec) on Slack for those ;)
Re the dedicated SA, there are a couple of issues with your config:
1) We typically don't recommend downloading a SA key to a GCP VM, since all GCP VMs normally have a SA associated with them (when you start them). Cromwell will just pick them up automatically "from the environment". So please don't download a SA key to it and instead use this as the recommended option, per Cromwell docs:
{
  name = "application-default"
  scheme = "application_default"
},
I can provide more details from the config I used previously, if this doesn't work.
2) Which SA is MY-GOOGLE-PROJECT-############.json for? From your earlier gcloud projects add-iam-policy-binding command, it seems like that was for MY-NUMBER-compute@developer.gserviceaccount.com, which is the so-called "Default Compute Service Account" in your project. Using it is not recommended, since it has pretty wide permissions from the get-go. So I'd recommend creating a separate SA and granting it those roles instead, and then assigning that SA to the Cromwell VM before you start it.
Also, I think it might be a better use of our time to have a meeting together to help you set up everything properly, and then you could suggest a summary improvement to the docs in a PR here ;)
The documentation has a nice section on how to use Service Accounts with Cromwell, with the Google Cloud backend. However, what it doesn't do is explain the roles/permissions that such an account needs. It would be appreciated if we had a list of permissions we could apply to our Service Accounts to know that we had the absolute minimum required for Cromwell to control jobs (probably separate lists for filesystem access and job management).
Currently, the roles I've applied to my Service Account are:
This works, but I know that these roles are quite permissive. Ideally I'd be able to lock it down to permissions that stop it from deleting buckets etc.