Bregor closed this pull request 7 years ago
Thanks for the contribution! Please ensure your commits follow our style guide. This code will be tested once a Deis maintainer reviews it.
Jenkins, OK to test
@Bregor thanks for this feature addition!
Before I dive into testing on an RBAC-only cluster, I wanted to make sure installing charts on non-RBAC clusters (as CI does) still works. As it stands, installation fails either because the necessary API version is missing (from the CI cluster, which looks like v1.5.3):
Error: apiVersion "rbac.authorization.k8s.io/v1alpha1" in workflow/charts/controller/templates/controller-clusterrolebinding.yaml is not available
or because the cluster otherwise isn't RBAC-ready (the following is from my v1.6.0 mk cluster):
$ helm install --wait workflow-pr/workflow --namespace=deis --version v2.14.1-20170508161641-sha.3d6bbf9 --set fluentd.docker_tag=git-740bd61 --name deis-workflow
Error: release deis-workflow failed: clusterroles.rbac.authorization.k8s.io "deis:deis-logger-fluentd" is forbidden: attempt to grant extra privileges: [{[list] [] [pods] [] []} {[get] [] [pods] [] []} {[watch] [] [pods] [] []}] user=&{system:serviceaccount:kube-system:default e9349b4a-311c-11e7-bc10-7665e4a656ee [system:serviceaccounts system:serviceaccounts:kube-system system:authenticated] map[]} ownerrules=[{[create] [authorization.k8s.io] [selfsubjectaccessreviews] [] []} {[get] [] [] [] [/api /api/* /apis /apis/* /healthz /swaggerapi /swaggerapi/* /version]}] ruleResolutionErrors=[]
Therefore, I think the first priority is to wrap the template generation in a feature flag, perhaps global.use_rbac or similar, defaulting to false (or commented out). When false or unset, the templates added here should not be generated; when true, they would be.
How does that sound?
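A minimal sketch of what that guard could look like in the chart template (the flag name `global.use_rbac` is just the name proposed above, and the ClusterRoleBinding contents are illustrative, not taken from this PR):

```yaml
# Hypothetical guard for controller-clusterrolebinding.yaml.
# `global.use_rbac` does not exist in the chart yet; when it is
# false or unset, the whole manifest is skipped at render time.
{{- if .Values.global.use_rbac }}
apiVersion: rbac.authorization.k8s.io/v1alpha1
kind: ClusterRoleBinding
metadata:
  name: deis:deis-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: deis:deis-controller
subjects:
- kind: ServiceAccount
  name: deis-controller
  namespace: {{ .Release.Namespace }}
{{- end }}
```

The corresponding values.yaml entry would then default the flag off, e.g. `global: { use_rbac: false }`.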
Jenkins, add to whitelist
Looks like we need a specific delete permission on a namespace as well. At least, when I do deis destroy, I get an error when the controller tries to delete the namespace in an RBAC environment.
172.17.0.17 "POST /v2/apps/ HTTP/1.1" 201 170 "Deis Client dev-209c985"
INFO [aerial-yodeling]: deleting environment
ERROR:root:Uncaught Exception
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 480, in dispatch
response = handler(request, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 93, in destroy
self.perform_destroy(instance)
File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 97, in perform_destroy
instance.delete()
File "/app/api/models/app.py", line 260, in delete
self._scheduler.ns.delete(self.id)
File "/app/scheduler/resources/namespace.py", line 52, in delete
raise KubeHTTPException(response, 'delete Namespace "{}"', namespace)
File "/app/scheduler/exceptions.py", line 10, in __init__
data = response.json()
File "/usr/local/lib/python3.5/dist-packages/requests/models.py", line 866, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
172.17.0.17 "DELETE /v2/apps/aerial-yodeling/ HTTP/1.1" 500 25 "Deis Client dev-209c985"
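If we do want the controller to delete namespaces, the extra rule would presumably look like this (a sketch only; it follows the layout of the other rules in this PR, and whether to grant it cluster-wide is exactly the problem discussed below):

```yaml
# Hypothetical addition to the controller's ClusterRole:
# allow deleting namespaces (needed by `deis destroy`).
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - delete
```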
@mboersma added and rebased
:exclamation: No coverage uploaded for pull request base (
master@4b015cd
). Click here to learn what that means. The diff coverage isn/a
.
@@            Coverage Diff            @@
##             master    #1292   +/- ##
=========================================
  Coverage          ?   87.04%
=========================================
  Files             ?       45
  Lines             ?     3929
  Branches          ?      681
=========================================
  Hits              ?     3420
  Misses            ?      338
  Partials          ?      171
@mboersma, quick note about namespace deletion:
RBAC has no notion of ownership: you can grant a specific user the ability to create and delete namespaces, BUT you can't restrict deletion to a particular list of namespaces. So such a user would have permission to delete any namespace in the cluster.
The only way to "hack" around it is to create a role (just a role, not a clusterrole) in every namespace belonging to the user, granting namespace deletion, and then gather all those roles into bindings for this user.
For this reason I intentionally disabled namespace deletion from the beginning.
I hope we will find a way to create a proper workaround for this "issue" at last :)
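The per-namespace "hack" described above would look roughly like this (a sketch, using a hypothetical app namespace my-app; note that a namespaced Role is bound with a RoleBinding in each namespace, since a ClusterRoleBinding cannot reference a Role):

```yaml
# Sketch of the per-namespace workaround: a Role in each app
# namespace ("my-app" is a hypothetical example) that allows
# deleting only that namespace, bound to the controller's
# service account by a RoleBinding in the same namespace.
# The RBAC apiVersion depends on the cluster version
# (see templates/_helpers.tmpl in this PR).
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: namespace-deleter
  namespace: my-app
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - delete
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: namespace-deleter
  namespace: my-app
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: namespace-deleter
subjects:
- kind: ServiceAccount
  name: deis-controller
  namespace: deis
```

The Role and RoleBinding would have to be created (and cleaned up) per app namespace, which is presumably why it was left out of this PR.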
> For this reason I intentionally disabled namespace deletion from the beginning.
Ah, ok--that makes a lot more sense after your explanation. It's too bad the namespace-delete permission is global. It was a shortcut to have the controller just delete the entire app namespace in the first place.
It would probably be a smarter implementation to delete just the objects that controller actually created, and to print a warning if that all succeeds but the namespace deletion fails. But for now this makes RBAC workable for Workflow.
@Bregor I'll test more tomorrow. I heard you're vacationing for a week or so--enjoy! We will keep testing this and let you know if we find other issues. There should be plenty of time to get this merged for the next release. Thanks again!
EDIT: Fixed my case after adding the verb "get" to the "pods" resource in the deis-controller RBAC ClusterRole.
The error code was 403 at URL /api/v1/namespaces/some-namespace/pods/some-pod-13gfabcfg2
I applied the RBAC rules, and all seems to be OK, except for one thing. I was trying to update a release by doing a second "git push deis some-app", and no matter whether it fails or succeeds, it always prints the following message in the controller logs:
INFO [some-app]: waited 30s and 1 pods are in service
INFO:scheduler:[some-app]: waited 30s and 1 pods are in service
ERROR:root:Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
File "/app/api/models/build.py", line 65, in create
self.app.deploy(new_release)
File "/app/api/models/app.py", line 572, in deploy
release.cleanup_old()
File "/app/api/models/release.py", line 307, in cleanup_old
self._scheduler.pod.delete(self.app.id, pod['metadata']['name'])
File "/app/scheduler/resources/pod.py", line 350, in delete
pod = self.pod.get(namespace, name).json()
File "/app/scheduler/resources/pod.py", line 32, in get
raise KubeHTTPException(response, message, *args)
File "/app/scheduler/exceptions.py", line 10, in __init__
data = response.json()
File "/usr/local/lib/python3.5/dist-packages/requests/models.py", line 866, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 480, in dispatch
response = handler(request, *args, **kwargs)
File "/app/api/views.py", line 527, in create
super(BuildHookViewSet, self).create(request, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
self.perform_create(serializer)
File "/app/api/viewsets.py", line 21, in perform_create
self.post_save(obj)
File "/app/api/views.py", line 533, in post_save
build.create(self.user)
File "/app/api/models/build.py", line 79, in create
raise DeisException(str(e)) from e
api.exceptions.DeisException: Expecting value: line 1 column 1 (char 0)
Sometimes it does not delete the original pod and creates an extra one, leaving 2 pods running when only 1 is required by the deployment. The output of git push deis some-app after a good deployment without pod failures:
remote: 2017/05/18 10:44:33 Error running git receive hook [The controller returned an error when publishing the release: Unknown Error (400): {"detail":"Expecting value: line 1 column 1 (char 0)"}]
To ssh://git@deis-builder.10.108.103.232.nip.io:2222/some-app.git
! [remote rejected] some-app -> some-app (pre-receive hook declined)
error: failed to push some refs to 'ssh://git@deis-builder.10.108.103.232.nip.io:2222/some-app.git'
So it marks a good release as a "failed" one: "release h12f1ff1df23 which failed"
My RBAC controller rules look like this; maybe something is missing:
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - get
  - list
  - create
  - delete
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - get
  - create
  - update
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - list
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - list
  - get
  - create
  - update
- apiGroups:
  - ""
  resources:
  - replicationcontrollers
  verbs:
  - list
  - get
- apiGroups:
  - ""
  resources:
  - pods/log
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - list
  - delete
- apiGroups:
  - extensions
  resources:
  - replicasets
  verbs:
  - list
  - delete
  - update
- apiGroups:
  - extensions
  - apps
  resources:
  - deployments
  verbs:
  - get
  - list
  - create
  - update
  - delete
- apiGroups:
  - extensions
  resources:
  - deployments/scale
  - replicasets/scale
  verbs:
  - get
  - update
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - delete
@kartojal you mean deis-controller needs the ability to get pods in all namespaces (a clusterrole), not only in workflow's parent namespace?
And a second question: do you have access to the apiserver's audit.log? If yes, could you please grep system:serviceaccount:deis:deis-controller /path/to/audit.log while trying to update a release? This way we can catch the missing abilities in the role/clusterrole. Unfortunately I can't do it myself until May 23, because I don't have access to my clusters while on vacation.
Thanks in advance!
I could check that again tomorrow morning, but yes, allowing the controller to read pods in other namespaces fixed my problem and the 403 (Forbidden) error when updating a release. It seems that when you update a release, the controller needs to get the old release's pod information to check whether the old pod has been deleted after the deployment of the new release, if I'm not wrong. Anyway, I'll debug again tomorrow and report my findings.
Have fun!
@Bregor I believe I've reproduced the same issue mentioned by @kartojal above. Grepping /audit.log shows:
$ grep system:serviceaccount:deis:deis-controller /audit.log | grep pods
2017-05-25T00:09:34.153574773Z AUDIT: id="a5a3be49-c646-4652-b498-59c897c5afc4" ip="172.17.0.13" method="GET" user="system:serviceaccount:deis:deis-controller" groups="\"system:serviceaccounts\",\"system:serviceaccounts:deis\",\"system:authenticated\"" as="<self>" asgroups="<lookup>" namespace="mk-edp" uri="/api/v1/namespaces/mk-edp/pods?labelSelector=heritage%3Ddeis"
which correlates with the fix mentioned in https://github.com/deis/controller/pull/1292#issuecomment-302389070, namely adding the get verb for the pods resource. Shall we add this in?
edit: here's the full audit.log particular to the deis-controller serviceaccount, if it helps: https://gist.github.com/vdice/5648dc621a7594273db7b5b167a534f2
Sure, thanks for testing, guys! I'll rebase and push ASAP
@vdice, @mboersma I carefully read all the code in https://github.com/deis/controller/tree/master/rootfs/scheduler/resources and updated the clusterrole accordingly.
Things look good after an additional round of testing, with specific focus on items relating to the updated permissions in the latest commits:
- deis autoscale
- deis limits
Planning on making sure others see the same and hopefully merging this along with the rest of the RBAC PRs on Tuesday (US holiday here on Monday).
With this change deis-controller is able to work in RBAC-only clusters.
Works with both Kubernetes 1.5 and 1.6 (see templates/_helpers.tmpl for details). Actually tested with 1.5.7 and 1.6.4.
ClusterRole allows deis-controller:
- get, list, create and delete
- get, list, create, update and delete
- get and list
- list and create
- list, get, create, update and delete
- get, list, create, update and delete
- get and update
- get
- get, list and delete
- get and create
- get, list, create, update and delete
- get, list, create, update and delete
- get, list, create, update and delete
- get and update
- get, list, delete and update
- get and update
- get, list, create, update and delete
- get, list, watch, create, update and delete