zeeZ closed 5 years ago
Curses, I did not intend this to be the case with #1442, though I admit I wasn't very diligent about trying out this scenario.
Where exactly does it come to a halt, when it's not given a ClusterRole? (what do the logs say?)
https://github.com/weaveworks/flux/issues/1830 , which should fix this, is complete but pending review
Hey, thanks for the responses.
Where exactly does it come to a halt, when it's not given a ClusterRole? (what do the logs say?)
Without ClusterRole:
ts=2019-03-14T13:39:48.868422318Z caller=main.go:165 version=1.11.0
ERROR: logging before flag.Parse: E0314 13:39:49.929945 8 reflector.go:205] github.com/weaveworks/flux/cluster/kubernetes/cached_disco.go:100: Failed to list *v1beta1.CustomResourceDefinition: customresourcedefinitions.apiextensions.k8s.io is forbidden: User "system:serviceaccount:flux:flux" cannot list resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope
ts=2019-03-14T13:39:49.947370986Z caller=main.go:295 component=cluster identity=/etc/fluxd/ssh/identity
ts=2019-03-14T13:39:49.947449236Z caller=main.go:296 component=cluster identity.pub="ssh-rsa ..."
ts=2019-03-14T13:39:49.947527827Z caller=main.go:297 component=cluster host=https://10.3.0.1:443 version=kubernetes-v1.12.5
ts=2019-03-14T13:39:49.947616546Z caller=main.go:309 component=cluster kubectl=/usr/local/bin/kubectl
ts=2019-03-14T13:39:49.949160458Z caller=main.go:319 component=cluster ping=true
ERROR: logging before flag.Parse: E0314 13:39:50.932939 8 reflector.go:205] github.com/weaveworks/flux/cluster/kubernetes/cached_disco.go:100: Failed to list *v1beta1.CustomResourceDefinition: customresourcedefinitions.apiextensions.k8s.io is forbidden: User "system:serviceaccount:flux:flux" cannot list resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope
The last line is spammed forever after.
After adding the first set of permissions, I updated the repo and tried to fluxctl sync:
ts=2019-03-14T13:44:07.898249713Z caller=checkpoint.go:24 component=checkpoint msg="up to date" latest=1.11.0
ts=2019-03-14T13:44:31.133643198Z caller=loop.go:103 component=sync-loop event=refreshed url=... branch=... HEAD=beb4159a14847c5d0b0e5d4cbeccb7f3d4da2766
ts=2019-03-14T13:44:31.247826109Z caller=loop.go:210 component=sync-loop err="collating resources in cluster for sync: componentstatuses is forbidden: User \"system:serviceaccount:flux:flux\" cannot list resource \"componentstatuses\" in API group \"\" at the cluster scope"
ts=2019-03-14T13:44:31.250451239Z caller=loop.go:90 component=sync-loop err="collating resources in cluster for sync: componentstatuses is forbidden: User \"system:serviceaccount:flux:flux\" cannot list resource \"componentstatuses\" in API group \"\" at the cluster scope"
ts=2019-03-14T13:45:08.121177099Z caller=warming.go:268 component=warmer info="refreshing image" image=... tag_count=207 to_update=1 of_which_refresh=1 of_which_missing=0
ts=2019-03-14T13:45:08.139291505Z caller=warming.go:364 component=warmer updated=... successful=1 attempted=1
ts=2019-03-14T13:49:07.446622744Z caller=images.go:17 component=sync-loop msg="polling images"
ts=2019-03-14T13:49:38.983850606Z caller=loop.go:103 component=sync-loop event=refreshed url=ssh://git@....git branch=... HEAD=beb4159a14847c5d0b0e5d4cbeccb7f3d4da2766
ts=2019-03-14T13:54:07.629381015Z caller=images.go:17 component=sync-loop msg="polling images"
ts=2019-03-14T13:54:44.119740704Z caller=loop.go:103 component=sync-loop event=refreshed url=ssh://git@....git branch=... HEAD=84970b52031752ec2790c20802a0f2419f6b4c84
ts=2019-03-14T13:54:44.336051836Z caller=loop.go:210 component=sync-loop err="collating resources in cluster for sync: configmaps is forbidden: User \"system:serviceaccount:flux:flux\" cannot list resource \"configmaps\" in API group \"\" at the cluster scope"
ts=2019-03-14T13:54:44.338921916Z caller=loop.go:90 component=sync-loop err="collating resources in cluster for sync: configmaps is forbidden: User \"system:serviceaccount:flux:flux\" cannot list resource \"configmaps\" in API group \"\" at the cluster scope"
ts=2019-03-14T13:59:07.767724146Z caller=images.go:17 component=sync-loop msg="polling images"
ts=2019-03-14T13:59:49.26397648Z caller=loop.go:103 component=sync-loop event=refreshed url=ssh://git@....git branch=... HEAD=84970b52031752ec2790c20802a0f2419f6b4c84
ts=2019-03-14T14:04:07.889994656Z caller=images.go:17 component=sync-loop msg="polling images"
ts=2019-03-14T14:04:56.89208238Z caller=loop.go:103 component=sync-loop event=refreshed url=ssh://git@....git branch=... HEAD=84970b52031752ec2790c20802a0f2419f6b4c84
ts=2019-03-14T14:05:23.734827732Z caller=loop.go:111 component=sync-loop jobID=1d217122-5fbe-df8e-976f-05db5f03a6f0 state=in-progress
ts=2019-03-14T14:05:31.362681374Z caller=loop.go:123 component=sync-loop jobID=1d217122-5fbe-df8e-976f-05db5f03a6f0 state=done success=true
ts=2019-03-14T14:05:36.499539849Z caller=loop.go:103 component=sync-loop event=refreshed url=ssh://git@....git branch=... HEAD=84970b52031752ec2790c20802a0f2419f6b4c84
ts=2019-03-14T14:09:08.028520016Z caller=images.go:17 component=sync-loop msg="polling images"
ts=2019-03-14T14:10:04.550939503Z caller=loop.go:103 component=sync-loop event=refreshed url=ssh://git@....git branch=... HEAD=84970b52031752ec2790c20802a0f2419f6b4c84
Always the following after a restart with the tag behind head, with varying resources.
caller=loop.go:210 component=sync-loop err="collating resources in cluster for sync: configmaps is forbidden: User \"system:serviceaccount:flux:flux\" cannot list resource \"configmaps\" in API group \"\" at the cluster scope"
caller=loop.go:90 component=sync-loop err="collating resources in cluster for sync: configmaps is forbidden: User \"system:serviceaccount:flux:flux\" cannot list resource \"configmaps\" in API group \"\" at the cluster scope"
Repo tag never moved and nothing was applied. I added that resource, killed the pod and repeated until I added the * to the role. No errors after that, and it applied and moved the tag.
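The add-one-resource-and-restart loop described above could be shortened by pulling every denied resource out of the logs in one pass. The following is a minimal sketch, assuming the denial text keeps the shape seen in the log lines in this thread; the names `denial` and `collectDenials` are hypothetical and not part of Flux.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// forbiddenRe matches the RBAC denial text from the logs, for example:
//   cannot list resource \"configmaps\" in API group \"\" at the cluster scope
// The optional backslashes cover quotes that are escaped inside err="..." values.
var forbiddenRe = regexp.MustCompile(`cannot (\w+) resource \\?"([^"\\]+)\\?" in API group \\?"([^"\\]*)\\?"`)

// denial is a hypothetical record of one rejected request.
type denial struct {
	Verb, Resource, Group string
}

// collectDenials returns the unique denials found in lines,
// sorted by API group and then by resource name.
func collectDenials(lines []string) []denial {
	seen := map[denial]bool{}
	var out []denial
	for _, line := range lines {
		m := forbiddenRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		d := denial{Verb: m[1], Resource: m[2], Group: m[3]}
		if !seen[d] {
			seen[d] = true
			out = append(out, d)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Group != out[j].Group {
			return out[i].Group < out[j].Group
		}
		return out[i].Resource < out[j].Resource
	})
	return out
}

func main() {
	lines := []string{
		`caller=loop.go:210 component=sync-loop err="collating resources in cluster for sync: configmaps is forbidden: User \"system:serviceaccount:flux:flux\" cannot list resource \"configmaps\" in API group \"\" at the cluster scope"`,
		`Failed to list *v1beta1.CustomResourceDefinition: customresourcedefinitions.apiextensions.k8s.io is forbidden: User "system:serviceaccount:flux:flux" cannot list resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope`,
	}
	for _, d := range collectDenials(lines) {
		fmt.Printf("verb=%s resource=%s apiGroup=%q\n", d.Verb, d.Resource, d.Group)
	}
}
```

This only reports what was denied so far; since the sync loop stops at the first forbidden resource, several passes may still be needed.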
1830 , which should fix this, is complete but pending review
Brill, thanks for that @zeeZ, most helpful!
You might have to stick to v1.10.1 for now @zeeZ -- sorry about that :-/
1668 I assume?
Yeah, sorry
Now I am thinking that #1668 by itself won't be enough since it doesn't prevent flux from attempting to list cluster-scoped resources.
We need to think about this.
@zeeZ The fix will be included in the next fix release. For now, you can test whether your issue is definitely fixed by using image quay.io/weaveworks/flux:master-5f0e9292.
Please reopen this issue if it isn't fixed.
@2opremio I actually checked out your branch earlier. With no config change from 1.10.1 to yours, sync worked as expected, thank you.
What remains is the following, but it didn't have any impact for me as there are no CRDs managed by flux:
ERROR: logging before flag.Parse: E0315 11:00:55.601512 9 reflector.go:205] github.com/weaveworks/flux/cluster/kubernetes/cached_disco.go:100: Failed to list *v1beta1.CustomResourceDefinition: customresourcedefinitions.apiextensions.k8s.io is forbidden: User "system:serviceaccount:flux:flux" cannot list resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope
This is repeated every second
Fantastic! I will look into fixing that as well
@zeeZ Are you getting any other errors? (even if not repeated)
No further errors after adding a watch/list CRD cluster role.
Great, I will try to get a fix for that early next week
I've created a sample repo of some of the things I did to lock down Flux, maybe it can be of some use: https://github.com/zeeZ/locked-down-flux
I believe that's as far as I can go without Helm or GC enabled. Removing any of the rules defined will produce some kind of error during common operations, though I haven't played around with it enough to be able to tell where sync is actually affected and what is just noise.
I've taken a look at the remaining recurring error. It's a tricky one because the client-go library swallows it and handles it internally (logging by default):
    func (r *Reflector) Run(stopCh <-chan struct{}) {
        glog.V(3).Infof("Starting reflector %v (%s) from %s", r.expectedType, r.resyncPeriod, r.name)
        wait.Until(func() {
            if err := r.ListAndWatch(stopCh); err != nil {
                utilruntime.HandleError(err)
            }
        }, r.period, stopCh)
    }
I see a bunch of options:
runtime.ErrorHandlers to mute Forbidden/NotExist errors (probably a bad idea) or to do some smart error handling (probably another bad idea).
I dealt with a similar problem in Scope before, going for (2) but the error handling wasn't so deep down in the call stack.
@squaremo / @hiddeco thoughts?
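For readers unfamiliar with the handler-list idea mentioned above: as far as I know, runtime.ErrorHandlers in apimachinery is a package-level slice of func(error) that HandleError calls in order, so it can be replaced with a filtering handler. The sketch below only illustrates that pattern with the standard library; `errorHandlers`, `handleError`, `muteIf`, and the sentinel errors are hypothetical stand-ins, not the Kubernetes API and not the change that went into Flux.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in errors for illustration only.
var (
	errForbidden = errors.New("forbidden")
	errRefused   = errors.New("connection refused")
)

// logged records what the default handler saw, so the effect is observable.
var logged []string

// logError plays the role of the default "log everything" handler.
func logError(err error) {
	logged = append(logged, err.Error())
	fmt.Println("ERROR:", err)
}

// errorHandlers mimics a package-level handler list that callers may replace.
var errorHandlers = []func(error){logError}

// handleError passes err to every registered handler in order.
func handleError(err error) {
	for _, h := range errorHandlers {
		h(err)
	}
}

// muteIf wraps next so that errors for which skip returns true are dropped.
func muteIf(skip func(error) bool, next func(error)) func(error) {
	return func(err error) {
		if skip(err) {
			return
		}
		next(err)
	}
}

// isForbidden reports whether err is, or wraps, errForbidden.
func isForbidden(err error) bool {
	return errors.Is(err, errForbidden)
}

func main() {
	// Replace the default list with a filtered one.
	errorHandlers = []func(error){muteIf(isForbidden, logError)}

	handleError(fmt.Errorf("list CRDs: %w", errForbidden)) // dropped by the filter
	handleError(errRefused)                                // still logged
}
```

The filter is only as good as its predicate: if the error reaching the handler no longer carries its type, a check like `isForbidden` will not match.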
2. Create and maintain our own implementation of the controller/reflector (which sounds awful)
Yes; adapting parts of client-go is usually a quixotic enterprise. If it's much more complicated than the solution in weaveworks/scope, I'd say it's not worth it.
Can we mute glog by doing flag.Parse with some fake command-line options? I'm grasping at straws .. (it's probably better to do 3. instead)
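To make the "fake command-line options" idea concrete: the standard flag package lets a program parse an argument list it builds itself instead of os.Args. The sketch below assumes a boolean flag named `logtostderr` (glog registers a flag by that name, but here it is only a stand-in defined on a private FlagSet); `parseFake` is a hypothetical helper. Whether this would actually silence the reflector errors is not established in this thread.

```go
package main

import (
	"flag"
	"fmt"
)

// parseFake defines a stand-in "logtostderr" flag on a private FlagSet and
// parses the given synthetic arguments rather than the real command line.
// It reports whether the set counts as parsed and the resulting flag value.
func parseFake(args []string) (parsed bool, logToStderr bool, err error) {
	fs := flag.NewFlagSet("fake", flag.ContinueOnError)
	v := fs.Bool("logtostderr", true, "stand-in for a logging library's flag")
	err = fs.Parse(args)
	return fs.Parsed(), *v, err
}

func main() {
	parsed, toStderr, err := parseFake([]string{"-logtostderr=false"})
	fmt.Println(parsed, toStderr, err)
}
```

With the global flag set, the equivalent call would be `flag.CommandLine.Parse(args)`, after which `flag.Parsed()` reports true.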
I went for (3) in the end
@zeeZ It should be fixed now. I would appreciate it if you could give it a try (quay.io/weaveworks/flux:master-2d4cc4d)
After removing the CRD role I still get a constant stream of
ts=2019-03-18T21:05:54.062786645Z caller=main.go:175 type="internal kubernetes error" err="github.com/weaveworks/flux/cluster/kubernetes/cached_disco.go:100: Failed to list *v1beta1.CustomResourceDefinition: customresourcedefinitions.apiextensions.k8s.io is forbidden: User \"system:serviceaccount:flux-system:flux\" cannot list resource \"customresourcedefinitions\" in API group \"apiextensions.k8s.io\" at the cluster scope"
I did some digging around the IsForbidden || IsNotFound workaround you added, but it seems ReasonForError returns StatusReasonUnknown. I'm not familiar with K8S source, but I believe what we're dealing with here is not a metav1 error but a more generic one: https://github.com/kubernetes/client-go/blob/7d04d0e2a0a1a4d4a1cd6baa432a2301492e4e65/tools/cache/reflector.go#L251
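A small sketch of why a type-based check can come back as "unknown" here, assuming the reflector re-creates the error by formatting it into a new message (which the "Failed to list ...:" prefix in the log suggests). Everything in it is a stand-in: `statusError`, `reasonFor`, and `looksForbidden` are hypothetical and use the standard library's errors.As, not the apimachinery helpers.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// statusError is a stand-in for a typed API error that carries a reason.
type statusError struct {
	Reason string
	Msg    string
}

func (e *statusError) Error() string { return e.Msg }

// sampleForbidden builds a typed error with the message shape seen in the logs.
func sampleForbidden() error {
	return &statusError{
		Reason: "Forbidden",
		Msg:    `customresourcedefinitions.apiextensions.k8s.io is forbidden: User "system:serviceaccount:flux:flux" cannot list resource "customresourcedefinitions"`,
	}
}

// flatten formats err into a new message with %v, which discards its type.
func flatten(err error) error {
	return fmt.Errorf("Failed to list %v: %v", "*v1beta1.CustomResourceDefinition", err)
}

// wrap uses %w instead, which keeps the original error reachable.
func wrap(err error) error {
	return fmt.Errorf("Failed to list %v: %w", "*v1beta1.CustomResourceDefinition", err)
}

// reasonFor returns the reason of a typed error, or "Unknown" otherwise.
func reasonFor(err error) string {
	var se *statusError
	if errors.As(err, &se) {
		return se.Reason
	}
	return "Unknown"
}

// looksForbidden falls back to matching the message text when the type is gone.
func looksForbidden(err error) bool {
	return reasonFor(err) == "Forbidden" || strings.Contains(err.Error(), " is forbidden: ")
}

func main() {
	orig := sampleForbidden()
	fmt.Println(reasonFor(orig))
	fmt.Println(reasonFor(flatten(orig)))
	fmt.Println(looksForbidden(flatten(orig)))
	fmt.Println(reasonFor(wrap(orig)))
}
```

Matching on message text is brittle, so it is shown only as the fallback that remains once the type information has been lost.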
While it stings a bit, I can live with allowing CRD listing. My initial issue was with list access to everything in the cluster, which has been resolved thanks to you.
Perhaps documentation could be added with the minimum privileges Flux needs in order to operate properly, though I suspect that would be complicated with helm and GC. Maybe a more restricted minimal example next to deploy?
On a positive note, at least it is not silently firing a request every second that may add up for each instance you run ;)
Crap, sorry about that. I need to do some further thinking.
I run flux with explicit permissions, as limited as possible and with only a single namespaced Role and --k8s-namespace-whitelist set. After upgrading to 1.11.0 it no longer syncs unless it is able to list virtually everything in the cluster.
This is the ClusterRole I created from sync-loop errors before it was able to sync again. You can tell where I gave up:
The FAQ answers "Can I restrict the namespaces that Flux can see" with "yes, experimental". Sadly, this is no longer the case.
Also name dropping https://github.com/weaveworks/flux/issues/1217 and https://github.com/weaveworks/flux/issues/1471