cscetbon / casskop

This Kubernetes operator automates Cassandra operations such as deploying rack aware clusters, scaling up and down, configuring C* and its JVM, upgrading JVM and C*, backup/restores and many more...
https://cscetbon.github.io/casskop/
Apache License 2.0

In GKE cassandra operator is missing permissions to list Nodes #30

Closed: szymonrychu closed this issue 2 years ago

szymonrychu commented 2 years ago

Bug Report

What did you do? I tried to install the Cassandra operator in GKE.

What did you expect to see? It should just work across all clouds.

What did you see instead? Under which circumstances? It failed with the following log:

2022-03-31T04:50:08.967Z    ERROR   leader  Failed to get Node  {"Node.Name": "<redacted>-system-f2e9a7e0-56u7", "error": "nodes \"<redacted>-system-f2e9a7e0-56u7\" is forbidden: User \"system:serviceaccount:<redacted>:cassandra-operator\" cannot get resource \"nodes\" in API group \"\" at the cluster scope"}
github.com/operator-framework/operator-lib/leader.isNotReadyNode
    /casskop/vendor/github.com/operator-framework/operator-lib/leader/leader.go:277
github.com/operator-framework/operator-lib/leader.Become
    /casskop/vendor/github.com/operator-framework/operator-lib/leader/leader.go:182
main.main
    /casskop/main.go:145
runtime.main
    /usr/local/go/src/runtime/proc.go:255

The issue seems to be isolated to GKE; in Azure it runs perfectly as is. Either way, IMHO, if the operator needs to read Nodes and there are no related permissions for it, it's definitely a bug here, not in the cloud (and plain luck that it works everywhere else).

Environment

2.1.0-release

Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.9-gke.1001", GitCommit:"35dafe6010950b2aa1b3733e912f5828b58e8a02", GitTreeState:"clean", BuildDate:"2022-02-18T05:02:26Z", GoVersion:"go1.16.12b7", Compiler:"gc", Platform:"linux/amd64"}

2.1.0-release

Possible Solution

Add a ClusterRole and a ClusterRoleBinding (I will create a PR for it shortly).

cscetbon commented 2 years ago

@szymonrychu did you check out https://cscetbon.github.io/casskop/docs/setup/platform_setup/gke ?

szymonrychu commented 2 years ago

No, I didn't, to be honest. At the same time, the logic in question doesn't seem to be exclusive to GKE; see: https://github.com/cscetbon/casskop/blob/master/main.go#L145, https://github.com/operator-framework/operator-lib/blob/main/leader/leader.go#L182 and https://github.com/operator-framework/operator-lib/blob/main/leader/leader.go#L277. Admittedly, the logic behind it is convoluted, which is why it was working most of the time.

cscetbon commented 2 years ago

Nope, but those permissions might be inherited or available by default everywhere except GKE, where they're locked down for some reason. We use k3d to run our e2e tests, and we don't need to do anything for it to run as expected. @erdrix, maybe you can add your 2 cents here?

szymonrychu commented 2 years ago

Have you ever hit this corner case in the past? I mean, did you ever see in the logs that the operator lists cluster nodes successfully?

I mean, it's quite a bizarre one (hidden behind 2 switches), so it's quite an accomplishment to hit it ^^ At the same time, the logic is there, and without these permissions the chart feels incomplete to me :)