StarRocks / starrocks-kubernetes-operator

Kubernetes Operator for StarRocks
Apache License 2.0

StarRocks Operator on Red Hat OpenShift: SCC issues when creating a new service account and granting it permissions #108

Closed: alberttwong closed this issue 1 year ago

alberttwong commented 1 year ago

How we got here: https://github.com/StarRocks/starrocks/discussions/22767

I executed:

oc create sa starrocks-sa
oc adm policy add-scc-to-user privileged -z starrocks-sa
oc set sa deploy starrocks-controller starrocks-sa
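For reference, a rough declarative equivalent of the first two commands is sketched below. It assumes OpenShift 4.x, where granting an SCC amounts to an RBAC binding to a `system:openshift:scc:<name>` ClusterRole (the name `oc adm policy` reports later in this thread). The RoleBinding name is hypothetical, and `oc adm policy` may create its own binding object differently. The third command, `oc set sa`, only changes `spec.template.spec.serviceAccountName` in the starrocks-controller Deployment.

```yaml
# Sketch: declarative form of `oc create sa starrocks-sa`
apiVersion: v1
kind: ServiceAccount
metadata:
  name: starrocks-sa
  namespace: starrocks
---
# Sketch: lets starrocks-sa use the privileged SCC inside the starrocks namespace.
# Assumes the system:openshift:scc:privileged ClusterRole exists (OpenShift 4.x).
# The binding name below is hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: starrocks-sa-scc-privileged
  namespace: starrocks
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:scc:privileged
subjects:
  - kind: ServiceAccount
    name: starrocks-sa
    namespace: starrocks
```

Only pods that actually run as starrocks-sa benefit from this binding; pods the operator creates use whatever service account their own pod template names.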

Now I get this error in the starrocks-controller pod:

E0428 23:37:25.428177       1 leaderelection.go:330] error retrieving resource lock starrocks/c6c79638.starrocks.com: leases.coordination.k8s.io "c6c79638.starrocks.com" is forbidden: User "system:serviceaccount:starrocks:starrocks-sa" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "starrocks"
alberttwong commented 1 year ago

This may be the answer. https://access.redhat.com/solutions/6973378

alberttwong commented 1 year ago

oc edit roles/starrocks-leader-election-role

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"Role","metadata":{"annotations":{},"name":"starrocks-leader-election-role","namespace":"starrocks"},"rules":[{"apiGroups":[""],"resources":["configmaps"],"verbs":["get","list","watch","create","update","patch","delete"]},{"apiGroups":["coordination.k8s.io"],"resources":["leases"],"verbs":["get","list","watch","create","update","patch","delete"]},{"apiGroups":[""],"resources":["events"],"verbs":["create","patch"]}]}
  creationTimestamp: "2023-04-28T22:21:11Z"
  name: starrocks-leader-election-role
  namespace: starrocks
  resourceVersion: "41334"
  uid: 0cbad80c-20b7-4ca0-8351-cea2cb632c81
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch

It has the correct verbs.

alberttwong commented 1 year ago

Since I created a new SA, starrocks-sa, I applied the following YAML to bind the leader-election Role to it:

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: starrocks-leader-election-role
  namespace: starrocks
subjects:
  - kind: ServiceAccount
    name: starrocks-sa
    namespace: starrocks
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: starrocks-leader-election-role
alberttwong commented 1 year ago

Now I'm getting a different error, related to cluster roles:

I0428 23:42:03.698913       1 request.go:601] Waited for 1.04014614s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1?timeout=32s
1.6827253250015438e+09  INFO    controller-runtime.metrics  Metrics server is starting to listen    {"addr": ":8080"}
1.682725325002093e+09   INFO    setup   starting manager
I0428 23:42:05.002386       1 leaderelection.go:248] attempting to acquire leader lease starrocks/c6c79638.starrocks.com...
1.6827253250023947e+09  INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
1.682725325002437e+09   INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
I0428 23:42:22.353634       1 leaderelection.go:258] successfully acquired lease starrocks/c6c79638.starrocks.com
1.6827253423537867e+09  INFO    Starting EventSource    {"controller": "starrockscluster", "controllerGroup": "starrocks.com", "controllerKind": "StarRocksCluster", "source": "kind source: *v1.StarRocksCluster"}
1.682725342353757e+09   DEBUG   events  starrocks-controller-cf78b5cb-lz6hp_708fb4fa-55ee-4c23-852a-bff29f983aed became leader  {"type": "Normal", "object": {"kind":"Lease","namespace":"starrocks","name":"c6c79638.starrocks.com","uid":"6ccb58cd-43c3-4e8e-a7ae-9a273c98b91c","apiVersion":"coordination.k8s.io/v1","resourceVersion":"66212"}, "reason": "LeaderElection"}
1.682725342353829e+09   INFO    Starting EventSource    {"controller": "starrockscluster", "controllerGroup": "starrocks.com", "controllerKind": "StarRocksCluster", "source": "kind source: *v1.StatefulSet"}
1.6827253423538373e+09  INFO    Starting EventSource    {"controller": "starrockscluster", "controllerGroup": "starrocks.com", "controllerKind": "StarRocksCluster", "source": "kind source: *v1.Service"}
1.6827253423538406e+09  INFO    Starting Controller {"controller": "starrockscluster", "controllerGroup": "starrocks.com", "controllerKind": "StarRocksCluster"}
W0428 23:42:22.355509       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.0/tools/cache/reflector.go:169: failed to list *v1.Service: services is forbidden: User "system:serviceaccount:starrocks:starrocks-sa" cannot list resource "services" in API group "" at the cluster scope
W0428 23:42:22.355513       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.0/tools/cache/reflector.go:169: failed to list *v1.StatefulSet: statefulsets.apps is forbidden: User "system:serviceaccount:starrocks:starrocks-sa" cannot list resource "statefulsets" in API group "apps" at the cluster scope
E0428 23:42:22.355554       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.0/tools/cache/reflector.go:169: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User "system:serviceaccount:starrocks:starrocks-sa" cannot list resource "services" in API group "" at the cluster scope
E0428 23:42:22.355556       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.0/tools/cache/reflector.go:169: Failed to watch *v1.StatefulSet: failed to list *v1.StatefulSet: statefulsets.apps is forbidden: User "system:serviceaccount:starrocks:starrocks-sa" cannot list resource "statefulsets" in API group "apps" at the cluster scope
W0428 23:42:22.355944       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.0/tools/cache/reflector.go:169: failed to list *v1.StarRocksCluster: starrocksclusters.starrocks.com is forbidden: User "system:serviceaccount:starrocks:starrocks-sa" cannot list resource "starrocksclusters" in API group "starrocks.com" at the cluster scope
E0428 23:42:22.355965       1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.0/tools/cache/reflector.go:169: Failed to watch *v1.StarRocksCluster: failed to list *v1.StarRocksCluster: starrocksclusters.starrocks.com is forbidden: User "system:serviceaccount:starrocks:starrocks-sa" cannot list resource "starrocksclusters" in API group "starrocks.com" at the cluster scope
W0428 23:42:23.273141       1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.0/tools/cache/reflector.go:169: failed to list *v1.Service: services is forbidden: User "system:serviceaccount:starrocks:starrocks-sa" cannot list resource "services" in API group "" at the cluster scope
alberttwong commented 1 year ago

Since I created a new SA, starrocks-sa, I applied the following YAML to grant it the cluster role:

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: starrocks-manager
subjects:
  - kind: ServiceAccount
    name: starrocks-sa
    namespace: starrocks
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: starrocks-manager
alberttwong commented 1 year ago

Now I'm getting the following error:

I0428 23:46:56.066560       1 request.go:601] Waited for 1.028188158s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/k8s.cni.cncf.io/v1?timeout=32s
1.6827256173696938e+09  INFO    controller-runtime.metrics  Metrics server is starting to listen    {"addr": ":8080"}
1.6827256173699167e+09  INFO    setup   starting manager
I0428 23:46:57.370185       1 leaderelection.go:248] attempting to acquire leader lease starrocks/c6c79638.starrocks.com...
1.6827256173701913e+09  INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
1.6827256173702197e+09  INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
I0428 23:47:12.663146       1 leaderelection.go:258] successfully acquired lease starrocks/c6c79638.starrocks.com
1.6827256326632936e+09  INFO    Starting EventSource    {"controller": "starrockscluster", "controllerGroup": "starrocks.com", "controllerKind": "StarRocksCluster", "source": "kind source: *v1.StarRocksCluster"}
1.6827256326633294e+09  INFO    Starting EventSource    {"controller": "starrockscluster", "controllerGroup": "starrocks.com", "controllerKind": "StarRocksCluster", "source": "kind source: *v1.StatefulSet"}
1.6827256326633346e+09  INFO    Starting EventSource    {"controller": "starrockscluster", "controllerGroup": "starrocks.com", "controllerKind": "StarRocksCluster", "source": "kind source: *v1.Service"}
1.6827256326633377e+09  INFO    Starting Controller {"controller": "starrockscluster", "controllerGroup": "starrocks.com", "controllerKind": "StarRocksCluster"}
1.6827256326632776e+09  DEBUG   events  starrocks-controller-cf78b5cb-4zxmp_12b9357e-b47c-4624-a44e-e81b2d2ae64d became leader  {"type": "Normal", "object": {"kind":"Lease","namespace":"starrocks","name":"c6c79638.starrocks.com","uid":"6ccb58cd-43c3-4e8e-a7ae-9a273c98b91c","apiVersion":"coordination.k8s.io/v1","resourceVersion":"67848"}, "reason": "LeaderElection"}
1.6827256327643864e+09  INFO    Starting workers    {"controller": "starrockscluster", "controllerGroup": "starrocks.com", "controllerKind": "StarRocksCluster", "worker count": 1}
I0428 23:47:12.764483       1 starrockscluster_controller.go:85] StarRocksClusterReconciler reconcile the update crd name starrockscluster-sample namespace starrocks
I0428 23:47:12.765009       1 statefulset.go:135] the statefulset name starrockscluster-sample-fe new hash value 3203758280 old have value 3203758280
I0428 23:47:12.765041       1 k8sutils.go:77] ApplyStatefulSEt Sync exist statefulset name=starrockscluster-sample-fe, namespace=starrocks, equals to new statefuslet.
I0428 23:47:12.765066       1 k8sutils.go:52] CreateOrUpdateService service Name, Ports, Selector, ServiceType, Labels have not change namespace starrocks name starrockscluster-sample-fe-search
I0428 23:47:12.765085       1 k8sutils.go:52] CreateOrUpdateService service Name, Ports, Selector, ServiceType, Labels have not change namespace starrocks name starrockscluster-sample-fe-service
I0428 23:47:12.966937       1 be_controller.go:156] BeController UpdateStatus the statefulset name=starrockscluster-sample-be is not found.
shileifu commented 1 year ago

So what is the problem with the operator CRD YAMLs now? I don't get your point. You can communicate with Kevin.cai.

alberttwong commented 1 year ago

The StatefulSet has permission issues:

starrocks                              23s         Warning   FailedCreate                        statefulset/starrockscluster-sample-fe                                  create Pod starrockscluster-sample-fe-0 in StatefulSet starrockscluster-sample-fe failed error: pods "starrockscluster-sample-fe-0" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted: .spec.securityContext.fsGroup: Invalid value: []int64{1000}: 1000 is not an allowed group, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
alberttwong commented 1 year ago

Trying to change the service account:

atwong@Alberts-MBP ~ % oc set sa deploy starrockscluster-sample-fe starrocks-sa
Error from server (NotFound): deployments.apps "starrockscluster-sample-fe" not found
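The NotFound error is expected here: the operator creates the FE as a StatefulSet, not a Deployment. `oc set serviceaccount` also accepts StatefulSets (for example `oc set sa statefulset starrockscluster-sample-fe starrocks-sa`), but because the operator owns this StatefulSet, a manual edit may be reverted on a later reconcile. The change itself is one field in the pod template. A minimal patch fragment, assuming the namespace is starrocks and that starrocks-sa is the account you want the FE pods to run as, would be:

```yaml
# Sketch: strategic-merge patch for statefulset/starrockscluster-sample-fe (namespace starrocks).
# Only spec.template.spec.serviceAccountName is set; all other fields are left as they are.
# Not a supported operator setting: the operator may overwrite it when it reconciles.
spec:
  template:
    spec:
      serviceAccountName: starrocks-sa
```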
alberttwong commented 1 year ago

I don't see anything odd in the StatefulSet YAML:

kind: StatefulSet
apiVersion: apps/v1
metadata:
  annotations:
    app.starrocks.components/hash: '3203758280'
  resourceVersion: '100795'
  name: starrockscluster-sample-fe
  uid: 8994ad02-97ed-45df-ac74-a2d541dfcdd8
  creationTimestamp: '2023-05-05T23:50:02Z'
  generation: 1
  namespace: starrocks
  ownerReferences:
    - apiVersion: starrocks.com/v1
      kind: StarRocksCluster
      name: starrockscluster-sample
      uid: 7b92fb2d-cd76-4336-ad1e-9c6afc4d6ba1
      controller: true
      blockOwnerDeletion: true
  finalizers:
    - starrocks.com.statefulset/protection
  labels:
    app.kubernetes.io/component: fe
    app.starrocks.ownerreference/name: starrockscluster-sample
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/component: fe
      app.starrocks.ownerreference/name: starrockscluster-sample-fe
  template:
    metadata:
      name: starrockscluster-sample-fe
      namespace: starrocks
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: fe
        app.starrocks.ownerreference/name: starrockscluster-sample-fe
    spec:
      volumes:
        - name: fe-meta
          emptyDir: {}
        - name: fe-log
          emptyDir: {}
      containers:
        - resources:
            requests:
              cpu: '4'
              memory: 16Gi
          readinessProbe:
            tcpSocket:
              port: 9030
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          lifecycle:
            preStop:
              exec:
                command:
                  - /opt/starrocks/fe_prestop.sh
          name: fe
          command:
            - /opt/starrocks/fe_entrypoint.sh
          livenessProbe:
            tcpSocket:
              port: 9020
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: COMPONENT_NAME
              value: fe
            - name: FE_SERVICE_NAME
              value: starrockscluster-sample-fe-service.starrocks
            - name: POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.hostIP
            - name: HOST_TYPE
              value: FQDN
            - name: USER
              value: root
          ports:
            - name: http-port
              containerPort: 8030
              protocol: TCP
            - name: rpc-port
              containerPort: 9020
              protocol: TCP
            - name: query-port
              containerPort: 9030
              protocol: TCP
          imagePullPolicy: IfNotPresent
          startupProbe:
            tcpSocket:
              port: 8030
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 60
          volumeMounts:
            - name: fe-meta
              mountPath: /opt/starrocks/fe/meta
            - name: fe-log
              mountPath: /opt/starrocks/fe/log
          terminationMessagePolicy: File
          image: 'starrocks/fe-ubuntu:2.5.4'
          args:
            - $(FE_SERVICE_NAME)
      restartPolicy: Always
      terminationGracePeriodSeconds: 120
      dnsPolicy: ClusterFirst
      securityContext:
        fsGroup: 1000
        fsGroupChangePolicy: OnRootMismatch
      schedulerName: default-scheduler
  serviceName: starrockscluster-sample-fe-search
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0
  revisionHistoryLimit: 10
status:
  replicas: 0
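Two details in this StatefulSet may explain the FailedCreate event: the pod template sets `securityContext.fsGroup: 1000` and has no `serviceAccountName`. Without one, the pods presumably run as the namespace's `default` service account, so SCCs granted to starrocks-sa do not apply to them, and `default` only has the restricted SCC, which rejects fsGroup 1000. A minimal sketch under that assumption grants the `anyuid` SCC (whose fsGroup strategy is RunAsAny on a default install) to `default` in the starrocks namespace through plain RBAC. The Role and RoleBinding names are hypothetical.

```yaml
# Sketch: allow pods running as starrocks/default to use the anyuid SCC.
# Assumes the FE/BE pods run as `default` because their template sets no serviceAccountName.
# Role and RoleBinding names are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: use-anyuid-scc
  namespace: starrocks
rules:
  - apiGroups: ["security.openshift.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["anyuid"]
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: default-use-anyuid-scc
  namespace: starrocks
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: use-anyuid-scc
subjects:
  - kind: ServiceAccount
    name: default
    namespace: starrocks
```

The StatefulSet controller keeps retrying pod creation, so if this assumption holds, the FailedCreate events should stop once the binding exists, without recreating the StarRocksCluster. This has not been verified in this thread.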
alberttwong commented 1 year ago
atwong@Alberts-MBP ~ % oc adm policy add-scc-to-user privileged -z starrocks-sa
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:privileged added: "starrocks-sa"
kevincai commented 1 year ago

@alberttwong did you get past this?

@dengliu was able to bring up the StarRocks cluster on OpenShift with a few workarounds. Check the following issue for details: https://github.com/StarRocks/starrocks-kubernetes-operator/issues/120

alberttwong commented 1 year ago

@kevincai I haven't been able to get it up with any of the instructions. @dengliu actually isn't using the operator but is doing a Helm chart install.

kevincai commented 1 year ago

Please wait for our next release; we will refine the finalizer design so you won't run into so many problems.

alberttwong commented 1 year ago

I tried a bunch of different methods. The only one that worked was the Helm chart. I couldn't get the operator to work on OpenShift.