apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[Bug] k8s deploy, but doris-follower-cluster1-0 can not start #24292

Open hillbun opened 1 year ago

hillbun commented 1 year ago

Version

https://github.com/apache/doris/blob/master/docker/runtime/k8s/doris_follower.yml

I changed the image to this version:

image: apache/doris:2.0.0_alpha-fe-x86_64

What's Wrong?

kubectl logs doris-follower-cluster1-0 
2023-09-13T06:52:08+00:00 [Warn] [Entrypoint]: BUILD_TYPE k8s
=====
<?xml version="1.0" encoding="utf-8"?>

<!-- Auto Generated. DO NOT MODIFY IT! -->
<Configuration status="info" packages="org.apache.doris.common">
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT">      <PatternLayout charset="UTF-8">
        <Pattern>%d{yyyy-MM-dd HH:mm:ss,SSS} %p (%t|%tid) [%C{1}.%M():%L] %m%n</Pattern>
      </PatternLayout>
    </Console>    <RollingFile name="Sys" fileName="/opt/apache-doris/fe/log/fe.log" filePattern="/opt/apache-doris/fe/log/fe.log.%d{yyyyMMdd}-%i">
      <PatternLayout charset="UTF-8">
        <Pattern>%d{yyyy-MM-dd HH:mm:ss,SSS} %p (%t|%tid) [%C{1}.%M():%L] %m%n</Pattern>
      </PatternLayout>
      <Policies>
        <TimeBasedTriggeringPolicy/>
        <SizeBasedTriggeringPolicy size="1024MB"/>
      </Policies>
      <DefaultRolloverStrategy max="10" fileIndex="min">
        <Delete basePath="/opt/apache-doris/fe/log/" maxDepth="1">
          <IfFileName glob="fe.log.*" />
          <IfLastModified age="7d" />
        </Delete>
      </DefaultRolloverStrategy>
    </RollingFile>
    <RollingFile name="SysWF" fileName="/opt/apache-doris/fe/log/fe.warn.log" filePattern="/opt/apache-doris/fe/log/fe.warn.log.%d{yyyyMMdd}-%i">
      <PatternLayout charset="UTF-8">
        <Pattern>%d{yyyy-MM-dd HH:mm:ss,SSS} %p (%t|%tid) [%C{1}.%M():%L] %m%n</Pattern>
      </PatternLayout>
      <Policies>
        <TimeBasedTriggeringPolicy/>
        <SizeBasedTriggeringPolicy size="1024MB"/>
      </Policies>
      <DefaultRolloverStrategy max="10" fileIndex="min">
        <Delete basePath="/opt/apache-doris/fe/log/" maxDepth="1">
          <IfFileName glob="fe.warn.log.*" />
          <IfLastModified age="7d" />
        </Delete>
      </DefaultRolloverStrategy>
    </RollingFile>
    <RollingFile name="Auditfile" fileName="/opt/apache-doris/fe/log/fe.audit.log" filePattern="/opt/apache-doris/fe/log/fe.audit.log.%d{yyyyMMdd}-%i">
      <PatternLayout charset="UTF-8">
        <Pattern>%d{yyyy-MM-dd HH:mm:ss,SSS} [%c{1}] %m%n</Pattern>
      </PatternLayout>
      <Policies>
        <TimeBasedTriggeringPolicy/>
        <SizeBasedTriggeringPolicy size="1024MB"/>
      </Policies>
      <DefaultRolloverStrategy max="10" fileIndex="min">
        <Delete basePath="/opt/apache-doris/fe/log/" maxDepth="1">
          <IfFileName glob="fe.audit.log.*" />
          <IfLastModified age="30d" />
        </Delete>
      </DefaultRolloverStrategy>
    </RollingFile>
  </Appenders>
  <Loggers>
    <Root level="INFO">
      <AppenderRef ref="Sys"/>
      <AppenderRef ref="SysWF" level="WARN"/>
      <AppenderRef ref="Console"/>

    </Root>
    <Logger name="audit" level="ERROR" additivity="false">
      <AppenderRef ref="Auditfile"/>
    </Logger>
    <Logger name='audit.slow_query' level='INFO'/><Logger name='audit.query' level='INFO'/><Logger name='audit.load' level='INFO'/><Logger name='audit.stream_load' level='INFO'/>
  </Loggers>
</Configuration>
=====
==============================
2023-09-13 06:52:11,607 INFO (main|1) [DorisFE.start():124] Doris FE starting...
2023-09-13 06:52:11,620 INFO (main|1) [FrontendOptions.analyzePriorityCidrs():121] configured prior_cidrs value: 172.16.0.0/24
2023-09-13 06:52:11,629 INFO (main|1) [FrontendOptions.init():68] check ip address: /fe80:0:0:0:5254:0:269:5fdc%eth0
2023-09-13 06:52:11,630 INFO (main|1) [FrontendOptions.init():68] check ip address: /10.0.7.208
2023-09-13 06:52:11,630 INFO (main|1) [FrontendOptions.init():68] check ip address: /0:0:0:0:0:0:0:1%lo
2023-09-13 06:52:11,630 INFO (main|1) [FrontendOptions.init():68] check ip address: /127.0.0.1
2023-09-13 06:52:11,650 INFO (main|1) [FrontendOptions.init():88] local address: localhost/127.0.0.1.
2023-09-13 06:52:11,871 INFO (main|1) [ConsistencyChecker.initWorkTime():106] consistency checker will work from 15:00 to 15:00
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-doris/fe/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apache-doris/fe/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apache-doris/fe/lib/hive-catalog-shade-1.0.3-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
2023-09-13 06:52:13,084 INFO (main|1) [PrivTable.addEntry():73] add priv entry: Node_priv Admin_priv 
2023-09-13 06:52:13,084 INFO (main|1) [PrivTable.addEntry():73] add priv entry: Admin_priv 
2023-09-13 06:52:13,145 INFO (main|1) [PrivTable.addEntry():73] add priv entry: database privilege.ctl: internal, db: default_cluster:information_schema, priv: Select_priv 
2023-09-13 06:52:13,145 INFO (main|1) [Auth.createUserInternal():469] finished to create user: 'root'@'%', is replay: true
2023-09-13 06:52:13,146 INFO (main|1) [PrivTable.addEntry():73] add priv entry: database privilege.ctl: internal, db: default_cluster:information_schema, priv: Select_priv 
2023-09-13 06:52:13,146 INFO (main|1) [Auth.createUserInternal():469] finished to create user: 'admin'@'%', is replay: true
2023-09-13 06:52:13,293 INFO (main|1) [DeployManager.initEnvVariables():169] get deploy env: FE_SERVICE, FE_OBSERVER_SERVICE, BE_SERVICE, BROKER_SERVICE, CN_SERVICE
2023-09-13 06:52:13,293 INFO (main|1) [DeployManager.initEnvVariables():174] Electable service group is found
2023-09-13 06:52:13,293 INFO (main|1) [DeployManager.initEnvVariables():188] Backend service group is found
2023-09-13 06:52:13,293 INFO (main|1) [DeployManager.initEnvVariables():202] Cn service group is found
2023-09-13 06:52:13,293 INFO (main|1) [DeployManager.initEnvVariables():207] get electableFeServiceGroup: doris-follower-cluster1, observerFeServiceGroup: , backendServiceGroup: doris-be-cluster1 brokerServiceGroup: , cnServiceGroup: doris-cn-cluster1
2023-09-13 06:52:13,294 INFO (main|1) [K8sDeployManager.initEnvVariables():112] use namespace: default
2023-09-13 06:52:13,294 INFO (main|1) [K8sDeployManager.initEnvVariables():120] use domainLTD: svc.cluster.local
[INFO] Env name of: {} is: {}ELECTABLEFE_STATEFULSET
2023-09-13 06:52:13,296 INFO (main|1) [K8sDeployManager.initEnvVariables():134] use statefulSetName: ELECTABLE, doris-follower-cluster1
[INFO] Env name of: {} is: {}BACKENDBE_STATEFULSET
2023-09-13 06:52:13,296 INFO (main|1) [K8sDeployManager.initEnvVariables():134] use statefulSetName: BACKEND, doris-be-cluster1
[INFO] Env name of: {} is: {}BACKEND_CNCN_STATEFULSET
2023-09-13 06:52:13,296 INFO (main|1) [K8sDeployManager.initEnvVariables():134] use statefulSetName: BACKEND_CN, doris-cn-cluster1
2023-09-13 06:52:13,297 INFO (main|1) [DeployManager.getHelperNodes():287] get init num of fe from env: 3
log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2023-09-13 06:52:15,024 WARN (main|1) [K8sDeployManager.statefulSet():267] encounter exception when get statefulSet from namespace default, statefulSet: doris-follower-cluster1
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://192.168.0.1/apis/apps/v1/namespaces/default/statefulsets/doris-follower-cluster1. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. statefulsets.apps "doris-follower-cluster1" is forbidden: User "system:serviceaccount:default:default" cannot get resource "statefulsets" in API group "apps" in the namespace "default".
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682) ~[kubernetes-client-5.12.2.jar:?]
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661) ~[kubernetes-client-5.12.2.jar:?]
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:610) ~[kubernetes-client-5.12.2.jar:?]
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555) ~[kubernetes-client-5.12.2.jar:?]
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518) ~[kubernetes-client-5.12.2.jar:?]
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:487) ~[kubernetes-client-5.12.2.jar:?]
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:457) ~[kubernetes-client-5.12.2.jar:?]
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:698) ~[kubernetes-client-5.12.2.jar:?]
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:184) ~[kubernetes-client-5.12.2.jar:?]
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:151) ~[kubernetes-client-5.12.2.jar:?]
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:83) ~[kubernetes-client-5.12.2.jar:?]
        at org.apache.doris.deploy.impl.K8sDeployManager.statefulSet(K8sDeployManager.java:265) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.deploy.impl.K8sDeployManager.getGroupHostInfosByStatefulSet(K8sDeployManager.java:177) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.deploy.impl.K8sDeployManager.getGroupHostInfos(K8sDeployManager.java:167) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.deploy.DeployManager.getHelperNodes(DeployManager.java:294) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env.getHelperNodeFromDeployManager(Env.java:1257) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env.getHelperNodes(Env.java:1203) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env.initialize(Env.java:832) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.DorisFE.start(DorisFE.java:146) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.DorisFE.main(DorisFE.java:73) ~[doris-fe.jar:1.2-SNAPSHOT]
2023-09-13 06:52:15,031 WARN (main|1) [K8sDeployManager.getGroupHostInfosByStatefulSet():179] get null statefulSet in namespace default, statefulSetName: doris-follower-cluster1

What You Expected?

doris-follower-cluster1-0 should start without errors

How to Reproduce?

No response

Anything Else?

doris_follower.yml

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.kubernetes.io/loadbalance-id: lb-h8dq7jk4
    service.kubernetes.io/qcloud-loadbalancer-internal-subnetid: subnet-ezvuf44o
    service.kubernetes.io/qcloud-share-existed-lb: "true"
    service.kubernetes.io/tke-existed-lbid: lb-h8dq7jk4
  name: doris-follower-cluster1
  labels:
    app: doris-follower-cluster1
spec:
  type: LoadBalancer
  ports:
    - port: 8030
      name: http-port
    - port: 9020
      name: rpc-port
    - port: 9030
      name: query-port
    - port: 9010
      name: edit-log-port #This name must not be changed. Doris looks up the port information by this name
  #clusterIP: None
  selector:
    app: doris-follower-cluster1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: doris-follower-cluster1
  labels:
    app: doris-follower-cluster1
spec:
  selector:
    matchLabels:
      app: doris-follower-cluster1
  serviceName: doris-follower-cluster1
  replicas: 3
  template:
    metadata:
      name: doris-follower-cluster1
      labels:
        app: doris-follower-cluster1
    spec:
      containers:
        - name: doris-follower-cluster1
          #Change this to the image you actually use
          #image: apache-doris-fe:test
          image: apache/doris:2.0.0_alpha-fe-x86_64
          imagePullPolicy: IfNotPresent
          env:
            #Specify the startup type as k8s to bypass some restrictions of the official image initialization script
            - name: BUILD_TYPE
              value: "k8s"
            #Initialize three FE nodes
            - name: FE_INIT_NUMBER
              value: "3"
            #ServiceName of the backend_cn node (if there is no backend_cn node, do not configure this environment variable)
            - name: CN_SERVICE
              value: "doris-cn-cluster1"
            #StatefulSetName of the backend_cn node (if there is no backend_cn node, do not configure this environment variable)
            - name: CN_STATEFULSET
              value: "doris-cn-cluster1"
            #ServiceName of the backend node (if there is no backend node, do not configure this environment variable)
            - name: BE_SERVICE
              value: "doris-be-cluster1"
            #StatefulSetName of the backend node (if there is no backend node, do not configure this environment variable)
            - name: BE_STATEFULSET
              value: "doris-be-cluster1"
            #ServiceName of the follower node (if there is no follower node, do not configure this environment variable)
            - name: FE_SERVICE
              value: "doris-follower-cluster1"
            #StatefulSetName of the follower node (if there is no follower node, do not configure this environment variable)
            - name: FE_STATEFULSET
              value: "doris-follower-cluster1"
          ports:
            - containerPort: 8030
              name: http-port
            - containerPort: 9020
              name: rpc-port
            - containerPort: 9030
              name: query-port
            - containerPort: 9010
              name: edit-log-port
          volumeMounts:
            #Mount the configuration file in the way of configmap
            - name: conf
              mountPath: /opt/apache-doris/fe/conf
              #In order to call the api of k8s
            - name: kube
              mountPath: /root/.kube/config
              readOnly: true
      volumes:
        - name: conf
          configMap:
            name: follower-conf
        - name: kube
          hostPath:
            path: /root/.kube/config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: follower-conf
data:
  fe.conf: |
    priority_networks = 172.16.0.0/24
    #It can automatically maintain node information by getting the number of replicas of StatefulSet, similar to alter system add/drop back
    enable_deploy_manager = k8s
    #Automatically adjust the IP of the node according to the domain name (for example, after the pod is restarted, the domain name is still doris-be-cluster1-0-doris-be-cluster1.default.svc.cluster.local, but the IP may change from 172.16.0.9 to 172.16.0.10)
    enable_fqdn_mode = true
    LOG_DIR = ${DORIS_HOME}/log
    sys_log_level = INFO
    http_port = 8030
    rpc_port = 9020
    query_port = 9030
    edit_log_port = 9010
    #Doris needs to generate the log4j configuration file according to the fe.yml configuration information, which is written in the same directory as fe.yml by default, but the config we mount is readonly, so specify this configuration to write the log4j file to another location
    custom_config_dir = /opt/apache-doris/
    #When set to false, the backend is not dropped and remains in the DECOMMISSION state
    drop_backend_after_decommission = false

xiaolin84250 commented 1 year ago

This happens because the service account the pod runs as (system:serviceaccount:default:default in your log) has no permission to read StatefulSets through the Kubernetes API.

Run the following two commands, then redeploy the yml file:

kubectl create serviceaccount -n default default

kubectl create clusterrolebinding doris-cluster-admin-binding --clusterrole=cluster-admin --serviceaccount=default:default
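
For reference, the second command can also be kept as a manifest and applied with kubectl apply -f. The following is a sketch of the equivalent ClusterRoleBinding, with the binding name and subject copied from the command above. Keep in mind that cluster-admin grants full access to the whole cluster, which is far more than the single GET on StatefulSets that fails in the log.

```yaml
# Declarative equivalent of the "kubectl create clusterrolebinding" command above.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: doris-cluster-admin-binding
subjects:
  - kind: ServiceAccount
    name: default        # the account named in the Forbidden error
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin    # built-in role with full cluster access
```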

xiaolin84250 commented 1 year ago

If this solves the problem, please reply and close the issue.

Jorgevillada commented 11 months ago

If you don't want to use the cluster-admin role, you can use this Role and RoleBinding instead.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rc-apache-doris-fe
rules:
  - apiGroups:
      - ""
      - apps
    resources:
      - statefulsets
      - services
    verbs:
      - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rc-apache-doris-fe
subjects:
  - kind: ServiceAccount
    name: rc-apache-doris-fe
    namespace: apache-doris
roleRef:
  kind: Role
  name: rc-apache-doris-fe
  apiGroup: rbac.authorization.k8s.io
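
The RoleBinding above names a ServiceAccount called rc-apache-doris-fe, which the snippet itself does not create, and the FE pods only use it if the pod spec selects it. A minimal sketch of those two missing pieces, assuming the FE StatefulSet from the issue is deployed in the apache-doris namespace (both the namespace and the choice to reuse the StatefulSet name are assumptions):

```yaml
# ServiceAccount that the RoleBinding's subject refers to.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rc-apache-doris-fe
  namespace: apache-doris
---
# Fragment of the FE StatefulSet, not a complete manifest:
# only the field that selects the service account is shown.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: doris-follower-cluster1
  namespace: apache-doris
spec:
  template:
    spec:
      serviceAccountName: rc-apache-doris-fe
```

Without serviceAccountName the pods keep running as the namespace's default service account, so the Role would not apply to them.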

If your namespace is not default, set the APP_NAMESPACE env var on the FE container. You also have to put the namespace in the metadata of both the Role and the RoleBinding.
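
A sketch of what that could look like, assuming the apache-doris namespace used in the RoleBinding subject. The first document shows where the namespace goes in the Role (the RoleBinding metadata takes the same field); the second is only a fragment of the FE container spec, to be merged into the existing env list.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rc-apache-doris-fe
  namespace: apache-doris   # add the same field to the RoleBinding metadata
rules:
  - apiGroups: ["", "apps"]
    resources: ["statefulsets", "services"]
    verbs: ["*"]
---
# Fragment of the FE container spec, not a complete manifest:
# an extra entry next to FE_SERVICE, FE_STATEFULSET and the others.
env:
  - name: APP_NAMESPACE
    value: "apache-doris"
```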