byzer-org / byzer-build

Apache License 2.0

In the K8S image, plugin files in the libs directory cannot be loaded #45

Closed. ZhengshuaiPENG closed this issue 2 years ago.

ZhengshuaiPENG commented 2 years ago

In the byzer-engine-deployment YAML, I added the plugins to the driver and executor startup paths, but after deployment the related classes were not loaded.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: byzer-engine
  namespace: byzer
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: byzer-engine
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: byzer-engine
    spec:
      serviceAccountName: spark
      imagePullSecrets:
        - name: dockerhub      
      containers:
        - name: byzer-engine
          image: byzer/byzer-lang-k8s:3.1.1-2.2.2
          imagePullPolicy: Always
          args:
            - echo "/work/spark-3.1.1-bin-hadoop3.2/bin/spark-submit --master k8s://$(CLUSTER_URL) --deploy-mode client --driver-memory 1024m --driver-cores 1 --executor-memory 1024m --executor-cores 1 --driver-library-path "local:///home/deploy/mlsql/libs/ansj_seg-5.1.6.jar:local:///home/deploy/mlsql/libs/nlp-lang-1.7.8.jar:local:///home/deploy/mlsql/libs/mlsql-assert-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/mlsql-excel-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/mlsql-ext-ets-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/mlsql-shell-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/mlsql-mllib-3.0_2.12-0.1.0-SNAPSHOT.jar" --class streaming.core.StreamingApp --conf spark.kubernetes.container.image=byzer/byzer-lang-k8s:3.1.1-2.2.2 --conf spark.kubernetes.container.image.pullPolicy=Always --conf spark.kubernetes.namespace=$(EXCUTOR_NAMESPACE) --conf spark.kubernetes.executor.request.cores=1 --conf spark.kubernetes.executor.limit.cores=1 --conf spark.executor.instances=1 --conf spark.driver.host=$(POD_IP) --conf spark.sql.cbo.enabled=true --conf spark.sql.adaptive.enabled=true --conf spark.sql.cbo.joinReorder.enabled=true --conf spark.sql.cbo.planStats.enabled=true --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.shuffleTracking.enabled=true --conf spark.dynamicAllocation.maxExecutors=$(MAX_EXECUTOR) --conf spark.sql.cbo.starSchemaDetection=true --conf spark.driver.maxResultSize=2g --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.kryoserializer.buffer.max=200m --conf spark.mlsql.auth.access_token= --conf "\"spark.executor.extraJavaOptions=-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:+UseContainerSupport -Dio.netty.tryReflectionSetAccessible=true\"" --conf "\"spark.driver.extraJavaOptions=-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:+UseContainerSupport -Dio.netty.tryReflectionSetAccessible=true\"" --conf "\"spark.executor.extraLibraryPath=local:///home/deploy/mlsql/libs/mlsql-assert-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/mlsql-excel-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/mlsql-ext-ets-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/mlsql-shell-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/mlsql-mllib-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/ansj_seg-5.1.6.jar:local:///home/deploy/mlsql/libs/nlp-lang-1.7.8.jar\"" --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf \"spark.kubernetes.file.upload.path=file:///byzer-demo/byzer-upload\" local:///home/deploy/mlsql/libs/streamingpro-mlsql-spark_3.0_2.12-2.2.2.jar     -streaming.plugin.clzznames tech.mlsql.plugins.ds.MLSQLExcelApp,tech.mlsql.plugins.assert.app.MLSQLAssert,tech.mlsql.plugins.shell.app.MLSQLShell,tech.mlsql.plugins.ext.ets.app.MLSQLETApp,tech.mlsql.plugins.mllib.app.MLSQLMllib -streaming.name byzer-engine -streaming.rest true -streaming.thrift false -streaming.platform spark -streaming.enableHiveSupport true -streaming.spark.service true -streaming.job.cancel true -streaming.driver.port 9003\" -streaming.datalake.path\" \"/byzer/admin\"  " | bash
          command:
            - /bin/sh
            - -c
          env:
            - name: CLUSTER_URL
              valueFrom:
                secretKeyRef:
                  name: byzer-engine-secret
                  key: CLUSTER_URL
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: EXCUTOR_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace          
            - name: MAX_EXECUTOR
              value: "5"                  
          resources:
            limits:
              cpu: "2"
              memory: 2Gi
            requests:
              cpu: "1"
              memory: 1Gi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - name: spark-conf
              mountPath: /work/spark-3.1.1-bin-hadoop3.2/conf
      volumes:
        - name: spark-conf
          configMap:
            name: byzer-engine-configmap
            items:
              - key: core-site-xml
                path: core-site.xml
      restartPolicy: Always
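The `args` entry builds the whole `spark-submit` command as one hand-escaped string inside `echo "..." | bash`, and the nested `\"` escapes are easy to get wrong (note the stray `\"` after `-streaming.driver.port 9003`). A minimal sketch, assuming Python 3.8+ for `shlex.join`: keep the arguments as a list and let `shlex.join` produce a correctly quoted command string. The argument list is an abridged, illustrative subset of the entry above, and `build_submit_argv` is a hypothetical helper.

```python
import shlex
from typing import List

# Option values containing spaces: these are the ones that need careful quoting.
JAVA_OPTS = ("-XX:+UnlockExperimentalVMOptions -XX:+UseZGC "
             "-XX:+UseContainerSupport -Dio.netty.tryReflectionSetAccessible=true")


def build_submit_argv() -> List[str]:
    """Abridged, illustrative spark-submit argv (not the full list from the YAML)."""
    return [
        "/work/spark-3.1.1-bin-hadoop3.2/bin/spark-submit",
        "--master", "k8s://$(CLUSTER_URL)",  # $(VAR) is expanded by Kubernetes, not the shell
        "--deploy-mode", "client",
        "--class", "streaming.core.StreamingApp",
        "--conf", "spark.driver.extraJavaOptions=" + JAVA_OPTS,
        "--conf", "spark.executor.extraJavaOptions=" + JAVA_OPTS,
        "local:///home/deploy/mlsql/libs/streamingpro-mlsql-spark_3.0_2.12-2.2.2.jar",
        "-streaming.name", "byzer-engine",
        "-streaming.driver.port", "9003",
        "-streaming.datalake.path", "/byzer/admin",
    ]


if __name__ == "__main__":
    argv = build_submit_argv()
    cmd = shlex.join(argv)
    # shlex.split reverses shlex.join, so a POSIX shell sees exactly these arguments.
    assert shlex.split(cmd) == argv
    print(cmd)
```

The printed string could serve as the single `args` item, because the existing `command: [/bin/sh, -c]` already runs it through a shell. Kubernetes expands `$(CLUSTER_URL)`-style references in `args` textually before the container starts, so the single quotes that `shlex.join` adds around them do not prevent substitution.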
ZhengshuaiPENG commented 2 years ago

The driver path, excerpted from the above:

--driver-library-path "local:///home/deploy/mlsql/libs/ansj_seg-5.1.6.jar:local:///home/deploy/mlsql/libs/nlp-lang-1.7.8.jar:local:///home/deploy/mlsql/libs/mlsql-assert-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/mlsql-excel-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/mlsql-ext-ets-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/mlsql-shell-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/mlsql-mllib-3.0_2.12-0.1.0-SNAPSHOT.jar"

The executor path:

spark.executor.extraLibraryPath=local:///home/deploy/mlsql/libs/mlsql-assert-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/mlsql-excel-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/mlsql-ext-ets-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/mlsql-shell-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/mlsql-mllib-3.0_2.12-0.1.0-SNAPSHOT.jar:local:///home/deploy/mlsql/libs/ansj_seg-5.1.6.jar:local:///home/deploy/mlsql/libs/nlp-lang-1.7.8.jar
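Both excerpts put the jars on the *library* path. In Spark's configuration, `--driver-library-path` sets `spark.driver.extraLibraryPath`, and together with `spark.executor.extraLibraryPath` it controls where the JVM looks for native libraries, not the classpath, which would explain why the plugin classes are not found. Jars are usually added with `--jars` (comma-separated; `local://` URIs are allowed) or with `spark.driver.extraClassPath` / `spark.executor.extraClassPath` (plain filesystem paths). The thread does not say what the linked commit changed, so treat this as a likely cause rather than the confirmed fix. A minimal Python sketch that turns the colon-separated `local://` strings above into values for those options; the helper names are hypothetical.

```python
import re
from typing import List

LOCAL_SCHEME = "local://"


def split_local_paths(path_string: str) -> List[str]:
    """Split a colon-separated list of local:///... entries.

    A plain split(":") would also cut at the colon inside "local:", so match
    each "local://<path>" entry up to the next colon instead.
    """
    return re.findall(r"local://[^:]+", path_string)


def to_jars_option(entries: List[str]) -> str:
    # --jars (spark.jars) takes a comma-separated list; local:// URIs are kept as-is.
    return ",".join(entries)


def to_extra_classpath(entries: List[str]) -> str:
    # extraClassPath entries are plain JVM classpath paths:
    # drop the local:// scheme and join with ":".
    return ":".join(e[len(LOCAL_SCHEME):] for e in entries)


if __name__ == "__main__":
    # Shortened form of the executor path from this issue.
    executor_path = (
        "local:///home/deploy/mlsql/libs/mlsql-assert-3.0_2.12-0.1.0-SNAPSHOT.jar:"
        "local:///home/deploy/mlsql/libs/ansj_seg-5.1.6.jar"
    )
    entries = split_local_paths(executor_path)
    print("--jars", to_jars_option(entries))
    print("--conf spark.executor.extraClassPath=" + to_extra_classpath(entries))
```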

The parameter that registers the extension entry classes:

-streaming.plugin.clzznames tech.mlsql.plugins.ds.MLSQLExcelApp,tech.mlsql.plugins.assert.app.MLSQLAssert,tech.mlsql.plugins.shell.app.MLSQLShell,tech.mlsql.plugins.ext.ets.app.MLSQLETApp,tech.mlsql.plugins.mllib.app.MLSQLMllib
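When classes registered through `-streaming.plugin.clzznames` fail to load, it can help to first confirm that each class is actually packaged in one of the jars under `/home/deploy/mlsql/libs`. A minimal diagnostic sketch using Python's standard `zipfile` module (a jar is a zip archive, and class `a.b.C` is stored as `a/b/C.class`). `locate_plugin_classes` is a hypothetical helper, and the script assumes a Python interpreter is available wherever the jars are, for example after copying them out of the image.

```python
import glob
import zipfile
from typing import Dict, List, Optional


def class_entry(class_name: str) -> str:
    # A class a.b.C is stored inside a jar (a zip archive) as a/b/C.class.
    return class_name.replace(".", "/") + ".class"


def locate_plugin_classes(clzznames: str, jar_paths: List[str]) -> Dict[str, Optional[str]]:
    """Map each comma-separated class name to the first jar containing it, or None."""
    wanted = [c.strip() for c in clzznames.split(",") if c.strip()]
    found: Dict[str, Optional[str]] = {c: None for c in wanted}
    for jar in jar_paths:
        with zipfile.ZipFile(jar) as zf:
            entries = set(zf.namelist())
        for c in wanted:
            if found[c] is None and class_entry(c) in entries:
                found[c] = jar
    return found


if __name__ == "__main__":
    clzznames = (
        "tech.mlsql.plugins.ds.MLSQLExcelApp,tech.mlsql.plugins.assert.app.MLSQLAssert,"
        "tech.mlsql.plugins.shell.app.MLSQLShell,tech.mlsql.plugins.ext.ets.app.MLSQLETApp,"
        "tech.mlsql.plugins.mllib.app.MLSQLMllib"
    )
    jars = sorted(glob.glob("/home/deploy/mlsql/libs/*.jar"))
    for name, jar in locate_plugin_classes(clzznames, jars).items():
        print(f"{name}: {jar or 'NOT FOUND'}")
```

If every class is found but still fails to load at runtime, that points back at how the jars are put on the classpath rather than at the jar contents.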

ZhengshuaiPENG commented 2 years ago

estimate 1d

chncaesar commented 2 years ago

Changes in https://github.com/byzer-org/byzer-cicd/commit/452f593bc654dcc11a17677e1737ef33d4454c92