druid-io / druid-operator

Druid Kubernetes Operator

Cannot write segments to S3 storage #262

Closed: vietanhduong closed this issue 2 years ago

vietanhduong commented 2 years ago

Hi guys,

I'm following the cluster example. Everything appears to be working, but when I check my bucket, there is nothing in it. I'm using Kafka to stream data into Druid.

This is my config:

# This spec is adapted from the single-node tiny-cluster example, with deep
# storage switched from local disk to S3.
#
apiVersion: "druid.apache.org/v1alpha1"
kind: "Druid"
metadata:
  name: tiny-cluster
spec:
  image: apache/druid:0.19.0
  # Optionally specify the image for all nodes. It can also be specified per node.
  # imagePullSecrets:
  # - name: tutu
  startScript: /druid.sh
  podLabels:
    environment: stage
    release: alpha
  podAnnotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "druid"
    vault.hashicorp.com/agent-inject-status: "update"
    vault.hashicorp.com/agent-inject-secret-credentials: "data/druid/aws"
    vault.hashicorp.com/secret-volume-path-credentials: "/opt/druid/.aws"
    vault.hashicorp.com/agent-inject-template-credentials: |
      {{- with secret "data/druid/aws" -}}
      [default]
      aws_access_key_id = {{ .Data.data.access_key }}
      aws_secret_access_key = {{ .Data.data.secret_key }}
      {{- end }}
  readinessProbe:
    httpGet:
      path: /status/health
      port: 8088
  securityContext:
    fsGroup: 1000
    runAsUser: 1000
    runAsGroup: 1000
  services:
    - spec:
        type: ClusterIP
  commonConfigMountPath: "/opt/druid/conf/druid/cluster/_common"
  jvm.options: |-
    -server
    -XX:MaxDirectMemorySize=10240g
    -Duser.timezone=UTC
    -Dfile.encoding=UTF-8
    -Dlog4j.debug
    -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    -Djava.io.tmpdir=/druid/data
  log4j.config: |-
    <?xml version="1.0" encoding="UTF-8" ?>
    <Configuration status="WARN">
        <Appenders>
            <Console name="Console" target="SYSTEM_OUT">
                <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
            </Console>
        </Appenders>
        <Loggers>
            <!-- log4j2 honors only one Root logger -->
            <Root level="info">
                <AppenderRef ref="Console"/>
            </Root>
        </Loggers>
    </Configuration>
  common.runtime.properties: |
    #
    # Logging
    #

    # Log all runtime properties on startup. Disable to avoid logging properties on startup:
    druid.startup.logging.logProperties=true
    druid.request.logging.type=slf4j
    druid.request.logging.feed=requests

    # Zookeeper
    druid.zk.service.host=zookeeper.druid.svc.cluster.local
    druid.zk.paths.base=/druid
    druid.zk.service.compress=false

    # Metadata Store
    druid.metadata.storage.type=postgresql
    druid.metadata.storage.connector.connectURI=jdbc:postgresql://rds_host:5432/druid
    druid.metadata.storage.connector.user=username
    druid.metadata.storage.connector.password=password

    # Deep Storage
    druid.storage.type=s3
    druid.storage.bucket=bucket_name
    druid.storage.baseKey=druid/segments
    druid.storage.sse.type=s3

    #
    # Extensions
    #
    druid.extensions.loadList=["druid-kafka-indexing-service", "postgresql-metadata-storage", "druid-s3-extensions"]

    #
    # Service discovery
    #
    druid.selectors.indexing.serviceName=druid/overlord
    druid.selectors.coordinator.serviceName=druid/coordinator

    druid.indexer.logs.type=s3
    druid.indexer.logs.s3Bucket=bucket_name
    druid.indexer.logs.s3Prefix=druid/indexing-logs

    #
    # Security
    #
    druid.server.hiddenProperties=["druid.s3.accessKey","druid.s3.secretKey","druid.metadata.storage.connector.password"]

    #
    # SQL
    #
    druid.sql.enable=true

    druid.lookup.enableLookupSyncOnStartup=false
  volumeMounts:
    - mountPath: /druid/data
      name: data-volume
    - mountPath: /druid/deepstorage
      name: deepstorage-volume
  volumes:
    - name: data-volume
      emptyDir: {}
    - name: deepstorage-volume
      hostPath:
        path: /tmp/druid/deepstorage
        type: DirectoryOrCreate
  env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: AWS_REGION
      value: ap-southeast-1

  nodes:
    brokers:
      # Optionally specify for running broker as Deployment
      # kind: Deployment
      nodeType: "broker"
      # Optionally specify for broker nodes
      # imagePullSecrets:
      # - name: tutu
      druid.port: 8088
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/query/broker"
      replicas: 1
      runtime.properties: |
        druid.service=druid/broker

        # HTTP server threads
        druid.broker.http.numConnections=5
        druid.server.http.numThreads=10

        # Processing threads and buffers
        druid.processing.buffer.sizeBytes=1
        druid.processing.numMergeBuffers=1
        druid.processing.numThreads=1
        druid.sql.enable=true
      extra.jvm.options: |-
        -Xmx512M
        -Xms512M
      hpAutoscaler:
        maxReplicas: 10
        minReplicas: 1
        scaleTargetRef:
          apiVersion: apps/v1
          kind: StatefulSet
          name: druid-tiny-cluster-brokers
        metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 50

    coordinators:
      # Optionally specify for running coordinator as Deployment
      # kind: Deployment
      nodeType: "coordinator"
      druid.port: 8088
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/master/coordinator-overlord"
      replicas: 1
      runtime.properties: |
        druid.service=druid/coordinator

        # HTTP server threads
        druid.coordinator.startDelay=PT30S
        druid.coordinator.period=PT30S

        # Configure this coordinator to also run as Overlord
        druid.coordinator.asOverlord.enabled=true
        druid.coordinator.asOverlord.overlordService=druid/overlord
        druid.indexer.queue.startDelay=PT30S
        druid.indexer.runner.type=local
      extra.jvm.options: |-
        -Xmx512M
        -Xms512M

    historicals:
      nodeType: "historical"
      druid.port: 8088
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/data/historical"
      replicas: 1
      runtime.properties: |
        druid.service=druid/historical
        druid.server.http.numThreads=5
        druid.processing.buffer.sizeBytes=536870912
        druid.processing.numMergeBuffers=1
        druid.processing.numThreads=1
        druid.processing.tmpDir=var/druid/processing
        # Segment storage
        druid.segmentCache.locations=[{"path":"var/druid/data/segment-cache","maxSize":10737418240}]
        druid.server.maxSize=10737418240
      extra.jvm.options: |-
        -Xmx512M
        -Xms512M

    routers:
      nodeType: "router"
      druid.port: 8088
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/query/router"
      replicas: 1
      runtime.properties: |
        druid.service=druid/router

        # HTTP proxy
        druid.router.http.numConnections=10
        druid.router.http.readTimeout=PT5M
        druid.router.http.numMaxThreads=10
        druid.server.http.numThreads=10

        # Service discovery
        druid.router.defaultBrokerServiceName=druid/broker
        druid.router.coordinatorServiceName=druid/coordinator

        # Management proxy to coordinator / overlord: required for unified web console.
        druid.router.managementProxy.enabled=true
      extra.jvm.options: |-
        -Xmx512M
        -Xms512M

    middlemanagers:
      nodeType: middleManager
      druid.port: 8088
      nodeConfigMountPath: /opt/druid/conf/druid/cluster/data/middleManager
      replicas: 1
      runtime.properties: |
        druid.service=druid/middleManager
        # Number of tasks per middleManager
        druid.worker.capacity=4

        # Task launch parameters
        druid.indexer.runner.javaOpts=-server -Xms1g -Xmx1g -XX:MaxDirectMemorySize=1g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
        druid.indexer.task.baseTaskDir=var/druid/task

        # HTTP server threads
        druid.server.http.numThreads=60

        # Processing threads and buffers on Peons
        druid.indexer.fork.property.druid.processing.numMergeBuffers=2
        druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100MiB
        druid.indexer.fork.property.druid.processing.numThreads=1
      extra.jvm.options: |-
        -Xmx4G
        -Xms4G
      resources:
        requests:
          cpu: 2
          memory: 5Gi
        limits:
          cpu: 2
          memory: 5Gi
      hpAutoscaler:
        maxReplicas: 10
        minReplicas: 1
        scaleTargetRef:
          apiVersion: apps/v1
          kind: StatefulSet
          name: druid-tiny-cluster-middlemanagers
        metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 50

I'm using Vault to create AWS credentials at /opt/druid/.aws/credentials. This is the rendered result:

[default]
aws_access_key_id = AAAAAAAAAAAAAAAAAaa
aws_secret_access_key = BBBBBBBBBBBBBBBBBBBBBBb
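
For reference, druid-s3-extensions resolves credentials in this order: explicit runtime properties, JVM system properties, environment variables, the profile file (which is what the Vault sidecar renders here), and finally the instance/IAM role. If the rendered file ever causes trouble, here is a minimal sketch of the explicit alternative in common.runtime.properties (placeholder values):

druid.s3.accessKey=AKIA-PLACEHOLDER
druid.s3.secretKey=SECRET-PLACEHOLDER

These keys are already listed in druid.server.hiddenProperties above, so they would not be echoed by the status endpoints.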
vietanhduong commented 2 years ago

Actually, this is my fault. It's related to the rows-per-segment config :)
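
For anyone hitting the same symptom: with the Kafka indexing service, segments reach deep storage only when a task publishes them, i.e. when taskDuration elapses (default PT1H) or a row threshold such as maxRowsPerSegment (default 5,000,000) is hit. On a low-volume topic, nothing appears under druid/segments in the bucket until then, even though the data is already queryable from the realtime tasks. A minimal sketch of the relevant Kafka supervisor fields (topic name and values are illustrative; dataSchema and the rest of the spec are omitted):

{
  "type": "kafka",
  "ioConfig": {
    "type": "kafka",
    "topic": "my-topic",
    "taskDuration": "PT10M"
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsPerSegment": 100000
  }
}

Lowering these values makes tasks hand segments off to S3 sooner, at the cost of producing more, smaller segments.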