datainfrahq / druid-operator

Apache Druid On Kubernetes

Rolling upgrades not done in proper order #156

Open layoaster opened 7 months ago

layoaster commented 7 months ago

After upgrading to v1.2.3, with rollingDeploy=true, I see that rolling updates are no longer performed in the usual order:

  1. Historical
  2. MMs
  3. Broker
  4. Overlord
  5. Coordinator
  6. Router

Instead, the Overlord and the Historical instances are being updated at the same time, and before the MMs.

Could you restore the previous order or adopt the recommended one?

Note: the Druid docs state that Overlords can be updated before MMs when using "autoscaling-based replacement". However, this is only possible when deploying Druid on standalone EC2 instances.

EDIT: just to add the Druid version: 28.0.1
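
A quick way to observe the rollout order as it happens (assuming the cluster runs in the druid namespace; adjust to yours):

kubectl get pods -n druid -w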

AdheipSingh commented 6 months ago

In 1.2.3 we didn't change the order of the rolling deploy. Can you share the Druid CR, and maybe some logs or screenshots?

AdheipSingh commented 6 months ago

@itamar-marom @cyril-corbon have either of you faced this issue?

layoaster commented 6 months ago

> In 1.2.3 we didn't change the order of the rolling deploy. Can you share the Druid CR, and maybe some logs or screenshots?

@AdheipSingh I just tested introducing a change to the common.runtime.properties and recorded the session. I could not upload it to GitHub because it's an MKV file, but you can download it from here.

My Druid CR:

apiVersion: druid.apache.org/v1alpha1
kind: Druid
metadata:
  name: analytics
spec:
  image: sample-image
  imagePullPolicy: Always
  imagePullSecrets:
    - name: docker-registry-credentials
  startScript: /druid.sh
  podLabels:
    environment: staging
    release: stable
  securityContext:
    fsGroup: 1000
    runAsUser: 1000
    runAsGroup: 1000
  rollingDeploy: true
  defaultProbes: false
  commonConfigMountPath: "/opt/druid/conf/druid/cluster/_common"
  jvm.options: |-
    -server
    -Duser.timezone=UTC
    -Dfile.encoding=UTF-8
    -Djava.io.tmpdir=/opt/druid/var/tmp/
    -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    -Dorg.jboss.logging.provider=slf4j
    -Dnet.spy.log.LoggerImpl=net.spy.memcached.compat.log.SLF4JLogger
    -Dlog4j.shutdownCallbackRegistry=org.apache.druid.common.config.Log4jShutdown
    -Dlog4j.shutdownHookEnabled=true
    -XX:HeapDumpPath=/opt/druid/var/historical.hprof
    -XX:+ExitOnOutOfMemoryError
    --add-exports=java.base/jdk.internal.ref=ALL-UNNAMED
    --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED
    --add-opens=java.base/java.lang=ALL-UNNAMED
    --add-opens=java.base/java.io=ALL-UNNAMED
    --add-opens=java.base/java.nio=ALL-UNNAMED
    --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
    --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
  log4j.config: |-
    <?xml version="1.0" encoding="UTF-8" ?>
    <Configuration status="WARN">
        <Appenders>
            <Console name="Console" target="SYSTEM_OUT">
                <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
            </Console>
        </Appenders>
        <Loggers>
            <Root level="info">
              <AppenderRef ref="Console"/>
            </Root>
            <Logger name="org.apache.druid.jetty.RequestLog" additivity="false" level="debug">
              <AppenderRef ref="Console"/>
            </Logger>
        </Loggers>
    </Configuration>
  common.runtime.properties: |

    # Zookeeper
    # https://druid.apache.org/docs/latest/tutorials/cluster.html#configure-zookeeper-connection
    # https://druid.apache.org/docs/latest/configuration/index.html#zookeeper
    druid.zk.service.host=druid-zookeeper.druid.svc
    druid.zk.paths.base=/druid
    druid.zk.service.compress=false

    # Metadata Store
    # https://druid.apache.org/docs/latest/configuration/index.html#metadata-storage
    druid.metadata.storage.type=postgresql
    druid.metadata.storage.connector.connectURI=jdbc:postgresql://druid-postgresql-cluster.druid.svc:5432/druid
    druid.metadata.storage.connector.user=druid
    druid.metadata.storage.connector.password={ "type": "environment", "variable": "METADATA_STORAGE_PASSWORD" }
    druid.metadata.storage.connector.createTables=true

    # Deep Storage
    # https://druid.apache.org/docs/latest/configuration/index.html#deep-storage
    druid.storage.type=s3
    druid.storage.bucket=sample-bucket
    druid.storage.baseKey=segments
    druid.storage.disableAcl=true
    druid.s3.accessKey={ "type": "environment", "variable": "AWS_ACCESS_KEY_ID" }
    druid.s3.secretKey={ "type": "environment", "variable": "AWS_SECRET_ACCESS_KEY" }

    # Extensions
    druid.extensions.loadList=["druid-basic-security", "postgresql-metadata-storage", "druid-kafka-indexing-service", "druid-s3-extensions", "druid-datasketches", "druid-lookups-cached-global", "druid-protobuf-extensions", "druid-parquet-extensions", "druid-distinctcount", "prometheus-emitter"]

    # Lookups
    # https://druid.apache.org/docs/latest/querying/lookups.html#saving-configuration-across-restarts
    druid.lookup.enableLookupSyncOnStartup=false

    # Logging
    # https://druid.apache.org/docs/latest/configuration/index.html#startup-logging
    druid.startup.logging.logProperties=true
    # Task Logging
    # https://druid.apache.org/docs/latest/configuration/index.html#task-logging
    druid.indexer.logs.type=s3
    druid.indexer.logs.s3Bucket=sample-bucket
    druid.indexer.logs.s3Prefix=tasks
    druid.indexer.logs.disableAcl=true
    # Query request logging
    # https://druid.apache.org/docs/28.0.1/configuration/#request-logging
    druid.request.logging.type=filtered
    # https://druid.apache.org/docs/28.0.1/configuration/#filtered-request-logging
    druid.request.logging.delegate.type=slf4j
    druid.request.logging.queryTimeThresholdMs=60000
    druid.request.logging.sqlQueryTimeThresholdMs=600000

    # Monitoring metrics
    # https://druid.apache.org/docs/latest/configuration/index.html#enabling-metrics
    druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.java.util.metrics.JvmCpuMonitor", "org.apache.druid.java.util.metrics.JvmThreadsMonitor"]
    druid.monitoring.emissionPeriod=PT10S
    # Metrics emitters
    # https://druid.apache.org/docs/latest/configuration/index.html#metrics-emitters
    druid.emitter=prometheus
    # Prometheus Emitter
    # https://druid.apache.org/docs/0.23.0/development/extensions-contrib/prometheus.html#configuration
    druid.emitter.prometheus.strategy=exporter
    druid.emitter.prometheus.port=9001
    druid.emitter.prometheus.namespace=druid_native
    druid.emitter.prometheus.addServiceAsLabel=true
    druid.emitter.prometheus.dimensionMapPath=/opt/druid/conf/druid/cluster/_common/metricsMapping.json

    # Cache
    druid.cache.type=caffeine

    # Security (Basic)
    # https://druid.apache.org/docs/latest/development/extensions-core/druid-basic-security.html
    # https://druid.apache.org/docs/latest/operations/security-overview.html#enable-an-authenticator
    # https://druid.apache.org/docs/latest/design/auth.html
    # Authenticator
    druid.auth.authenticatorChain=["MyBasicMetadataAuthenticator"]
    # MyBasicMetadataAuthenticator
    druid.auth.authenticator.MyBasicMetadataAuthenticator.type=basic
    druid.auth.authenticator.MyBasicMetadataAuthenticator.initialAdminPassword={ "type": "environment", "variable": "DRUID_ADMIN_PASSWORD" }
    druid.auth.authenticator.MyBasicMetadataAuthenticator.initialInternalClientPassword={ "type": "environment", "variable": "DRUID_INTERNAL_PASSWORD" }
    druid.auth.authenticator.MyBasicMetadataAuthenticator.credentialIterations=10000
    druid.auth.authenticator.MyBasicMetadataAuthenticator.credentialsValidator.type=metadata
    druid.auth.authenticator.MyBasicMetadataAuthenticator.skipOnFailure=false
    druid.auth.authenticator.MyBasicMetadataAuthenticator.authorizerName=MyBasicMetadataAuthorizer
    # Escalator
    druid.escalator.type=basic
    druid.escalator.internalClientUsername=druid_system
    druid.escalator.internalClientPassword={ "type": "environment", "variable": "DRUID_INTERNAL_PASSWORD" }
    druid.escalator.authorizerName=MyBasicMetadataAuthorizer
    # Authorizer
    druid.auth.authorizers=["MyBasicMetadataAuthorizer"]
    # MyBasicMetadataAuthorizer
    druid.auth.authorizer.MyBasicMetadataAuthorizer.type=basic
    druid.auth.authorizer.MyBasicMetadataAuthorizer.initialAdminUser=admin
    druid.auth.authorizer.MyBasicMetadataAuthorizer.initialAdminRole=admin

    # Query
    druid.generic.useThreeValueLogicForNativeFilters=true
    druid.expressions.useStrictBooleans=true
    druid.generic.useDefaultValueForNull=false

  extraCommonConfig:
    - name: druid-metrics-mapping
      namespace: druid

  volumeMounts:
    - mountPath: /opt/druid/var
      name: var-volume
  volumes:
    - name: var-volume
      emptyDir: {}

  env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
  envFrom:
    - secretRef:
        name: druid-credentials
    - secretRef:
        name: druid-kafka-credentials
    - secretRef:
        name: druid-s3-credentials

  tolerations:
    - key: node-role.kubernetes.io/master
      operator: Exists
      effect: NoSchedule

  nodes:
    ############################## Druid Master #################################
    overlords:
      kind: StatefulSet
      nodeType: "overlord"
      podLabels:
        druid-process: overlord
      druid.port: 8090
      # Requires this mount path due to Druid's start script design
      # https://github.com/druid-io/druid-operator/issues/25
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/master/coordinator-overlord"
      replicas: 1
      podDisruptionBudgetSpec:
        maxUnavailable: 1
      extra.jvm.options: |-
        -Xms2G
        -Xmx2G
      runtime.properties: |
        # https://druid.apache.org/docs/latest/configuration/index.html#overlord
        druid.service=druid/overlord
        druid.plaintextPort=8090

        # https://druid.apache.org/docs/latest/configuration/index.html#overlord-operations
        druid.indexer.runner.type=httpRemote
        druid.indexer.storage.type=metadata
        druid.indexer.storage.recentlyFinishedThreshold=PT12H
        druid.indexer.queue.startDelay=PT30S

        # Monitoring
        druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.java.util.metrics.JvmCpuMonitor", "org.apache.druid.java.util.metrics.JvmThreadsMonitor", "org.apache.druid.server.metrics.TaskCountStatsMonitor", "org.apache.druid.server.metrics.TaskSlotCountStatsMonitor"]

        ## Tasks Metadata/Logs Management
        ## https://druid.apache.org/docs/latest/operations/clean-metadata-store/#indexer-task-logs
        # Cleanup of task logs and its associated metadata
        druid.indexer.logs.kill.enabled=true
        # 12 hours in milliseconds
        druid.indexer.logs.kill.durationToRetain=43200000
        # 5 min in milliseconds
        druid.indexer.logs.kill.initialDelay=300000
        # 6 hours in milliseconds
        druid.indexer.logs.kill.delay=21600000
      resources:
        requests:
          cpu: 1500m
          memory: 6Gi
        limits:
          cpu: 2
          memory: 10Gi
      livenessProbe:
        httpGet:
          path: /status/health
          port: 8090
        initialDelaySeconds: 15
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /status/health
          port: 8090
        initialDelaySeconds: 15
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - druid-master
      tolerations:
        - key: druid
          value: master
          operator: Equal
          effect: NoSchedule
      ports:
        - name: metrics
          containerPort: 9001
      services:
        - spec:
            type: ClusterIP
            ports:
              - name: service
                port: 8090
              - name: metrics
                port: 9001
                targetPort: metrics

    coordinators:
      kind: StatefulSet
      nodeType: "coordinator"
      podLabels:
        druid-process: coordinator
      druid.port: 8081
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/master/coordinator-overlord"
      replicas: 1
      podDisruptionBudgetSpec:
        maxUnavailable: 1
      extra.jvm.options: |-
        -Xms4G
        -Xmx4G
      runtime.properties: |
        # https://druid.apache.org/docs/latest/configuration/index.html#coordinator
        druid.service=druid/coordinator
        druid.plaintextPort=8081

        # https://druid.apache.org/docs/latest/configuration/index.html#coordinator-operation
        druid.coordinator.period=PT60S
        druid.coordinator.startDelay=PT300S
        druid.coordinator.period.indexingPeriod=PT600S

        # Coordinator's Compaction duty
        druid.coordinator.dutyGroups=["compaction"]
        druid.coordinator.compaction.duties=["compactSegments"]
        druid.coordinator.compaction.period=PT120S

        ## Metadata Management
        ## https://druid.apache.org/docs/latest/operations/clean-metadata-store/#configure-automated-metadata-cleanup
        druid.coordinator.period.metadataStoreManagementPeriod=PT1H

        # Cleanup unused segments older than 3 months
        druid.coordinator.kill.on=true
        druid.coordinator.kill.period=P1D
        druid.coordinator.kill.durationToRetain=P90D
        druid.coordinator.kill.maxSegments=1000

        # Cleanup audit records older than 1 month
        druid.coordinator.kill.audit.on=true
        druid.coordinator.kill.audit.period=P1D
        druid.coordinator.kill.audit.durationToRetain=P30D

        # Cleanup supervisors records older than 1 month
        druid.coordinator.kill.supervisor.on=true
        druid.coordinator.kill.supervisor.period=P1D
        druid.coordinator.kill.supervisor.durationToRetain=P30D

        # Cleanup rules records older than 1 day
        druid.coordinator.kill.rule.on=true
        druid.coordinator.kill.rule.period=P1D
        druid.coordinator.kill.rule.durationToRetain=P1D

        # Cleanup auto-compaction configuration records on a daily basis
        # only applies to datasources with no segments (used or unused)
        druid.coordinator.kill.compaction.on=true
        druid.coordinator.kill.compaction.period=P1D

        # Cleanup supervisors' datasource records older than 7 days
        # only applies when the supervisor has been terminated
        druid.coordinator.kill.datasource.on=true
        druid.coordinator.kill.datasource.period=P1D
        druid.coordinator.kill.datasource.durationToRetain=P7D
      resources:
        requests:
          cpu: 1500m
          memory: 4Gi
        limits:
          cpu: 2
          memory: 6Gi
      livenessProbe:
        httpGet:
          path: /status/health
          port: 8081
        initialDelaySeconds: 15
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /status/health
          port: 8081
        initialDelaySeconds: 15
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - druid-master
      tolerations:
        - key: druid
          value: master
          operator: Equal
          effect: NoSchedule
      ports:
        - name: metrics
          containerPort: 9001
      services:
        - spec:
            type: ClusterIP
            ports:
              - name: service
                port: 8081
              - name: metrics
                port: 9001
                targetPort: metrics

    ############################## Druid Data #################################
    historicals:
      kind: StatefulSet
      nodeType: "historical"
      podLabels:
        druid-process: historical
      druid.port: 8083
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/data/historical"
      replicas: 1
      podDisruptionBudgetSpec:
        maxUnavailable: 1
      extra.jvm.options: |-
        -Xms5G
        -Xmx5G
        -XX:MaxDirectMemorySize=6G
      runtime.properties: |
        # https://druid.apache.org/docs/latest/configuration/index.html#historical
        druid.service=druid/historical
        druid.plaintextPort=8083

        # HTTP server
        # Sum of `druid.broker.http.numConnections` across all the brokers in the cluster
        druid.server.http.numThreads=70

        # Processing threads and buffers
        druid.processing.buffer.sizeBytes=500M
        druid.processing.numMergeBuffers=4
        druid.processing.numThreads=7

        # Segment storage
        # https://druid.apache.org/docs/latest/configuration/index.html#historical-general-configuration
        druid.server.maxSize=500G
        # https://druid.apache.org/docs/latest/configuration/index.html#storing-segments
        druid.segmentCache.locations=[{"path":"/druid/data/segments","maxSize":"500G"}]

        # Segment loading
        druid.segmentCache.numLoadingThreads=2

        # Query cache
        druid.historical.cache.useCache=true
        druid.historical.cache.populateCache=true
        druid.cache.sizeInBytes=1G

        # Monitoring
        druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.java.util.metrics.JvmCpuMonitor", "org.apache.druid.java.util.metrics.JvmThreadsMonitor", "org.apache.druid.client.cache.CacheMonitor", "org.apache.druid.server.metrics.HistoricalMetricsMonitor", "org.apache.druid.server.metrics.QueryCountStatsMonitor"]

        ## Query performance
        # Default timeout (60 seconds) can be overridden via query context
        druid.server.http.defaultQueryTimeout=60000
        # GroupBy merging buffer per-query spilling to disk (1 GB)
        druid.query.groupBy.maxOnDiskStorage=1000000000
      resources:
        requests:
          cpu: 7
          memory: 11Gi
        limits:
          cpu: 8
          # 19GB for mapping segments to memory
          memory: 30Gi
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            storageClassName: gp3
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 500Gi
      volumeMounts:
        - name: data-volume
          mountPath: /druid/data
      livenessProbe:
        httpGet:
          path: /status/health
          port: 8083
        initialDelaySeconds: 20
        periodSeconds: 20
        timeoutSeconds: 5
        # 10 minutes
        failureThreshold: 30
      readinessProbe:
        httpGet:
          path: /druid/historical/v1/readiness
          port: 8083
        initialDelaySeconds: 20
        periodSeconds: 30
        timeoutSeconds: 5
        # 100 minutes
        failureThreshold: 200
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - druid-data
      tolerations:
        - key: druid
          value: data
          operator: Equal
          effect: NoSchedule
      ports:
        - name: metrics
          containerPort: 9001
      services:
        - spec:
            type: ClusterIP
            ports:
              - name: service
                port: 8083
              - name: metrics
                port: 9001
                targetPort: metrics

    middlemanagers:
      kind: StatefulSet
      nodeType: "middleManager"
      podLabels:
        druid-process: middleManager
      druid.port: 8091
      # Requires this mount path due to Druid's start script design
      # https://github.com/druid-io/druid-operator/issues/25
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/data/middleManager"
      replicas: 1
      podDisruptionBudgetSpec:
        maxUnavailable: 1
      extra.jvm.options: |-
        -Xms256M
        -Xmx256M
      runtime.properties: |-
        # https://druid.apache.org/docs/latest/configuration/index.html#middlemanager-and-peons
        druid.service=druid/middleManager
        druid.plaintextPort=8091

        # https://druid.apache.org/docs/latest/configuration/index.html#middlemanager-configuration
        druid.indexer.runner.javaOptsArray=["-server", "-Xms2200M", "-Xmx2200M", "-XX:MaxDirectMemorySize=1800M", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-Djava.io.tmpdir=var/data/tmp/peons", "-XX:+ExitOnOutOfMemoryError", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/druid/data/peon.%t.%p.hprof", "--add-exports=java.base/jdk.internal.ref=ALL-UNNAMED", "--add-exports=java.base/jdk.internal.misc=ALL-UNNAMED", "--add-opens=java.base/java.lang=ALL-UNNAMED", "--add-opens=java.base/java.io=ALL-UNNAMED", "--add-opens=java.base/java.nio=ALL-UNNAMED", "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED", "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED"]
        druid.worker.capacity=14

        # The MiddleManager processes open peon ports from inside the Pod.
        # This forces us to also expose them on the Pod to allow communication with the peons.
        # Please be careful with this.
        druid.indexer.runner.ports=[8100, 8101, 8102, 8103, 8104, 8105, 8106, 8107, 8108, 8109, 8110, 8111, 8112, 8113]

        # HTTP server
        # https://druid.apache.org/docs/latest/configuration/index.html#indexer-concurrent-requests
        # Sum of `druid.broker.http.numConnections` across all the brokers in the cluster
        druid.server.http.numThreads=70

        # Monitoring
        druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.java.util.metrics.JvmCpuMonitor", "org.apache.druid.java.util.metrics.JvmThreadsMonitor", "org.apache.druid.client.cache.CacheMonitor"]

        ## Query Performance
        # GroupBy merging buffer per-query spilling to disk (1 GB)
        druid.query.groupBy.maxOnDiskStorage=1000000000

        # Query cache
        druid.realtime.cache.useCache=true
        druid.realtime.cache.populateCache=true
        druid.cache.sizeInBytes=200Mi

        # Additional Peons config:
        # https://druid.apache.org/docs/latest/configuration/index.html#middlemanager-configuration
        druid.indexer.task.baseTaskDir=/druid/data/persistent/task
        # Processing threads and buffers on Peons
        druid.indexer.fork.property.druid.processing.numThreads=4
        druid.indexer.fork.property.druid.processing.numMergeBuffers=4
        druid.indexer.fork.property.druid.processing.buffer.sizeBytes=200MiB
        # Monitoring
        druid.indexer.fork.property.druid.emitter.prometheus.strategy=pushgateway
        druid.indexer.fork.property.druid.emitter.prometheus.pushGatewayAddress=http://prometheus-pushgateway.kube-prometheus-stack.svc.cluster.local:9091
      resources:
        requests:
          cpu: "15"
          memory: 57G
        limits:
          cpu: "15.9"
          memory: 58G
      volumes:
        - name: data-volume
          emptyDir: {}
      volumeMounts:
        - name: data-volume
          mountPath: /druid/data
      livenessProbe:
        httpGet:
          path: /status/health
          port: 8091
        initialDelaySeconds: 5
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /status/health
          port: 8091
        initialDelaySeconds: 5
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      terminationGracePeriodSeconds: 1200
      lifecycle:
        preStop:
          exec:
            command:
              - "/bin/sh"
              - "/opt/druid/resources/scripts/mm_shutdown_hook.sh"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - druid-data-mm
      tolerations:
        - key: druid
          value: data-mm
          operator: Equal
          effect: NoSchedule
      ports:
        - name: metrics
          containerPort: 9001
        # The MiddleManager processes open peon ports from inside the Pod.
        # This forces us to also expose them on the Pod to allow communication with the peons.
        # The port numbers are configured in 'middlemanagers.runtime.properties' under 'druid.indexer.runner.ports'.
        # Please be careful with this.
        - name: peon-0
          containerPort: 8100
        - name: peon-1
          containerPort: 8101
        - name: peon-2
          containerPort: 8102
        - name: peon-3
          containerPort: 8103
        - name: peon-4
          containerPort: 8104
        - name: peon-5
          containerPort: 8105
        - name: peon-6
          containerPort: 8106
        - name: peon-7
          containerPort: 8107
        - name: peon-8
          containerPort: 8108
        - name: peon-9
          containerPort: 8109
        - name: peon-10
          containerPort: 8110
        - name: peon-11
          containerPort: 8111
        - name: peon-12
          containerPort: 8112
        - name: peon-13
          containerPort: 8113
      services:
        - spec:
            type: ClusterIP
            ports:
              - name: service
                port: 8091
              - name: metrics
                port: 9001
                targetPort: metrics

    ############################## Druid Query #################################
    brokers:
      kind: StatefulSet
      nodeType: "broker"
      podLabels:
        druid-process: broker
      druid.port: 8082
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/query/broker"
      replicas: 3
      podDisruptionBudgetSpec:
        maxUnavailable: 1
      extra.jvm.options: |-
        -Xms8G
        -Xmx8G
        -XX:MaxDirectMemorySize=5g
      runtime.properties: |
        # https://druid.apache.org/docs/latest/configuration/index.html#broker
        druid.service=druid/broker
        druid.plaintextPort=8082

        # HTTP server
        druid.server.http.numThreads=30

        # HTTP client
        druid.broker.http.numConnections=20
        druid.broker.http.maxQueuedBytes=20MiB
        druid.broker.http.readTimeout=PT5M
        # ~80% of druid.broker.http.readTimeout
        druid.broker.http.unusedConnectionTimeout=PT4M

        # Processing threads and buffers
        druid.processing.buffer.sizeBytes=1G
        druid.processing.numMergeBuffers=4
        druid.processing.numThreads=1
        druid.processing.tmpDir=/druid/data/processing

        # Query cache disabled -- push down caching and merging instead
        druid.broker.cache.useCache=false
        druid.broker.cache.populateCache=false

        # Monitoring
        druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.java.util.metrics.JvmCpuMonitor", "org.apache.druid.java.util.metrics.JvmThreadsMonitor", "org.apache.druid.client.cache.CacheMonitor", "org.apache.druid.server.metrics.QueryCountStatsMonitor"]

        # SQL settings
        druid.sql.enable=true
        druid.sql.planner.useNativeQueryExplain=true

        ## Query performance
        # Default timeout (60 seconds) can be overridden via query context
        druid.server.http.defaultQueryTimeout=60000
        # GroupBy merging buffer per-query spilling to disk (1 GB)
        druid.query.groupBy.maxOnDiskStorage=1000000000
        # Subqueries
        druid.server.http.maxSubqueryRows=800000
      resources:
        requests:
          cpu: "3.5"
          memory: 13Gi
        limits:
          cpu: "6"
          # 1-2 GB of overhead to allow for usage spikes
          memory: 14Gi
      volumes:
        - name: data-volume
          emptyDir: {}
      volumeMounts:
        - name: data-volume
          mountPath: /druid/data
      livenessProbe:
        httpGet:
          path: /status/health
          port: 8082
        initialDelaySeconds: 20
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /druid/broker/v1/readiness
          port: 8082
        initialDelaySeconds: 20
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 20
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - druid-query
      tolerations:
        - key: druid
          value: query
          operator: Equal
          effect: NoSchedule
      ports:
        - name: metrics
          containerPort: 9001
      services:
        - spec:
            type: ClusterIP
            ports:
              - name: service
                port: 8082
              - name: metrics
                port: 9001
                targetPort: metrics

    routers:
      kind: StatefulSet
      nodeType: "router"
      podLabels:
        druid-process: router
      druid.port: 8888
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/query/router"
      replicas: 1
      podDisruptionBudgetSpec:
        maxUnavailable: 1
      extra.jvm.options: |-
        -Xms4G
        -Xmx4G
      runtime.properties: |
        # https://druid.apache.org/docs/latest/configuration/index.html#router
        druid.service=druid/router
        druid.plaintextPort=8888

        # https://druid.apache.org/docs/latest/configuration/index.html#runtime-configuration
        # Service discovery
        druid.router.defaultBrokerServiceName=druid/broker
        # HTTP server
        druid.router.http.numConnections=50
        druid.router.http.numMaxThreads=80
        druid.router.http.readTimeout=PT5M

        # Management proxy to coordinator/overlord: required for unified web console.
        # https://druid.apache.org/docs/latest/design/router.html#router-as-management-proxy
        druid.router.managementProxy.enabled=true
      resources:
        requests:
          cpu: "3.5"
          memory: 4Gi
        limits:
          cpu: "4"
          memory: 5Gi
      livenessProbe:
        httpGet:
          path: /status/health
          port: 8888
        initialDelaySeconds: 15
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /status/health
          port: 8888
        initialDelaySeconds: 15
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - druid-query
      tolerations:
        - key: druid
          value: query
          operator: Equal
          effect: NoSchedule
      ports:
        - name: metrics
          containerPort: 9001
      services:
        - spec:
            type: ClusterIP
            ports:
              - name: http
                port: 80
                targetPort: 8888
              - name: metrics
                port: 9001
                targetPort: metrics

AdheipSingh commented 6 months ago

Config LGTM. Sadly, I haven't seen this issue. Any logs or screenshots would be helpful.
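
If it helps, the operator's own reconcile logs would be the most useful, e.g. (deployment and namespace names assumed here, adjust to your install):

kubectl logs -n druid-operator deploy/druid-operator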

layoaster commented 6 months ago

@AdheipSingh I provided a Google Drive link to a video showing the pod update process. Are you having issues watching it, or is it not enough?

https://drive.google.com/file/d/15GxhZZZWlhWiz-EXXAIarG81jMNMmu49/view?usp=sharing

plutocholia commented 6 months ago

I faced the same issue during the Druid upgrade process, from version 28.0.1 to 29.0.1.

Operator version: v1.2.3
Kubernetes version: v1.28.8

The screenshot shows Druid pods sorted by creation timestamp and, as layoaster said, the Overlords and the Historicals are being updated at the same time, before the MiddleManagers!

[screenshot: Druid pods sorted by creation timestamp]
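
For reference, a listing like this can be produced with (namespace assumed, adjust to your install):

kubectl get pods -n druid --sort-by=.metadata.creationTimestamp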

@itamar-marom @cyril-corbon have either of you faced this issue?
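
For clarity, what rollingDeploy=true is expected to guarantee is a strictly sequential rollout over node types: update one tier, wait for it to become ready, then move on to the next. A minimal Go sketch of that expectation (illustrative only; the names and helpers below are made up, and this is not the operator's actual code):

package main

import "fmt"

// expectedOrder mirrors the sequence listed at the top of this issue.
var expectedOrder = []string{
	"historical",
	"middleManager",
	"broker",
	"overlord",
	"coordinator",
	"router",
}

// waitUntilReady is a hypothetical stand-in for whatever readiness
// check the reconciler performs (e.g. polling StatefulSet status).
func waitUntilReady(nodeType string) bool {
	return true // assume the tier eventually becomes ready
}

func main() {
	for _, nodeType := range expectedOrder {
		fmt.Printf("rolling out %s...\n", nodeType)
		if !waitUntilReady(nodeType) {
			// Strictly sequential: never start the next tier early.
			fmt.Printf("halting: %s did not become ready\n", nodeType)
			return
		}
	}
	fmt.Println("rollout complete")
}

The behavior reported here corresponds to "overlord" being processed concurrently with "historical", instead of after "middleManager", in such a loop.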