EmiPhil closed this issue 4 years ago.
Are you using the latest code in master? I am running the 0.16 image and it works great. Your config seems fine apart from autoscaler, which has been renamed to hpAutoscaler.
I have a YAML that is running in one of my environments. Have a look at it; it runs perfectly.
https://gist.github.com/AdheipSingh/cee71aecc1a0cd0f6ccf4cbcb324fc4d
Since the broker is not able to find the coordinator, can you confirm that the k8s endpoints for the coordinator are up? Can you also check whether your ZooKeeper is running fine?
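For example, something along these lines (the service name is generated by the operator, so treat these as placeholders):

# An empty ENDPOINTS column means no ready coordinator pod is behind the service.
kubectl get endpoints -n default | grep coordinator
# Substitute the coordinator service the operator actually created.
kubectl describe service <coordinator-service-name> -n default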
I can see the service being created and pointing to the correct pod at port 8081 for the coordinator. I think the underlying issue is that it is using the config in /conf/druid/cluster/master/coordinator-overlord/runtime.properties instead of /conf/druid/cluster/master/coordinator/runtime.properties.
If I log into the coordinator server and look at the configs:
/opt/apache-druid-0.17.0 $ cat conf/zk/zoo.cfg
#
# Server
#
tickTime=2000
dataDir=var/zk
clientPort=2181
initLimit=5
syncLimit=2
#
# Autopurge
#
autopurge.snapRetainCount=5
autopurge.purgeInterval=1
/opt/apache-druid-0.17.0 $ cat conf/druid/cluster/master/coordinator-overlord/runtime.properties
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
druid.service=druid/coordinator
druid.plaintextPort=8081
druid.coordinator.startDelay=PT10S
druid.coordinator.period=PT5S
# Run the overlord service in the coordinator process
druid.coordinator.asOverlord.enabled=true
druid.coordinator.asOverlord.overlordService=druid/overlord
druid.indexer.queue.startDelay=PT5S
druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata
/opt/apache-druid-0.17.0 $ cat conf/druid/cluster/master/coordinator/runtime.properties
druid.port=8081
druid.service=druid/coordinator
druid.coordinator.startDelay=PT30S
druid.coordinator.period=PT30S
druid.coordinator.kill.on=true
druid.coordinator.kill.period=PT2H
druid.coordinator.kill.durationToRetain=PT0s
druid.coordinator.kill.maxSegments=5000
What should I look for to confirm zookeeper health?
Yes, keep the mount path as /conf/druid/cluster/master/coordinator/runtime.properties. It should work then.
Service discovery is done through ZooKeeper, so do check the logs. I faced issues when I was recreating stacks for testing; it is better to purge ZooKeeper each time you test new configs.
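For example, roughly like this — a sketch only, assuming pods named zk-zookeeper-0 and so on, that nc is available in the image, and that the four-letter-word commands are enabled:

# Health check: 'ruok' should answer 'imok'; 'srvr' prints the mode (leader/follower).
kubectl exec zk-zookeeper-0 -- sh -c 'echo ruok | nc localhost 2181'
kubectl exec zk-zookeeper-0 -- sh -c 'echo srvr | nc localhost 2181'
# Purge Druid's znodes between test runs (druid.zk.paths.base is /druid here).
# 'deleteall' exists on ZooKeeper 3.5+; on 3.4.x use 'rmr' instead.
kubectl exec -it zk-zookeeper-0 -- zkCli.sh deleteall /druid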
I think that druid.sh is choosing the wrong runtime.properties:
/opt/apache-druid-0.17.0 $ cat /druid.sh
#!/bin/sh
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# NOTE: this is a 'run' script for the stock tarball
# It takes 1 required argument (the name of the service,
# e.g. 'broker', 'historical' etc). Any additional arguments
# are passed to that service.
#
# It accepts 'JAVA_OPTS' as an environment variable
#
# Additional env vars:
# - DRUID_LOG4J -- set the entire log4j.xml verbatim
# - DRUID_LOG_LEVEL -- override the default log level in default log4j
# - DRUID_XMX -- set Java Xmx
# - DRUID_XMS -- set Java Xms
# - DRUID_MAXNEWSIZE -- set Java max new size
# - DRUID_NEWSIZE -- set Java new size
# - DRUID_MAXDIRECTMEMORYSIZE -- set Java max direct memory size
#
# - DRUID_CONFIG -- full path to a file for druid 'common' properties
# - DRUID_CONFIG_${service} -- full path to a file for druid 'service' properties
set -e
SERVICE="$1"
echo "$(date -Is) startup service $SERVICE"
# We put all the config in /tmp/conf to allow for a
# read-only root filesystem
mkdir -p /tmp/conf/
cp -r /opt/druid/conf/druid /tmp/conf/druid
getConfPath() {
    cluster_conf_base=/tmp/conf/druid/cluster
    case "$1" in
    _common) echo $cluster_conf_base/_common ;;
    historical) echo $cluster_conf_base/data/historical ;;
    middleManager) echo $cluster_conf_base/data/middleManager ;;
    coordinator | overlord) echo $cluster_conf_base/master/coordinator-overlord ;;
    broker) echo $cluster_conf_base/query/broker ;;
    router) echo $cluster_conf_base/query/router ;;
    esac
}
COMMON_CONF_DIR=$(getConfPath _common)
SERVICE_CONF_DIR=$(getConfPath ${SERVICE})
[...]
I tried to change the env variable DRUID_CONFIG_${service} but it seems to have no effect.
As far as I can tell, it isn't even getting to the point where it would try to talk to ZooKeeper, because it's using the wrong runtime.properties.
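For reference, going by the env vars documented in the script header (the handling itself is in the elided part of druid.sh, so this is only a sketch of the documented interface), the override amounts to:

# DRUID_CONFIG_coordinator is the documented per-service properties override;
# the path is the coordinator config directory discussed above.
DRUID_CONFIG_coordinator=/opt/druid/conf/druid/cluster/master/coordinator/runtime.properties \
  /druid.sh coordinator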
Did you try changing the mount path to /conf/druid/cluster/master/coordinator/runtime.properties? The druid.sh is the same one I am using. Did you try the tiny-cluster.yaml in the examples?
@EmiPhil I haven't had a chance to look in detail, but I noticed:
druid.zk.service.host=zk-zookeeper-headless.default.svc.cluster.local
Is that the name of a headless service covering all ZooKeeper pods, or do you have just one ZooKeeper pod in the quorum? If you have multiple ZooKeeper pods, you need to explicitly provide the list of all ZooKeeper pods, not the name of the headless service.
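For example, the pods behind the headless service can be listed like this (namespace assumed to be default, per the hostname above):

# Each ready ZooKeeper pod shows up as a separate address on the headless service.
kubectl get endpoints zk-zookeeper-headless -n default -o wide
# Per-pod DNS names then follow <pod>.<service>.<namespace>.svc.cluster.local,
# which is the form used in the corrected config below.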
@AdheipSingh
Yeah, the nodeConfigMountPath has always been /conf/druid/cluster/master/coordinator. When I log into the pod I can see the configuration is being put into that folder, but there is also another runtime.properties file in /conf/druid/cluster/master/coordinator-overlord, and Druid seems to prefer that configuration.
@himanshug
Yep, I have multiple. Fixed that in this new config, but the issue is still occurring.
Latest configuration:
apiVersion: druid.apache.org/v1alpha1
kind: Druid
metadata:
  name: cluster
spec:
  image: apache/druid:0.17.0
  env:
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /secrets/GOOGLE_APPLICATION_CREDENTIALS
  startScript: /druid.sh
  securityContext:
    fsGroup: 1000
    runAsUser: 1000
    runAsGroup: 1000
  services:
    - spec:
        type: ClusterIP
        clusterIP: None
  commonConfigMountPath: "/opt/druid/conf/druid/cluster/_common"
  jvm.options: |-
    -server
    -XX:+PrintFlagsFinal
    -XX:MaxDirectMemorySize=10240g
    -XX:+UnlockExperimentalVMOptions
    -XX:+UseCGroupMemoryLimitForHeap
    -Duser.timezone=UTC
    -Dfile.encoding=UTF-8
    -Dlog4j.debug
    -XX:+ExitOnOutOfMemoryError
    -XX:HeapDumpPath=/druid/data/logs
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+UseG1GC
    -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    -XX:+UnlockDiagnosticVMOptions
    -XX:+PrintSafepointStatistics
    -XX:PrintSafepointStatisticsCount=1
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintGCApplicationStoppedTime
    -XX:+PrintGCApplicationConcurrentTime
    -XX:+UseGCLogFileRotation
    -XX:NumberOfGCLogFiles=50
    -XX:GCLogFileSize=50m
    -Xloggc:/druid/data/logs/gc.log
  common.runtime.properties: |
    #
    # Monitoring
    #
    druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
    #druid.emitter=noop
    druid.emitter.logging.logLevel=debug
    #
    # Extensions
    #
    druid.extensions.loadList=["druid-google-extensions","druid-kafka-indexing-service","druid-datasketches","postgresql-metadata-storage","druid-protobuf-extensions","druid-stats"]
    # Log all runtime properties on startup. Disable to avoid logging properties on startup:
    druid.startup.logging.logProperties=true
    #
    # Service discovery
    #
    druid.selectors.indexing.serviceName=druid/overlord
    druid.selectors.coordinator.serviceName=druid/coordinator
    druid.sql.enable=true
  deepStorage:
    spec:
      properties: |-
        druid.storage.type=google
        druid.google.bucket=bucket
        druid.indexer.logs.directory=data/logs/
    type: default
  metadataStore:
    spec:
      properties: |-
        druid.metadata.storage.type=postgresql
        druid.metadata.storage.connector.connectURI=jdbc:postgresql://host
        druid.metadata.postgres.ssl.useSSL=true
        druid.metadata.postgres.ssl.sslMode="verify-ca"
        druid.metadata.postgres.ssl.sslCert="/secrets/client-cert.pem"
        druid.metadata.postgres.ssl.sslKey="/secrets/client-key.pem"
        druid.metadata.postgres.ssl.sslRootCert="/secrets/server-ca.pem"
        druid.metadata.storage.connector.user=user
        druid.metadata.storage.connector.password=password
        druid.metadata.storage.connector.createTables=true
        druid.metadata.postgres.dbTableSchema=schema
    type: default
  zookeeper:
    spec:
      properties: |-
        druid.zk.service.host=zk-zookeeper-0.zk-zookeeper-headless.default.svc.cluster.local,zk-zookeeper-1.zk-zookeeper-headless.default.svc.cluster.local,zk-zookeeper-2.zk-zookeeper-headless.default.svc.cluster.local
        druid.zk.paths.base=/druid
    type: default
  nodes:
    brokers:
      nodeType: "broker"
      druid.port: 8082
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/query/broker"
      podDisruptionBudgetSpec:
        maxUnavailable: 1
      replicas: 1
      runtime.properties: |
        druid.service=druid/broker
        druid.plaintextPort=8082
        # HTTP server settings
        druid.server.http.numThreads=25
        # HTTP client settings
        druid.broker.http.numConnections=5
        # Processing threads and buffers
        druid.processing.buffer.sizeBytes=1073741824
        druid.processing.numThreads=1
        druid.processing.tmpDir=var/druid/processing
        druid.broker.retryPolicy.numTries=3
      log4j.config: |-
        <Configuration status="WARN">
          <Appenders>
            <Console name="logline" target="SYSTEM_OUT">
              <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
            </Console>
            <Console name="msgonly" target="SYSTEM_OUT">
              <PatternLayout pattern="%m%n"/>
            </Console>
          </Appenders>
          <Loggers>
            <Root level="info">
              <AppenderRef ref="logline"/>
            </Root>
            <Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
              <AppenderRef ref="msgonly"/>
            </Logger>
          </Loggers>
        </Configuration>
      extra.jvm.options: |-
        -Xmx2G
        -Xms2G
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 10Gi
            storageClassName: standard
      volumeMounts:
        - mountPath: /druid/data
          name: data-volume
        - mountPath: /secrets
          name: secrets
          readOnly: true
      volumes:
        - name: data-volume
          emptyDir: {}
        - name: secrets
          projected:
            sources:
              - secret:
                  name: druid-gcloud-bucket-key
              - secret:
                  name: cloud-sql
      resources:
        requests:
          memory: "6G"
          cpu: "1"
        limits:
          memory: "6G"
          cpu: "1"
      livenessProbe:
        initialDelaySeconds: 30
        httpGet:
          path: /status/health
          port: 8082
      readinessProbe:
        initialDelaySeconds: 30
        httpGet:
          path: /status/health
          port: 8082
      services:
        - metadata:
            name: broker-%s-service
          spec:
            clusterIP: None
            ports:
              - name: tcp-service-port
                port: 8082
                targetPort: 8082
            type: ClusterIP
      hpAutoscaler:
        maxReplicas: 10
        minReplicas: 1
        scaleTargetRef:
          apiVersion: apps/v1
          kind: StatefulSet
          name: druid-cluster-brokers
        metrics:
          - type: Resource
            resource:
              name: cpu
              targetAverageUtilization: 60
          - type: Resource
            resource:
              name: memory
              targetAverageUtilization: 60
    coordinators:
      nodeType: "coordinator"
      druid.port: 8081
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/master/coordinator"
      replicas: 1
      podDisruptionBudgetSpec:
        maxUnavailable: 1
      runtime.properties: |
        druid.service=druid/coordinator
        druid.coordinator.startDelay=PT30S
        druid.coordinator.period=PT30S
        druid.coordinator.kill.on=true
        druid.coordinator.kill.period=PT2H
        druid.coordinator.kill.durationToRetain=PT0s
        druid.coordinator.kill.maxSegments=5000
      log4j.config: |-
        <Configuration status="WARN">
          <Appenders>
            <Console name="logline" target="SYSTEM_OUT">
              <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
            </Console>
            <Console name="msgonly" target="SYSTEM_OUT">
              <PatternLayout pattern="%m%n"/>
            </Console>
          </Appenders>
          <Loggers>
            <Root level="info">
              <AppenderRef ref="logline"/>
            </Root>
            <Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
              <AppenderRef ref="msgonly"/>
            </Logger>
          </Loggers>
        </Configuration>
      services:
        - metadata:
            name: coordinator-%s-service
          spec:
            clusterIP: None
            ports:
              - name: tcp-service-port
                port: 8081
                targetPort: 8081
            type: ClusterIP
      extra.jvm.options: |-
        -Xmx1G
        -Xms1G
      livenessProbe:
        initialDelaySeconds: 30
        httpGet:
          path: /status/health
          port: 8081
      readinessProbe:
        initialDelaySeconds: 30
        httpGet:
          path: /status/health
          port: 8081
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 10Gi
            storageClassName: standard
      volumeMounts:
        - mountPath: /druid/data
          name: data-volume
        - mountPath: /secrets
          name: secrets
          readOnly: true
      volumes:
        - name: data-volume
          emptyDir: {}
        - name: secrets
          projected:
            sources:
              - secret:
                  name: druid-gcloud-bucket-key
              - secret:
                  name: cloud-sql
      resources:
        limits:
          cpu: "1"
          memory: 6G
        requests:
          cpu: "1"
          memory: 6G
      hpAutoscaler:
        maxReplicas: 10
        minReplicas: 1
        scaleTargetRef:
          apiVersion: apps/v1
          kind: StatefulSet
          name: druid-cluster-coordinators
        metrics:
          - type: Resource
            resource:
              name: cpu
              targetAverageUtilization: 60
          - type: Resource
            resource:
              name: memory
              targetAverageUtilization: 60
    historicals:
      nodeType: "historical"
      druid.port: 8083
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/data/historical"
      podDisruptionBudgetSpec:
        maxUnavailable: 1
      replicas: 1
      livenessProbe:
        initialDelaySeconds: 30
        httpGet:
          path: /status/health
          port: 8083
      readinessProbe:
        initialDelaySeconds: 30
        httpGet:
          path: /status/health
          port: 8083
      runtime.properties: |
        druid.service=druid/historical
        druid.server.http.numThreads=10
        druid.processing.buffer.sizeBytes=1073741824
        druid.processing.numMergeBuffers=1
        druid.processing.numThreads=2
        # Segment storage
        druid.segmentCache.locations=[{\"path\":\"/druid/data/segments\",\"maxSize\":1099511627776}]
        druid.server.maxSize=1099511627776
      log4j.config: |-
        <Configuration status="WARN">
          <Appenders>
            <Console name="logline" target="SYSTEM_OUT">
              <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
            </Console>
            <Console name="msgonly" target="SYSTEM_OUT">
              <PatternLayout pattern="%m%n"/>
            </Console>
          </Appenders>
          <Loggers>
            <Root level="info">
              <AppenderRef ref="logline"/>
            </Root>
            <Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
              <AppenderRef ref="msgonly"/>
            </Logger>
          </Loggers>
        </Configuration>
      extra.jvm.options: |-
        -Xmx1G
        -Xms1G
      services:
        - spec:
            clusterIP: None
            ports:
              - name: tcp-service-port
                port: 8083
                targetPort: 8083
            type: ClusterIP
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 200Gi
            storageClassName: ssd
      volumeMounts:
        - mountPath: /druid/data
          name: data-volume
        - mountPath: /secrets
          name: secrets
          readOnly: true
      volumes:
        - name: data-volume
          emptyDir: {}
        - name: secrets
          projected:
            sources:
              - secret:
                  name: druid-gcloud-bucket-key
              - secret:
                  name: cloud-sql
      resources:
        limits:
          cpu: "1"
          memory: 8G
        requests:
          cpu: "1"
          memory: 8G
    middlemanagers:
      druid.port: 8091
      extra.jvm.options: |-
        -Xmx4G
        -Xms4G
      nodeType: middleManager
      nodeConfigMountPath: /opt/druid/conf/druid/cluster/data/middlemanager
      podDisruptionBudgetSpec:
        maxUnavailable: 1
      ports:
        - containerPort: 8100
          name: peon-0-pt
        - containerPort: 8101
          name: peon-1-pt
        - containerPort: 8102
          name: peon-2-pt
        - containerPort: 8103
          name: peon-3-pt
        - containerPort: 8104
          name: peon-4-pt
      replicas: 1
      resources:
        limits:
          cpu: "2"
          memory: 5Gi
        requests:
          cpu: "2"
          memory: 5Gi
      livenessProbe:
        initialDelaySeconds: 30
        httpGet:
          path: /status/health
          port: 8091
      readinessProbe:
        initialDelaySeconds: 30
        httpGet:
          path: /status/health
          port: 8091
      runtime.properties: |-
        druid.service=druid/middleManager
        druid.worker.capacity=4
        druid.indexer.runner.javaOpts=-server -XX:MaxDirectMemorySize=10240g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/druid/data/tmp -Dlog4j.debug -XX:+UnlockDiagnosticVMOptions -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=50 -XX:GCLogFileSize=10m -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:+UseG1GC -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -XX:HeapDumpPath=/druid/data/logs/peon.%t.%p.hprof -Xms10G -Xmx10G
        druid.indexer.task.baseTaskDir=/druid/data/baseTaskDir
        druid.server.http.numThreads=10
        druid.indexer.fork.property.druid.processing.buffer.sizeBytes=1
        druid.indexer.fork.property.druid.processing.numMergeBuffers=1
        druid.indexer.fork.property.druid.processing.numThreads=1
        # Processing threads and buffers on Peons
        druid.indexer.fork.property.druid.processing.numMergeBuffers=2
        druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000
        druid.indexer.fork.property.druid.processing.numThreads=1
      log4j.config: |-
        <Configuration status="WARN">
          <Appenders>
            <Console name="logline" target="SYSTEM_OUT">
              <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
            </Console>
            <Console name="msgonly" target="SYSTEM_OUT">
              <PatternLayout pattern="%m%n"/>
            </Console>
          </Appenders>
          <Loggers>
            <Root level="info">
              <AppenderRef ref="logline"/>
            </Root>
            <Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="info">
              <AppenderRef ref="msgonly"/>
            </Logger>
          </Loggers>
        </Configuration>
      services:
        - spec:
            clusterIP: None
            ports:
              - name: tcp-service-port
                port: 8091
                targetPort: 8091
              - name: peon-port-0
                port: 8100
                targetPort: 8100
              - name: peon-port-1
                port: 8101
                targetPort: 8101
              - name: peon-port-2
                port: 8102
                targetPort: 8102
              - name: peon-port-3
                port: 8103
                targetPort: 8103
              - name: peon-port-4
                port: 8104
                targetPort: 8104
            type: ClusterIP
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
            storageClassName: ssd
      volumeMounts:
        - mountPath: /secrets
          name: secrets
          readOnly: true
        - mountPath: /druid/data
          name: data-volume
      volumes:
        - name: secrets
          projected:
            sources:
              - secret:
                  name: druid-gcloud-bucket-key
              - secret:
                  name: cloud-sql
      securityContext:
        fsGroup: 0
        runAsGroup: 0
        runAsUser: 0
      hpAutoscaler:
        maxReplicas: 10
        minReplicas: 1
        scaleTargetRef:
          apiVersion: apps/v1
          kind: StatefulSet
          name: druid-cluster-middlemanagers
        metrics:
          - type: Resource
            resource:
              name: cpu
              targetAverageUtilization: 60
          - type: Resource
            resource:
              name: memory
              targetAverageUtilization: 60
    overlords:
      druid.port: 8090
      extra.jvm.options: |-
        -Xmx4G
        -Xms4G
      nodeType: overlord
      podDisruptionBudgetSpec:
        maxUnavailable: 1
      nodeConfigMountPath: /opt/druid/conf/druid/cluster/master/overlord
      replicas: 1
      resources:
        limits:
          cpu: "2"
          memory: 6Gi
        requests:
          cpu: "2"
          memory: 6Gi
      runtime.properties: |-
        druid.service=druid/overlord
        druid.indexer.queue.startDelay=PT30S
        druid.indexer.runner.type=remote
        druid.indexer.storage.type=metadata
      log4j.config: |-
        <Configuration status="WARN">
          <Appenders>
            <Console name="logline" target="SYSTEM_OUT">
              <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
            </Console>
            <Console name="msgonly" target="SYSTEM_OUT">
              <PatternLayout pattern="%m%n"/>
            </Console>
          </Appenders>
          <Loggers>
            <Root level="info">
              <AppenderRef ref="logline"/>
            </Root>
            <Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
              <AppenderRef ref="msgonly"/>
            </Logger>
          </Loggers>
        </Configuration>
      livenessProbe:
        initialDelaySeconds: 30
        httpGet:
          path: /status/health
          port: 8081
      readinessProbe:
        initialDelaySeconds: 30
        httpGet:
          path: /status/health
          port: 8081
      services:
        - metadata:
            name: overlord-%s-service
          spec:
            clusterIP: None
            ports:
              - name: tcp-service-port
                port: 8090
                targetPort: 8090
            type: ClusterIP
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 10Gi
            storageClassName: standard
      volumeMounts:
        - mountPath: /druid/data
          name: data-volume
        - mountPath: /secrets
          name: secrets
          readOnly: true
      volumes:
        - name: secrets
          projected:
            sources:
              - secret:
                  name: druid-gcloud-bucket-key
              - secret:
                  name: cloud-sql
      securityContext:
        fsGroup: 1000
        runAsGroup: 1000
        runAsUser: 1000
      hpAutoscaler:
        maxReplicas: 10
        minReplicas: 1
        scaleTargetRef:
          apiVersion: apps/v1
          kind: StatefulSet
          name: druid-cluster-overlords
        metrics:
          - type: Resource
            resource:
              name: cpu
              targetAverageUtilization: 60
          - type: Resource
            resource:
              name: memory
              targetAverageUtilization: 60
    routers:
      livenessProbe:
        initialDelaySeconds: 30
        httpGet:
          path: /status/health
          port: 8888
      readinessProbe:
        initialDelaySeconds: 30
        httpGet:
          path: /status/health
          port: 8888
      druid.port: 8888
      extra.jvm.options: |-
        -Xmx512m
        -Xms512m
      nodeType: router
      podDisruptionBudgetSpec:
        maxUnavailable: 1
      nodeConfigMountPath: /opt/druid/conf/druid/cluster/query/router
      replicas: 1
      runtime.properties: |
        druid.service=druid/router
        druid.plaintextPort=8888
        # HTTP proxy
        druid.router.http.numConnections=50
        druid.router.http.readTimeout=PT5M
        druid.router.http.numMaxThreads=100
        druid.server.http.numThreads=100
        # Service discovery
        druid.router.defaultBrokerServiceName=druid/broker
        druid.router.coordinatorServiceName=druid/coordinator
        # Management proxy to coordinator / overlord: required for unified web console.
        druid.router.managementProxy.enabled=true
      log4j.config: |-
        <Configuration status="WARN">
          <Appenders>
            <Console name="logline" target="SYSTEM_OUT">
              <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
            </Console>
            <Console name="msgonly" target="SYSTEM_OUT">
              <PatternLayout pattern="%m%n"/>
            </Console>
          </Appenders>
          <Loggers>
            <Root level="info">
              <AppenderRef ref="logline"/>
            </Root>
            <Logger name="org.apache.druid.java.util.emitter.core.LoggingEmitter" additivity="false" level="debug">
              <AppenderRef ref="msgonly"/>
            </Logger>
          </Loggers>
        </Configuration>
      services:
        - metadata:
            name: router-%s-service
          spec:
            clusterIP: None
            ports:
              - name: tcp-service-port
                port: 8888
                targetPort: 8888
            type: ClusterIP
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi
            storageClassName: ssd
      volumeMounts:
        - mountPath: /druid/data
          name: data-volume
        - mountPath: /secrets
          name: secrets
          readOnly: true
      volumes:
        - name: secrets
          projected:
            sources:
              - secret:
                  name: druid-gcloud-bucket-key
              - secret:
                  name: cloud-sql
      securityContext:
        fsGroup: 1000
        runAsGroup: 1000
        runAsUser: 1000
@AdheipSingh
From what I can see in the logs, the following config (slightly modified from tiny-cluster) does work.
apiVersion: "druid.apache.org/v1alpha1"
kind: "Druid"
metadata:
name: tiny-cluster
spec:
image: apache/druid:0.17.0
startScript: /druid.sh
securityContext:
fsGroup: 1000
runAsUser: 1000
runAsGroup: 1000
services:
- spec:
type: ClusterIP
clusterIP: None
commonConfigMountPath: "/opt/druid/conf/druid/cluster/_common"
jvm.options: |-
-server
-XX:MaxDirectMemorySize=10240g
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Dlog4j.debug
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
log4j.config: |-
<?xml version="1.0" encoding="UTF-8" ?>
<Configuration status="WARN">
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="Console"/>
</Root>
</Loggers>
</Configuration>
common.runtime.properties: |
# Zookeeper
druid.zk.service.host=zk-zookeeper-0.zk-zookeeper-headless.default.svc.cluster.local,zk-zookeeper-1.zk-zookeeper-headless.default.svc.cluster.local,zk-zookeeper-2.zk-zookeeper-headless.default.svc.cluster.local
druid.zk.paths.base=/druid-tiny
druid.zk.service.compress=false
# Metadata Store
druid.metadata.storage.type=derby
druid.metadata.storage.type=derby
druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/var/druid/metadata.db;create=true
druid.metadata.storage.connector.host=localhost
druid.metadata.storage.connector.port=1527
druid.metadata.storage.connector.createTables=true
# Deep Storage
druid.storage.type=local
druid.storage.storageDirectory=/druid/data/deepstorage
#
# Extensions
#
druid.extensions.loadList=["druid-s3-extensions"]
#
# Service discovery
#
druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator
nodes:
brokers:
nodeType: "broker"
druid.port: 8088
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/query/broker"
replicas: 1
runtime.properties: |
druid.service=druid/broker
# HTTP server threads
druid.broker.http.numConnections=5
druid.server.http.numThreads=10
# Processing threads and buffers
druid.processing.buffer.sizeBytes=1
druid.processing.numMergeBuffers=1
druid.processing.numThreads=1
druid.sql.enable=false
extra.jvm.options: |-
-Xmx1G
-Xms1G
volumeMounts:
- mountPath: /druid/data
name: data-volume
volumes:
- name: data-volume
emptyDir: {}
resources:
requests:
memory: "2G"
cpu: "2"
limits:
memory: "2G"
cpu: "2"
coordinators:
nodeType: "coordinator"
druid.port: 8088
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/master/coordinator-overlord"
replicas: 1
runtime.properties: |
druid.service=druid/coordinator
# HTTP server threads
druid.coordinator.startDelay=PT30S
druid.coordinator.period=PT30S
# Configure this coordinator to also run as Overlord
druid.coordinator.asOverlord.enabled=true
druid.coordinator.asOverlord.overlordService=druid/overlord
druid.indexer.queue.startDelay=PT30S
druid.indexer.runner.type=local
extra.jvm.options: |-
-Xmx1G
-Xms1G
volumeMounts:
- mountPath: /druid/data
name: data-volume
volumes:
- name: data-volume
emptyDir: {}
resources:
requests:
memory: "2G"
cpu: "2"
limits:
memory: "2G"
cpu: "2"
historicals:
nodeType: "historical"
druid.port: 8088
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/data/historical"
replicas: 1
runtime.properties: |
druid.service=druid/historical
druid.server.http.numThreads=5
druid.processing.buffer.sizeBytes=1
druid.processing.numMergeBuffers=1
druid.processing.numThreads=1
# Segment storage
druid.segmentCache.locations=[{\"path\":\"/druid/data/segments\",\"maxSize\":10737418240}]
druid.server.maxSize=10737418240
extra.jvm.options: |-
-Xmx1G
-Xms1G
volumeMounts:
- mountPath: /druid/data
name: data-volume
volumes:
- name: data-volume
emptyDir: {}
resources:
requests:
memory: "2G"
cpu: "2"
limits:
memory: "2G"
cpu: "2"
I got it to work by setting nodeConfigMountPath: /opt/druid/conf/druid/cluster/master/coordinator-overlord on both the coordinators and the overlords. I think this works because, for whatever reason, Druid is always choosing to use /opt/druid/conf/druid/cluster/master/coordinator-overlord/runtime.properties, even in the presence of /opt/druid/conf/druid/cluster/master/coordinator/runtime.properties.
As far as I can tell, there doesn't seem to be a problem with having the configurations in that folder. The unified web console correctly shows the overlord and coordinator on separate pod hosts, so all good?
@EmiPhil thanks for documenting the solution. Yes, the start script in Druid's Docker image does have the behavior you described, looking for "coordinator-overlord" on both coordinator and overlord pods.
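One way to verify which file won (a sketch; the pod name is inferred from the StatefulSet name in the cluster spec above): since that spec sets druid.startup.logging.logProperties=true, every loaded property is printed at startup, and druid.coordinator.asOverlord.enabled appears only in the coordinator-overlord file.

# A hit here means the coordinator-overlord runtime.properties was the one loaded.
kubectl logs druid-cluster-coordinators-0 | grep asOverlord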
I'm having a hard time getting the operator to launch a working cluster. From what I can tell from the logs, the coordinator and overlord do not seem to be getting initialized with the config in the YAML deployment (attached as help.txt). The other parts of the cluster seem to be able to find each other.
help.txt
Logs for coordinator:
Logs for overlord:
Logs for Broker:
logs-from-druid-cluster-brokers-in-druid-cluster-brokers-0.txt
At first I was using the prebuilt image from Docker Hub to run the operator and ran into the same errors. I am now using a Docker image built locally from a clone of the repo.
The error shows up if I use
image: "apache/incubator-druid:0.16.0-incubating"
in the configs as well. I'm running out of ideas for what could be going wrong. Help would be greatly appreciated!