apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Druid Router unable to Discover Broker Services #9619

Closed: acherla closed this issue 4 years ago

acherla commented 4 years ago

Description: When the Druid router service runs inside a Docker Swarm cluster and connects to my 3-node ZooKeeper quorum, it cannot discover any brokers under the druid/broker service path in ZooKeeper. I've verified that the coordinator discovers all services, but the router cannot identify the brokers or, for that matter, any other service.

I've triple-checked that there is no firewall between the hosts. From the router's Docker container inside the Swarm cluster, I can telnet to ZooKeeper, the brokers, and the coordinator on the appropriate ports.
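
The telnet checks can be scripted so every host/port pair is tested the same way from inside the router container. This is a minimal sketch; the target list is hypothetical and simply reuses the hostnames and ports that appear in this issue. Note that a successful TCP connect only proves a port is open; it says nothing about whether a ZooKeeper node is serving up-to-date data.

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (roughly what a telnet check tests)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False

# Hypothetical target list, taken from the hostnames/ports mentioned in this issue.
TARGETS = [
    ("zookeeper1", 2181), ("zookeeper2", 2181), ("zookeeper3", 2181),
    ("10.0.7.30", 8081),  # coordinator
    ("10.0.7.32", 8082),  # broker
]

if __name__ == "__main__":
    for host, port in TARGETS:
        state = "open" if can_connect(host, port) else "unreachable"
        print(f"{host}:{port} -> {state}")
```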

What makes this stranger is that the coordinator correctly identifies the brokers and every other service, while the router does not. Below are the responses from the coordinator and router endpoints, showing the cluster membership each one reports.

Coordinator (GET /druid/coordinator/v1/cluster):
{"coordinator":[{"host":"10.0.7.30","service":"druid/coordinator","plaintextPort":8081}],
 "overlord":[{"host":"10.0.7.30","service":"druid/coordinator","plaintextPort":8081}],
 "broker":[{"host":"10.0.7.32","service":"druid/broker","plaintextPort":8082}],
 "historical":[{"host":"10.0.7.41","service":"druid/historical","plaintextPort":8083},{"host":"10.0.7.42","service":"druid/historical","plaintextPort":8083}]}
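
To make the discrepancy concrete, the brokers in the coordinator's cluster view can be diffed against what the router returns from /druid/router/v1/brokers. This sketch assumes the router lists each broker as a "host:port" string under its service name; that element format is an assumption, since the router's response in this issue is empty.

```python
import json
from typing import Set

def coordinator_brokers(cluster_json: str) -> Set[str]:
    """host:port for every broker in a /druid/coordinator/v1/cluster response."""
    cluster = json.loads(cluster_json)
    return {f"{n['host']}:{n['plaintextPort']}" for n in cluster.get("broker", [])}

def router_brokers(brokers_json: str) -> Set[str]:
    """Every broker address the router reports, across all broker service names.
    Assumes entries are "host:port" strings (not confirmed by this issue)."""
    brokers = json.loads(brokers_json)
    return {addr for addrs in brokers.values() for addr in addrs}

def missing_from_router(cluster_json: str, brokers_json: str) -> Set[str]:
    """Brokers the coordinator knows about but the router does not."""
    return coordinator_brokers(cluster_json) - router_brokers(brokers_json)
```

With the two responses from this issue, the result is {"10.0.7.32:8082"}: the one broker the coordinator sees is absent from the router's view.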

Router (GET /druid/router/v1/brokers):
{"druid/broker":[]}

common.runtime.properties

druid.extensions.loadList=["postgresql-metadata-storage","druid-s3-extensions", "druid-kafka-indexing-service", "druid-datasketches"]
druid.host=<THIS IS REPLACED WITH THE IP OF THE DOCKER CONTAINER AT STARTUP>
druid.startup.logging.logProperties=true
druid.zk.service.host=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
druid.zk.paths.base=/druid
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://postgres-db:5432/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=druid
druid.storage.type=s3
druid.storage.bucket=druid-storage
druid.storage.baseKey=druid/segments
druid.s3.accessKey=SAMPLE_ACCESS_KEY
druid.s3.secretKey=SAMPLE_ACCESS_SECRET_KEY
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=druid-logs
druid.indexer.logs.s3Prefix=druid/indexing-logs
druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=noop
druid.emitter.logging.logLevel=info
druid.indexing.doubleStorage=double
druid.server.hiddenProperties=["druid.s3.accessKey","druid.s3.secretKey","druid.metadata.storage.connector.password"]
druid.sql.enable=true
druid.lookup.enableLookupSyncOnStartup=false

router runtime.properties

druid.service=druid/router
druid.plaintextPort=8888
druid.router.http.numConnections=50
druid.router.http.readTimeout=PT5M
druid.router.http.numMaxThreads=100
druid.server.http.numThreads=100
druid.router.defaultBrokerServiceName=druid/broker
druid.router.coordinatorServiceName=druid/coordinator
druid.router.managementProxy.enabled=true
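
A common cause of an empty broker list is a router whose druid.router.defaultBrokerServiceName does not match the service name the brokers announce. The sketch below checks that against the coordinator's cluster view. It uses a deliberately minimal .properties reader (no line continuations or escapes), and it assumes "druid/broker" as the default when the property is absent. In this issue both sides say druid/broker, so this check passes and points away from a naming mismatch.

```python
import json

def parse_properties(text: str) -> dict:
    """Minimal key=value reader: skips blank lines and '#'/'!' comments.
    Does not handle line continuations or escape sequences."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def broker_services(cluster_json: str) -> set:
    """Service names announced by brokers in a /druid/coordinator/v1/cluster response."""
    return {n["service"] for n in json.loads(cluster_json).get("broker", [])}

def router_targets_known_broker(router_props_text: str, cluster_json: str) -> bool:
    """True if the router's default broker service name matches a broker the coordinator sees."""
    props = parse_properties(router_props_text)
    # Assumed default if the property is not set.
    wanted = props.get("druid.router.defaultBrokerServiceName", "druid/broker")
    return wanted in broker_services(cluster_json)
```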

Affected Version

Version: 0.16.1


acherla commented 4 years ago

The root cause was a ZooKeeper synchronization problem across the quorum: some of the ZooKeeper nodes running in Docker Swarm did not appear to stay in sync with the leader, for reasons I never determined. When I configured Druid to connect only to the first ZooKeeper node, the router correctly picked up all the other service paths.
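
A lagging quorum member can be spotted by asking each node for its mode and last-seen transaction id (zxid) with ZooKeeper's "srvr" four-letter command. This is a sketch under assumptions: "srvr" must be allowed by the server (ZooKeeper 3.5+ restricts four-letter commands via 4lw.commands.whitelist, where "srvr" is whitelisted by default), and a follower can trail the leader briefly during normal operation, so a single snapshot is only a hint. A node that is unreachable or not serving is also flagged.

```python
import socket

def zk_srvr(host: str, port: int = 2181, timeout: float = 3.0) -> str:
    """Send the 'srvr' four-letter command; ZooKeeper replies and closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_srvr(text: str) -> dict:
    """Extract mode (leader/follower/standalone) and zxid from a srvr reply."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    mode = fields.get("mode")
    zxid = int(fields["zxid"], 16) if "zxid" in fields else None
    return {"mode": mode.lower() if mode else None, "zxid": zxid}

def lagging_nodes(replies: dict) -> list:
    """Given {host: srvr_reply}, list hosts behind the leader's zxid or not serving at all."""
    parsed = {h: parse_srvr(t) for h, t in replies.items()}
    leader_zxids = [p["zxid"] for p in parsed.values() if p["mode"] == "leader" and p["zxid"] is not None]
    if not leader_zxids:
        return sorted(replies)  # no leader visible: treat every node as suspect
    leader = max(leader_zxids)
    return sorted(h for h, p in parsed.items() if p["zxid"] is None or p["zxid"] < leader)

if __name__ == "__main__":
    # Value of druid.zk.service.host from common.runtime.properties in this issue.
    connect_string = "zookeeper1:2181,zookeeper2:2181,zookeeper3:2181"
    replies = {}
    for entry in connect_string.split(","):
        host, _, port = entry.partition(":")
        try:
            replies[host] = zk_srvr(host, int(port or 2181))
        except OSError as exc:
            replies[host] = f"unreachable: {exc}"
    print("possibly out of sync:", lagging_nodes(replies) or "none")
```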

To fix the problem, I moved the ZooKeeper quorum into its own Docker stack and deployed it separately from the Druid cluster.