druid-io / docker-druid

Druid Docker

[Proposal] Deploying truly distributed Druid clusters with docker #21

Open xiaoyao1991 opened 7 years ago

xiaoyao1991 commented 7 years ago

Hey guys,

I see that the image created from this repo is running every node together in one single container. While it's helpful for users to test and try out Druid, it isn't particularly useful when it comes to deploying a production cluster.

I've been working on using Docker to deploy a truly distributed Druid cluster lately. I have something working, and I'd like to share it and contribute it back, though I'm not sure how valuable it would be. Take a look at this fork to see what I've done so far. While the settings in that fork are specific to my team's research purposes, they can be generalized and made extensible. I'm still exploring Docker deployments, so I may have made some stupid mistakes.

Because containers are lightweight, the Docker philosophy encourages running only one service per container and having containers talk to each other, rather than running all dependencies and services in one container. Therefore, I defined separate images for each type of Druid node (druid-broker, druid-coordinator, etc.) as well as for the dependencies (druid-zookeeper, druid-mysql, druid-kafka, etc.). I also packed the JVM and runtime configurations into a separate image (druid-conf).

I had a discussion earlier with @cheddar on deployment stuff. He made it clear that deployment in general should only have 3 steps:

  1. Download the artifact.
  2. Download the configurations.
  3. Run the artifact with the configurations.

I followed this guideline: to run a specific Druid node, say a broker, all you need to do is:

  1. Pull druid-broker image
  2. Pull druid-conf image
  3. Run druid-conf in a container, and then link it as a volume provider (using --volumes-from) for the druid-broker container.
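
The three steps above can be sketched as a small shell helper. This is only a rough illustration, not the exact script from the fork: the `start_druid_node` and `run` helpers and the `DRY_RUN` switch are made up for this example, and it assumes the images are named druid-<type> and tagged latest. By default it only prints the docker commands instead of executing them.

```shell
#!/bin/sh
# Dry run by default: print each docker command instead of executing it.
# Set DRY_RUN=0 to really run them (needs a Docker daemon and the images).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: start_druid_node NODE_TYPE CONF_CONTAINER
# Assumes images are named druid-<NODE_TYPE> and druid-conf.
start_druid_node() {
    node_type="$1"
    conf_container="$2"

    # 1. Pull the node image.
    run docker pull "druid-${node_type}:latest"
    # 2. Pull the configuration image.
    run docker pull druid-conf:latest
    # 3. Start the configuration container, then mount its volumes
    #    into the node container with --volumes-from.
    run docker run -d --name="$conf_container" druid-conf:latest
    run docker run -d --volumes-from="$conf_container" \
        --name="$node_type" "druid-${node_type}:latest"
}

start_druid_node broker node-1-conf
```

Keeping the configuration in its own container means the same node image can be reused with a different configuration by pointing --volumes-from at another conf container.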

Containers on different nodes can freely communicate with each other as long as they are on the same overlay network. I use docker-machine to manage and provision remote nodes, and Docker Swarm for container orchestration. Running a broker node, for example, is as simple as:
docker run -itd --volumes-from=node-1-conf --name=broker --network=p-my-net --env="constraint:node==p-node-5" druid-broker:latest

guobingkun commented 7 years ago

👍 I am totally on board with this.

saidimu commented 7 years ago

@xiaoyao1991 Any updates on this? Looks awesome!

xiaoyao1991 commented 7 years ago

@saidimu I have one more thing to confirm before I put something together. In our experimental setup, we were using a simple NFS share as deep storage instead of HDFS. I'm checking whether the nodes in the swarm overlay network can properly talk to HDFS.

martin-liu commented 7 years ago

@xiaoyao1991 it's great, any update?

xiaoyao1991 commented 7 years ago

@martin-liu Thanks. I've opened a preliminary PR (#23). I haven't yet had the time to address the comments there that relate to documentation.

sjtoik commented 6 years ago

I have a setup made for docker-compose and for Kubernetes, if you would be interested in maintaining those.

stingerpk commented 6 years ago

@sjtoik Is it possible to share a link to your setup?

rathko commented 6 years ago

@sjtoik +1 for Kubernetes setup

mdh69 commented 6 years ago

@sjtoik +1 for Kubernetes setup

sjtoik commented 6 years ago

@stingerpk I'm still testing different aspects of the deployment and its features. If it is not too much of a hassle to maintain both our environment-specific setup and a public counterpart, I'll publish it.