elastic / elasticsearch-docker

Official Elasticsearch Docker image
Apache License 2.0

Provide real world examples #31

Closed bluepuma77 closed 7 years ago

bluepuma77 commented 7 years ago

Feature Description

Please provide real world examples in the style of the docker-compose.yml at Install Elasticsearch with Docker.

A cluster of 2 nodes on the same host is a starting point, but I would love to see how to "officially" set up and run a "production" cluster on 3 bare-metal servers.

jarpy commented 7 years ago

Hi @bluepuma77,

I see specific techniques for running containers across multiple nodes as a question of orchestration. We (Elastic, and the Docker team specifically) don't make recommendations or assumptions about the orchestration system in use at various sites. Our goal is to provide high-quality images that will work well with Docker in any orchestration context.

A real-world, production-ready configuration is, by its nature, specific to the orchestrator, the storage backend, the network topology and many other site-specific properties. Even if we only consider the orchestrator, we are forced to ask: "should the example focus on Kubernetes, Mesos, ECS, Swarm or something else?".

I would think the best path for someone designing a deployment would be to combine our documentation with a quality book on their orchestrator of choice. Docker is the glue layer between our images and the orchestrator, so our contract is to interact well with Docker. The orchestrator has the same contract, but from the other side, forming a stack. If we "overreach" in the stack layers, we have a terrible combinatorial explosion problem!

bluepuma77 commented 7 years ago

Hi @jarpy,

Thanks for the response; I do understand your point of view. If you don't want to give a recommendation regarding orchestration, that is fine. But I am coming from a different angle.

I want to run Elastic. Clustered. In Docker. 3 machines. Hosted bare metal.

It took me several days to get it up and running with v2.2, using a scripted Docker command for every machine. But it seems that with every new version some parameters change and it suddenly stops working.

Furthermore, securing the cluster was hard. For port 9200 you could use an HTTP proxy, but for port 9300 you needed a different solution. A VPN is hard to configure, and Docker swarm mode doesn't allow "unmanaged" containers to use an encrypted swarm overlay network. So create 3 Docker services and "pin" each to a single machine? With X-Pack (security) it should be easier - for 30 days.

Anyway, how about providing simple docker commands to run a cluster on 3 machines? (Assuming that 3 servers are the best solution for redundancy in a production environment)

### COMMON ###
export ip1=1.2.3.1
export ip2=1.2.3.2
export ip3=1.2.3.3

### SERVER1 ###
docker run \
  --name elastic1 \
  -v /elastic/data:/usr/share/elasticsearch/data \
  -v /elastic/snapshot:/snapshot \
  -d elasticsearch:5.2.2 \
  -E node.name=elastic1 \
  -E network.publish_host=$ip1 \
  -E discovery.zen.ping.unicast.hosts=$ip1,$ip2,$ip3
  ...

### SERVER2 ###
...

### SERVER3 ###
...

jarpy commented 7 years ago

"Clustered. In Docker. 3 machines. Hosted bare metal."

Sure! Docker doesn't really add much complexity then, since you're not dealing with overlay networks and the like. Your script looks pretty solid, and I think it's worth noting that there is very little "Docker stuff" in it. With the exception of the volume attachment flags, the important parts of your script are arguments to the Elasticsearch process. They are the same whether you are running inside Docker or not. Elasticsearch's network settings are already well documented, so we shouldn't duplicate them in the Docker section.

What we try to document in the Docker section is the extra capabilities and complexities that are layered on top of the basic Elasticsearch experience. For example, the ability to define settings (such as the network settings) via environment variables.

That might be an old version of your script, so forgive me if I'm on the wrong track, but you also appear to be running the Elasticsearch image that was made by Docker Inc, not this one. The two images have no relation to each other (other than containing Elasticsearch!) so if our documentation seems misaligned with your experience, that might be a big factor.
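
As an illustration of the environment-variable approach mentioned above, here is a minimal sketch, not an official recommendation. It only prints one `docker run` command per server, so nothing is started. The IP addresses are the placeholders from the script earlier in this thread. The helper name `es_run_cmd`, the published ports, the data path and the exact tag of the Elastic-built image are assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: prints (does not execute) a `docker run` command per server.
# Assumptions: image tag, host data path and published ports are illustrative.

IMAGE="docker.elastic.co/elasticsearch/elasticsearch:5.2.2"  # Elastic-built image (assumed tag)
HOSTS="1.2.3.1,1.2.3.2,1.2.3.3"                              # placeholder addresses from the thread

# es_run_cmd NAME PUBLISH_IP
# Hypothetical helper: builds the command line for one node, passing
# Elasticsearch settings as environment variables instead of -E arguments.
es_run_cmd() {
  name=$1
  publish_ip=$2
  printf 'docker run -d --name %s' "$name"
  printf ' -p 9200:9200 -p 9300:9300'
  printf ' -v /elastic/data:/usr/share/elasticsearch/data'
  printf ' -e node.name=%s' "$name"
  printf ' -e network.publish_host=%s' "$publish_ip"
  printf ' -e discovery.zen.ping.unicast.hosts=%s' "$HOSTS"
  printf ' %s\n' "$IMAGE"
}

# One command per machine; run each printed command on its own server.
es_run_cmd elastic1 1.2.3.1
es_run_cmd elastic2 1.2.3.2
es_run_cmd elastic3 1.2.3.3
```

Only the node name and the published address differ between machines. The discovery host list is identical everywhere, which is why it is kept in a single variable.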

bluepuma77 commented 7 years ago

I think my example is far from complete and not working. I would like to see an "official" real world example on the official Install Elasticsearch with Docker page. How to do it right, set up a redundant Elastic cluster, with best practice commands/files/scripts.

"Running Elasticsearch from the command line" and "Production mode" are good headlines, but from my point of view an Elastic cluster on a single machine is useless (power, hard disk, CPU or RAM can fail), and the current single docker-compose.yml is for cluster demo purposes only.

A production environment should be at least two machines, so one can fail. With all the implications for the example commands/files/scripts e.g. to connect to another machine, etc.
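
One related point when counting machines: in Elasticsearch 5.x (zen discovery), the setting `discovery.zen.minimum_master_nodes` is normally set to a quorum of master-eligible nodes, (N / 2) + 1. The sketch below only does that arithmetic. The function name `quorum` is made up for the example, and it assumes every node is master-eligible.

```shell
#!/bin/sh
# quorum N: prints (N / 2) + 1 using integer division.
# Hypothetical helper for illustrating discovery.zen.minimum_master_nodes.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Compare how many node failures each cluster size can tolerate
# while still being able to elect a master.
for n in 1 2 3; do
  q=$(quorum "$n")
  echo "master-eligible nodes: $n, quorum: $q, tolerated failures: $(( n - q ))"
done
```

With two nodes the quorum is also two, so losing either machine leaves the cluster unable to elect a master. Three nodes is the smallest size that tolerates one failure, which supports the three-server assumption made earlier in this thread.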

bluepuma77 commented 7 years ago

@dliappis Do you have an opinion on this?

jarpy commented 7 years ago

Designing a production-grade deployment of Elasticsearch is a big topic, and involves many variables that are unique to the application and the infrastructure. Sadly, there simply isn't a "one right way" to design and deploy a cluster. Any concrete design that we might suggest would be, necessarily, an incorrect design for many people. We really, really don't want to give people incorrect information.

Additionally, few of the design decisions around an Elasticsearch deployment are really unique to Docker. So even if we could say what the "right" way is, most of it wouldn't belong in the Docker specific documentation.

We may have been unclear about the meaning of "production mode", sorry if that's so. We don't mean "completely production ready". We simply mean the pre-existing definitions of "production mode" and "development mode" as defined elsewhere in the Elasticsearch documentation. We are, naturally, not suggesting that co-locating multiple nodes of a production cluster on the same hardware is a good idea. You are absolutely correct that it's a very bad idea indeed!

bluepuma77 commented 7 years ago

@jarpy for sure it's a long way to get to an optimized production grade Elastic cluster.

What I would just like to see is how to actually run basic elasticsearch-docker on multiple machines, not just on a single machine.

morphers82 commented 7 years ago

I would also like to see a real-world installation example.

dliappis commented 7 years ago

There is a Discuss comment with a Vagrantfile that can bring up two or more VMs to serve as an example of running a clustered Elasticsearch with Docker. YMMV, and this doesn't take advantage of any advanced orchestration features. There's also a discussion in https://github.com/elastic/elasticsearch-docker/issues/91 on configuring Elasticsearch with Docker Swarm.