elastic / elasticsearch-docker

Official Elasticsearch Docker image
Apache License 2.0

Swarm Support Feature Request #91

Closed hatdropper1977 closed 5 years ago

hatdropper1977 commented 7 years ago

Feature Description

Please provide native support for Docker Swarm stacks.

I found an unofficial patch here: https://github.com/a-goryachev/docker-swarm-elasticsearch but would prefer an official solution.

hatdropper1977 commented 7 years ago

Some details:

The fact that I cannot name containers in swarm (this may be true of other orchestration frameworks) causes the following issues

fcrisciani commented 7 years ago

+1 to this, I was expecting a configuration like this to work:

transport.host: 0.0.0.0
cluster.name: docker-test-cluster

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts:
  - tasks.dev_elasticsearch

tasks.<service> resolves to the list of IP addresses of the service's tasks on the overlay network, so it should be enough for the cluster to start and scale up and down.

So far I see these logs instead:

[2017-08-01T04:56:18,540][INFO ][o.e.c.s.ClusterService   ] [BsId4cc] removed {{BsId4cc}{BsId4cctQZOu9LWxcNzRXA}{Mk4RwWtaQCK7BYs_702PtQ}{10.0.0.2}{10.0.0.2:9300},}, added {{kAnIKWD}{kAnIKWD1QwSZV1iHDmK01w}{_nagtREAQPm5zM84DpJcog}{10.0.0.2}{10.0.0.2:9300},}, reason: zen-disco-elected-as-master ([1] nodes joined)[{kAnIKWD}{kAnIKWD1QwSZV1iHDmK01w}{_nagtREAQPm5zM84DpJcog}{10.0.0.2}{10.0.0.2:9300}]
[2017-08-01T04:56:18,541][WARN ][o.e.c.s.ClusterService   ] [BsId4cc] failing [zen-disco-elected-as-master ([1] nodes joined)[{kAnIKWD}{kAnIKWD1QwSZV1iHDmK01w}{_nagtREAQPm5zM84DpJcog}{10.0.0.2}{10.0.0.2:9300}]]: failed to commit cluster state version [1]
org.elasticsearch.discovery.Discovery$FailedToCommitClusterStateException: unexpected error while preparing to publish
        at org.elasticsearch.discovery.zen.PublishClusterStateAction.publish(PublishClusterStateAction.java:163) ~[elasticsearch-5.5.1.jar:5.5.1]
        at org.elasticsearch.discovery.zen.ZenDiscovery.publish(ZenDiscovery.java:311) ~[elasticsearch-5.5.1.jar:5.5.1]
        at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:741) [elasticsearch-5.5.1.jar:5.5.1]
        at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) [elasticsearch-5.5.1.jar:5.5.1]
        at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) [elasticsearch-5.5.1.jar:5.5.1]
        at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-5.5.1.jar:5.5.1]
        at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-5.5.1.jar:5.5.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.5.1.jar:5.5.1]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [elasticsearch-5.5.1.jar:5.5.1]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [elasticsearch-5.5.1.jar:5.5.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.elasticsearch.discovery.Discovery$FailedToCommitClusterStateException: not enough masters to ack sent cluster state.[1] needed , have [0]
        at org.elasticsearch.discovery.zen.PublishClusterStateAction$SendingController.<init>(PublishClusterStateAction.java:555) ~[elasticsearch-5.5.1.jar:5.5.1]
        at org.elasticsearch.discovery.zen.PublishClusterStateAction$SendingController.<init>(PublishClusterStateAction.java:527) ~[elasticsearch-5.5.1.jar:5.5.1]
        at org.elasticsearch.discovery.zen.PublishClusterStateAction.publish(PublishClusterStateAction.java:160) ~[elasticsearch-5.5.1.jar:5.5.1]
        ... 12 more

To reproduce: create the config file with the parameters at the top.

create compose file: service-compose.yml

version: "3.3"

services:
  elasticsearch:
    image: elasticsearch:alpine
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - ./elasticsearch/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    networks:
      - backend
    deploy:
      replicas: 3

networks:
  backend:

deploy: docker stack deploy -c service-compose.yml dev
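
To sanity-check the deployment (stack name dev as above; names and output are illustrative), something like this should show the three tasks and the discovery logs quoted above:

docker stack services dev                # the elasticsearch service should report 3/3 replicas
docker service ps dev_elasticsearch      # one line per task, with node placement
docker service logs dev_elasticsearch    # the zen discovery output shown above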

muresan commented 7 years ago

It is a problem with how ES detects local IPs: with a Docker Swarm VIP setup, ES basically detects the VIP and uses it as its address, which generates a conflict:

Caused by: java.lang.IllegalArgumentException: can't add node {H12KwKI}{H12KwKI7TUGB9qFBw7Uk_w}{ULNNEIrbS06Z5Z-CTDPWVQ}{10.0.0.2}{10.0.0.2:9300}{ml.enabled=true}, found existing node {DpGAJ1-}{DpGAJ1-qTdGqD27LRJJumA}{tLia8vwBRlWXMgVa-z5phw}{10.0.0.2}{10.0.0.2:9300}{ml.enabled=true} with same address

two different node IDs advertising the same address (the VIP). Interface detection gives:

eth0
        inet 10.0.0.2 netmask:255.255.255.0 broadcast:0.0.0.0 scope:site
        inet 10.0.0.4 netmask:255.255.255.0 broadcast:0.0.0.0 scope:site
        hardware 02:42:0A:00:00:04
        UP MULTICAST mtu:1450 index:25

This happens because Docker Swarm adds the VIP locally for routing purposes. Below is a setup that works for bringing up a cluster, but the downside is that you will have to point a load balancer yourself at 9200 on the manager nodes to get to ES from outside.

version: '3.3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.5.1
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "discovery.zen.ping.unicast.hosts=tasks.elasticsearch"
    deploy:
      endpoint_mode: dnsrr
      replicas: 3
      resources:
        limits:
          memory: 1G
        reservations:
          memory: 512M
    volumes:
      - esdata:/usr/share/elasticsearch/data
    networks:
      - esnet
  kibana:
    image: docker.elastic.co/kibana/kibana:5.5.1
    environment:
      ELASTICSEARCH_URL: http://elasticsearch:9200
    networks:
      - esnet
    ports:
      - 5601:5601

volumes:
  esdata:
    driver: local

networks:
  esnet:

Hope this helps. I could not (in a very quick search) find a way to override how ES detects its network configuration. Edit: endpoint_mode requires compose file version 3.3, which means Docker 17.06.0+.
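
As a side note, the difference between the two endpoint modes can be checked from any container attached to the same overlay network (container and service names below are placeholders, and this assumes getent is present in the image):

# <service> resolves to the single VIP in the default vip mode and to the per-task IPs with dnsrr;
# tasks.<service> always resolves to one address per running task.
docker exec -it <some_container_on_esnet> getent ahosts elasticsearch
docker exec -it <some_container_on_esnet> getent ahosts tasks.elasticsearch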

hatdropper1977 commented 7 years ago

In summary, it would be a useful Elasticsearch feature if it could just auto-detect other Elasticsearch node containers on an overlay network.

@muresan - Real fast, can you explain what you mean by:

you will have to point a LB yourself at 9200 on the manager nodes to get to ES (from outside).

fcrisciani commented 7 years ago

@hatdropper1977 it means that you have to front the 3 ES replicas with a load balancer that round-robins requests across the 3 instances. Because there is no VIP, you have 3 ES instances with 3 IPs; if you point clients at one of them, all requests go to that single IP. To spread the load across the 3, you can, for example, put HAProxy or nginx in front to proxy requests to the 3 ES backends. This is an example: https://sematext.com/blog/2016/12/12/docker-elasticsearch-swarm/
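
As an illustration of that idea (not the Sematext setup itself), a trimmed compose sketch could put an nginx service in front of the dnsrr replicas; the mounted nginx.conf is assumed, and would proxy_pass to tasks.elasticsearch:9200 using Docker's embedded DNS resolver (127.0.0.11):

version: "3.3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.5.1
    environment:
      - "discovery.zen.ping.unicast.hosts=tasks.elasticsearch"
    deploy:
      endpoint_mode: dnsrr
      replicas: 3
    networks:
      - esnet
  lb:
    image: nginx:alpine
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro   # assumed reverse-proxy config, not shown here
    ports:
      - "9200:9200"
    networks:
      - esnet

networks:
  esnet: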

hatdropper1977 commented 7 years ago

@fcrisciani In the Sematext example... will this work with X-Pack/TLS? Can the ES nodes communicate with each other using the 162.243.255.10.xip.io address, or do they use the new, local overlay addresses, or the ephemeral container names? In other words, what do we put in the TLS cert's SAN? External clients will reach the nodes via the 162 address, so that part is obvious, but how do the intra-overlay ES nodes communicate with each other?

fcrisciani commented 7 years ago

@hatdropper1977 the disclaimer is that I did not try that example myself; it was more to show the idea of having a frontend container. I'm also not familiar with X-Pack etc., but I expect it to work if nginx is properly configured, since in the end it only acts as a proxy. As for the communication between ES instances, that happens over the Docker overlay network backend (which you can configure as encrypted).
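
For reference, overlay encryption is turned on when the network is created, e.g. (network name arbitrary):

docker network create --driver overlay --opt encrypted esnet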

hatdropper1977 commented 7 years ago

@fcrisciani Thanks!

It appears that the Elasticsearch ecosystem (X-Pack security, etc.) doesn't play nicely with Swarm, and if I want to use Swarm I'll need to swim against the current a bit. The hack I mention in the original comment would work, as would the Sematext approach. I would prefer native support, but I see the reasons why that doesn't work.

muresan commented 7 years ago

@hatdropper1977 there is a way already: the tasks are listed in DNS under tasks.<servicename>, tasks.<stack_servicename>, <servicename> or <stack_servicename>, and depending on the endpoint_mode you get different results. The problem is that with the VIP endpoint ES inspects the container, sees two IPs on eth0, and picks the first one, which is the VIP; it then advertises that address to the other members, which use it to talk to ... themselves via the swarm load balancer. I found no way to tell ES to use the 2nd IP and no way to get information from the interface to identify the VIP. You can heuristically assume the 1st available IP address is the VIP (but that is not guaranteed 100%), and on that basis say "let's use the greater IP address".

useful Elasticsearch feature if you could just auto-detect other Elasticsearch node containers on an overlay network

fcrisciani commented 7 years ago

@hatdropper1977 @muresan with this change in docker: https://github.com/docker/libnetwork/pull/1877 I was able to spawn a cluster and scale it up and down.

docker compose:

version: "3.3"

services:
  elasticsearch:
    image: elasticsearch:alpine
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - ./elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    networks:
      - backend
    deploy:
      replicas: 3
  kibana:
    image: kibana
    ports:
      - "5601:5601"
    networks:
      - backend

networks:
  backend:
    attachable: true

elasticsearch.yml:

network.host: _eth0:ipv4_
cluster.name: docker-test-cluster

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts:
  - tasks.dev_elasticsearch

The change by itself simply moves the VIP to the loopback interface, but the important thing is still to use network.host: _eth0:ipv4_. I have the feeling that Elasticsearch uses the lowest IP as the node ID, so without that setting (i.e. with network.host: 0.0.0.0) the nodes in my case were still all clashing on the same IP.

muresan commented 7 years ago

@fcrisciani that is great! You can still have problems if you use multiple networks, because there is no guarantee that eth0 maps to the network you want. I tested with 2 networks and got eth0 and eth2, and eth2 was the one I wanted. If Docker could rename the container interfaces to match the network names, that would solve this, but moving the VIP to loopback is a big step forward. Edit: the 2-network use case is when you use network.bind_host for the API front end (9200) and network.publish_host for the cluster gossip/traffic (9300), as sketched below.
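
A rough elasticsearch.yml sketch of that two-network split (which ethN maps to which overlay network is an assumption and has to be verified per deployment):

network.bind_host: _eth0:ipv4_      # interface on the network exposing the HTTP API (9200)
network.publish_host: _eth2:ipv4_   # interface on the network carrying cluster transport (9300)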

fxdgear commented 7 years ago

I've been playing around with getting ES to work on swarm and have come up with a little POC.

There are a few things here of note.

  1. The coordinating node exposes port 9200. Since I've been testing this on a single-node swarm, only one container can publish port 9200 on the host, so I made that a coordinating node: it stays small and acts a bit like a reverse proxy.

  2. Using dnsrr: with dnsrr, swarm routes requests directly to the container IPs instead of the virtual IP. I noticed that with the virtual IPs the ES nodes were struggling to discover each other because of the multiple IPs per container.

  3. The coordinating node is deployed as global, so that each host in the swarm cluster exposes 9200 exactly once.

  4. This does not solve anything regarding storage; it is just a POC for the networking and for scaling ES nodes independently as services in swarm.

  5. I have only tested this locally on my laptop running docker 17.06-ce

how to use (obviously):

  1. docker swarm init
  2. docker stack deploy -c es.yml es
  3. wait for the cluster to come up, and navigate to http://localhost:9200/_cat/nodes?v
  4. docker service scale es_data=3
  5. wait again for the 2 new nodes to be created and join the cluster (a quick check follows the compose file below).

es.yml:

version: "3.3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.5.1
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "discovery.zen.minimum_master_nodes=2"
      - "discovery.zen.ping.unicast.hosts=master"
      - "node.master=false"
      - "node.data=false"
      - "node.ingest=false"
    networks:
       - esnet
    ports:
      - target: 9200
        published: 9200
        protocol: tcp
        mode: host
    deploy:
      endpoint_mode: dnsrr
      mode: 'global'
      resources:
        limits:
          memory: 1G
    ulimits:             # ulimits is a service-level option in the v3 compose format
      memlock:
        soft: -1
        hard: -1
  master:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.5.1
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "discovery.zen.minimum_master_nodes=2"
      - "discovery.zen.ping.unicast.hosts=master"
      - "node.master=true"
      - "node.data=false"
      - "node.ingest=false"
    networks:
       - esnet
    deploy:
      endpoint_mode: dnsrr
      mode: 'replicated'
      replicas: 3
      resources:
        limits:
          memory: 1G
    ulimits:
      memlock:
        soft: -1
        hard: -1
  data:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.5.1
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "discovery.zen.minimum_master_nodes=2"
      - "discovery.zen.ping.unicast.hosts=master"
      - "node.master=false"
      - "node.data=true"
      - "node.ingest=false"
    networks:
       - esnet
    deploy:
      endpoint_mode: dnsrr
      mode: 'replicated'
      replicas: 1
      resources:
        limits:
          memory: 1G
    ulimits:
      memlock:
        soft: -1
        hard: -1

networks:
  esnet:
    driver: overlay
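
A quick check after steps 4-5, assuming the stack name es from step 2:

docker service ls --filter name=es_           # replica counts for es_elasticsearch, es_master and es_data
curl -s http://localhost:9200/_cat/nodes?v    # should now also list the scaled-out data nodes
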
hatdropper1977 commented 7 years ago

@fcrisciani That's great!!!

Any idea if X-Pack (specifically TLS) can play nicely with this fix or is this just for vanilla HTTP?

realcbb commented 7 years ago

I'm using 3 services running on 3 nodes (1 manager and 2 workers) to avoid the VIP issue, but I'm facing another problem.

docker-stack-es.yml

version: '3.2'
services:
  elasticsearch1:
    image: cbb/elasticsearch:5.5.0
    environment:
      ES_JAVA_OPTS: '-Xms256m -Xmx256m'
      cluster.name: es-cluster
      node.name: es1
      network.bind_host: 0.0.0.0
      discovery.zen.minimum_master_nodes: 2
      discovery.zen.ping.unicast.hosts: tasks.elasticsearch2,tasks.elasticsearch3
      xpack.security.enabled: 'false'
      xpack.monitoring.enabled: 'false'
      xpack.watcher.enabled: 'false'
      xpack.ml.enabled: 'false'
      http.cors.enabled: 'true'
      http.cors.allow-origin: '*'
      logger.level: debug
    volumes:
      - $VPATH/data/elasticsearch:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.no == 1
      resources:
        limits:
          memory: 1g
    ulimits:             # ulimits is a service-level option in the v3 compose format
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
      nproc:
        soft: 65536
        hard: 65536
  elasticsearch2:
    image: cbb/elasticsearch:5.5.0
    environment:
      ES_JAVA_OPTS: '-Xms256m -Xmx256m'
      cluster.name: es-cluster
      node.name: es2
      network.bind_host: 0.0.0.0
      discovery.zen.minimum_master_nodes: 2
      discovery.zen.ping.unicast.hosts: tasks.elasticsearch1,tasks.elasticsearch3
      xpack.security.enabled: 'false'
      xpack.monitoring.enabled: 'false'
      xpack.watcher.enabled: 'false'
      xpack.ml.enabled: 'false'
      http.cors.enabled: 'true'
      http.cors.allow-origin: '*'
      logger.level: debug
    volumes:
      - $VPATH/data/elasticsearch:/usr/share/elasticsearch/data
    ports:
      - 9201:9200
      - 9301:9300
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.no == 2
      resources:
        limits:
          memory: 1g
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
      nproc:
        soft: 65536
        hard: 65536
  elasticsearch3:
    image: cbb/elasticsearch:5.5.0
    environment:
      ES_JAVA_OPTS: '-Xms256m -Xmx256m'
      cluster.name: es-cluster
      node.name: es3
      network.bind_host: 0.0.0.0
      discovery.zen.minimum_master_nodes: 2
      discovery.zen.ping.unicast.hosts: tasks.elasticsearch1,tasks.elasticsearch2
      xpack.security.enabled: 'false'
      xpack.monitoring.enabled: 'false'
      xpack.watcher.enabled: 'false'
      xpack.ml.enabled: 'false'
      http.cors.enabled: 'true'
      http.cors.allow-origin: '*'
      logger.level: debug
    volumes:
      - $VPATH/data/elasticsearch:/usr/share/elasticsearch/data
    ports:
      - 9202:9200
      - 9302:9300
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.no == 3
      resources:
        limits:
          memory: 1g
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
      nproc:
        soft: 65536
        hard: 65536
networks:
  default:
    external:
      name: myoverlay

cbb/elasticsearch:5.5.0 is just a docker tag of official docker.elastic.co/elasticsearch/elasticsearch:5.5.0.
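
For reference, that re-tag amounts to something like:

docker pull docker.elastic.co/elasticsearch/elasticsearch:5.5.0
docker tag docker.elastic.co/elasticsearch/elasticsearch:5.5.0 cbb/elasticsearch:5.5.0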

I deploy with docker stack deploy -c ./docker-stack-es.yml es. Everything goes well, and /_cat/nodes?v looks fine too.

Notice that the cluster node IPs are still the VIPs, even though I used tasks.<serviceName> for zen discovery. The actual task IPs are 10.10.0.3, 10.10.0.5 and 10.10.0.7.

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.10.0.6           27          93   5    0.15    0.06     0.01 mdi       *      es3
10.10.0.2           38          93   2    0.02    0.01     0.00 mdi       -      es1
10.10.0.4           33          94   2    0.04    0.04     0.06 mdi       -      es2

But after several minutes, running docker exec -it <container> curl http://localhost:9200/_cat/nodes?v sometimes works and sometimes does not.

node1:

docker@node1:/Users/cbb/Dropbox/docker/sh$ docker exec -it es_elasticsearch1.1.obxncvu85mchi8tv14hhwv7aw curl http://localhost:9200/_cat/nodes?v
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.10.0.4           29          92   0    0.00    0.03     0.14 mdi       -      es2
10.10.0.2           30          69   3    0.00    0.01     0.13 mdi       -      es1
10.10.0.6           28          91   2    0.00    0.02     0.12 mdi       *      es3
docker@node1:/Users/cbb/Dropbox/docker/sh$ docker exec -it es_elasticsearch1.1.obxncvu85mchi8tv14hhwv7aw curl http://localhost:9200/_cat/nodes?v
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.10.0.4                                                       mdi       -      es2
10.10.0.2           41          92  22    0.23    0.49     0.30 mdi       -      es1
10.10.0.6           47          92  25    0.60    0.61     0.35 mdi       *      es3
docker@node1:/Users/cbb/Dropbox/docker/sh$ docker exec -it es_elasticsearch1.1.obxncvu85mchi8tv14hhwv7aw curl http://localhost:9200/_cat/nodes?v
{"error":{"root_cause":[{"type":"node_disconnected_exception","reason":"[es3][10.10.0.6:9300][cluster:monitor/state] disconnected"}],"type":"master_not_discovered_exception","reason":"NodeDisconnectedException[[es3][10.10.0.6:9300][cluster:monitor/state] disconnected]","caused_by":{"type":"node_disconnected_exception","reason":"[es3][10.10.0.6:9300][cluster:mo
docker@node1:/Users/cbb/Dropbox/docker/sh$ docker exec -it es_elasticsearch1.1.obxncvu85mchi8tv14hhwv7aw curl http://localhost:9200/_cat/nodes?v
{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}
docker@node1:/Users/cbb/Dropbox/docker/sh$ docker exec -it es_elasticsearch1.1.obxncvu85mchi8tv14hhwv7aw curl http://localhost:9200/_cat/nodes?v
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.10.0.4           45          92   1    0.00    0.01     0.00 mdi       -      es2
10.10.0.2           48          94   1    0.00    0.01     0.00 mdi       -      es1
10.10.0.6           40          93   1    0.03    0.03     0.00 mdi       *      es3

Note the blank columns in the row for 10.10.0.4.

node2:

docker@node2:~$ docker exec -it es_elasticsearch2.1.uqeo3p66yt5pcm7ytsgfn5rqa curl http://localhost:9200/_cat/nodes?v
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.10.0.6           28          91   0    0.00    0.02     0.11 mdi       *      es3
10.10.0.4           29          93   0    0.00    0.02     0.13 mdi       -      es2
10.10.0.2           31          69   1    0.00    0.01     0.12 mdi       -      es1
docker@node2:~$ docker exec -it es_elasticsearch2.1.uqeo3p66yt5pcm7ytsgfn5rqa curl http://localhost:9200/_cat/nodes?v
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.10.0.6           50          92   2    0.08    0.40     0.31 mdi       *      es3
10.10.0.4           42          93  18    0.12    0.36     0.28 mdi       -      es2
10.10.0.2                                                       mdi       -      es1
docker@node2:~$ docker exec -it es_elasticsearch2.1.uqeo3p66yt5pcm7ytsgfn5rqa curl http://localhost:9200/_cat/nodes?v
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.10.0.6                                                       mdi       *      es3
10.10.0.4           39          93   0    0.08    0.04     0.03 mdi       -      es2
10.10.0.2                                                       mdi       -      es1

Note the blank columns in the rows for 10.10.0.6 and 10.10.0.2.

node3:

docker@node3:~$ docker exec -it es_elasticsearch3.1.axcjyne2xda88t5y9owxgs2oz curl http://localhost:9200/_cat/nodes?v
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.10.0.6           36          91  13    0.01    0.02     0.08 mdi       *      es3
10.10.0.2           35          69   1    0.01    0.01     0.08 mdi       -      es1
10.10.0.4           33          93   0    0.05    0.02     0.09 mdi       -      es2
docker@node3:~$ docker exec -it es_elasticsearch3.1.axcjyne2xda88t5y9owxgs2oz curl http://localhost:9200/_cat/nodes?v
{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}docker@node3:~$
docker@node3:~$ docker exec -it es_elasticsearch3.1.axcjyne2xda88t5y9owxgs2oz curl http://localhost:9200/_cat/nodes?v
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.10.0.6           47          93   1    0.00    0.00     0.00 mdi       *      es3
10.10.0.2           34          93   1    0.08    0.04     0.02 mdi       -      es1
10.10.0.4           29          93   1    0.11    0.08     0.07 mdi       -      es2

Another logstash service using the ES output plugin on the same overlay network has some repeated logs like this:

logstash_logstash.0.96vmfjsm9w9g@node1    | [2017-08-04T08:44:06,387][WARN ][logstash.outputs.elasticsearch] Attempted to resurrect connection to dead ES instance, but got an error. {:url=>#<Java::JavaNet::URI:0xe845a46>, :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::BadResponseCodeError, :error=>"Got response code '503' contacting Elasticsearch at URL 'http://10.10.0.4:9200/'"}
logstash_logstash.0.96vmfjsm9w9g@node1    | [2017-08-04T08:44:07,357][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://10.10.0.4:9200/, :path=>"/"}
logstash_logstash.0.96vmfjsm9w9g@node1    | [2017-08-04T08:44:07,362][WARN ][logstash.outputs.elasticsearch] Attempted to resurrect connection to dead ES instance, but got an error. {:url=>#<Java::JavaNet::URI:0xe845a46>, :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::BadResponseCodeError, :error=>"Got response code '503' contacting Elasticsearch at URL 'http://10.10.0.4:9200/'"}
logstash_logstash.0.96vmfjsm9w9g@node1    | [2017-08-04T08:44:11,389][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://10.10.0.4:9200/, :path=>"/"}
logstash_logstash.0.96vmfjsm9w9g@node1    | [2017-08-04T08:44:11,396][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>#<Java::JavaNet::URI:0xe845a46>}
logstash_logstash.0.96vmfjsm9w9g@node1    | [2017-08-04T09:19:21,735][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://10.10.0.4:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://10.10.0.4:9200/, :error_message=>"Elasticsearch Unreachable: [http://10.10.0.4:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
logstash_logstash.0.96vmfjsm9w9g@node1    | [2017-08-04T09:19:21,735][WARN ][logstash.outputs.elasticsearch] Error while performing sniffing {:error_message=>"Elasticsearch Unreachable: [http://10.10.0.4:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :backtrace=>["/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.6-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:271:in `perform_request_to_url'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.6-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:269:in `perform_request_to_url'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.6-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:257:in `perform_request'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.6-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:347:in `with_connection'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.6-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:256:in `perform_request'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.6-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:157:in `check_sniff'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.6-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:150:in `sniff!'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.6-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:139:in `start_sniffer'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.6-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:121:in `until_stopped'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.6-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:137:in `start_sniffer'"]}
logstash_logstash.0.96vmfjsm9w9g@node1    | [2017-08-04T09:19:21,750][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://10.10.0.4:9200/, :path=>"/"}
logstash_logstash.0.96vmfjsm9w9g@node1    | [2017-08-04T09:19:22,196][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://10.10.0.4:9200/, :path=>"/"}
logstash_logstash.0.96vmfjsm9w9g@node1    | [2017-08-04T09:19:22,200][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>#<Java::JavaNet::URI:0xe845a46>}
muresan commented 7 years ago

@realcbb switch to endpoint_mode: dnsrr; you do not need the swarm load balancer for a single task per service, and then the proper IPs should show.

realcbb commented 7 years ago

@muresan Yes, I could use dnsrr mode. But VIP mode is not supposed to be the problem in my case, right?

muresan commented 7 years ago

@realcbb VIP is the problem: docker adds it to the same interface as the overlay network interface, and ES sees it and uses it as the advertised outgoing IP. There's a patch a few comments up from @fcrisciani that moves the VIP to loopback, which solves this problem. Well, VIP mode explains why the VIP IPs show up; I'm not sure what causes the rest. Also, because the file was too long, here's a way to shorten the YAML using anchors: https://gist.github.com/muresan/c2b21e0e2d5cc68bc1bce43c6e69e957
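
The gist aside, here is a minimal sketch of the anchor idea (only a fragment; extension fields like x-es-env need compose file format 3.4+):

version: '3.4'

x-es-env: &es-env                      # environment shared by all three services
  ES_JAVA_OPTS: '-Xms256m -Xmx256m'
  cluster.name: es-cluster
  network.bind_host: 0.0.0.0
  discovery.zen.minimum_master_nodes: '2'

services:
  elasticsearch1:
    image: cbb/elasticsearch:5.5.0
    environment:
      <<: *es-env                      # merge the shared keys, then add the per-node ones
      node.name: es1
      discovery.zen.ping.unicast.hosts: tasks.elasticsearch2,tasks.elasticsearch3
  elasticsearch2:
    image: cbb/elasticsearch:5.5.0
    environment:
      <<: *es-env
      node.name: es2
      discovery.zen.ping.unicast.hosts: tasks.elasticsearch1,tasks.elasticsearch3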

realcbb commented 7 years ago

Thanks. The VIP IPs are indeed because of VIP mode. What I mean is that, even though discovery.zen.ping.unicast.hosts is set to tasks.<serviceName>, ES still uses the VIP as its advertised host, so that setting does not actually matter.

And even though ES uses the VIP as the advertised host, why does the cluster hit this error in my case?

muresan commented 7 years ago

@realcbb I see you have debug enabled (logger.level: debug); maybe you can find more info in the logs (docker service logs <servicename>).

realcbb commented 7 years ago

I removed the es stack, deleted the contents of the ES data folder on each node, and then redeployed the es stack. Everything went well as before.

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.10.0.6           28          81   1    0.02    0.06     0.07 mdi       -      es3
10.10.0.2           42          74   4    0.00    0.02     0.02 mdi       *      es1
10.10.0.4           28          81   1    0.00    0.00     0.00 mdi       -      es2

After some minutes, I caught some bad ES logs right when I ran docker stack deploy -c ./docker-stack-logstash logstash.
These logs might be too long...

node1:

[2017-08-04T15:10:37,013][DEBUG][o.e.c.s.ClusterService   ] [es1] processing [create-index-template [logstash], cause [api]]: execute
[2017-08-04T15:10:37,029][DEBUG][o.e.i.IndicesService     ] [es1] creating Index [[uQngaeCgT2iu0sWTxdkzPg/UPnWk2rxSxGjZ7dN3DvEOA]], shards [1]/[0] - reason [create index]
[2017-08-04T15:10:37,067][DEBUG][o.e.i.s.IndexStore       ] [es1] [uQngaeCgT2iu0sWTxdkzPg] using index.store.throttle.type [NONE], with index.store.throttle.max_bytes_per_sec [null]
[2017-08-04T15:10:37,173][DEBUG][o.e.i.m.MapperService    ] [es1] [uQngaeCgT2iu0sWTxdkzPg] using dynamic[true]
[2017-08-04T15:10:37,294][WARN ][o.e.d.i.m.TypeParsers    ] field [include_in_all] is deprecated, as [_all] is deprecated, and will be disallowed in 6.0, use [copy_to] instead.
[2017-08-04T15:10:37,381][WARN ][o.e.d.i.m.TypeParsers    ] field [include_in_all] is deprecated, as [_all] is deprecated, and will be disallowed in 6.0, use [copy_to] instead.
[2017-08-04T15:10:37,540][DEBUG][o.e.i.IndicesService     ] [es1] [uQngaeCgT2iu0sWTxdkzPg] closing ... (reason [NO_LONGER_ASSIGNED])
[2017-08-04T15:10:37,540][DEBUG][o.e.i.IndicesService     ] [es1] [uQngaeCgT2iu0sWTxdkzPg/UPnWk2rxSxGjZ7dN3DvEOA] closing index service (reason [NO_LONGER_ASSIGNED][ created for parsing template mapping])
[2017-08-04T15:10:37,540][DEBUG][o.e.i.c.b.BitsetFilterCache] [es1] [uQngaeCgT2iu0sWTxdkzPg] clearing all bitsets because [close]
[2017-08-04T15:10:37,544][DEBUG][o.e.i.c.q.IndexQueryCache] [es1] [uQngaeCgT2iu0sWTxdkzPg] full cache clear, reason [close]
[2017-08-04T15:10:37,545][DEBUG][o.e.i.c.b.BitsetFilterCache] [es1] [uQngaeCgT2iu0sWTxdkzPg] clearing all bitsets because [close]
[2017-08-04T15:10:37,550][DEBUG][o.e.i.IndicesService     ] [es1] [uQngaeCgT2iu0sWTxdkzPg/UPnWk2rxSxGjZ7dN3DvEOA] closed... (reason [NO_LONGER_ASSIGNED][ created for parsing template mapping])
[2017-08-04T15:10:37,550][DEBUG][o.e.c.s.ClusterService   ] [es1] cluster state updated, version [5], source [create-index-template [logstash], cause [api]]
[2017-08-04T15:10:37,550][DEBUG][o.e.c.s.ClusterService   ] [es1] publishing cluster state version [5]
[2017-08-04T15:11:07,556][DEBUG][o.e.d.z.ZenDiscovery     ] [es1] failed to publish cluster state version [5] (not enough nodes acknowledged, min master nodes [2])
[2017-08-04T15:11:07,559][WARN ][o.e.c.s.ClusterService   ] [es1] failing [create-index-template [logstash], cause [api]]: failed to commit cluster state version [5]
org.elasticsearch.discovery.Discovery$FailedToCommitClusterStateException: timed out while waiting for enough masters to ack sent cluster state. [1] left
    at org.elasticsearch.discovery.zen.PublishClusterStateAction$SendingController.waitForCommit(PublishClusterStateAction.java:574) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.PublishClusterStateAction.innerPublish(PublishClusterStateAction.java:202) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.PublishClusterStateAction.publish(PublishClusterStateAction.java:167) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.ZenDiscovery.publish(ZenDiscovery.java:311) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:741) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [elasticsearch-5.5.0.jar:5.5.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2017-08-04T15:11:07,569][DEBUG][o.e.a.a.i.t.p.TransportPutIndexTemplateAction] [es1] failed to put template [logstash]
org.elasticsearch.discovery.Discovery$FailedToCommitClusterStateException: timed out while waiting for enough masters to ack sent cluster state. [1] left
    at org.elasticsearch.discovery.zen.PublishClusterStateAction$SendingController.waitForCommit(PublishClusterStateAction.java:574) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.PublishClusterStateAction.innerPublish(PublishClusterStateAction.java:202) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.PublishClusterStateAction.publish(PublishClusterStateAction.java:167) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.ZenDiscovery.publish(ZenDiscovery.java:311) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:741) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [elasticsearch-5.5.0.jar:5.5.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2017-08-04T15:11:07,569][DEBUG][o.e.a.a.i.t.p.TransportPutIndexTemplateAction] [es1] master could not publish cluster state or stepped down before publishing action [indices:admin/template/put], scheduling a retry
org.elasticsearch.discovery.Discovery$FailedToCommitClusterStateException: timed out while waiting for enough masters to ack sent cluster state. [1] left
    at org.elasticsearch.discovery.zen.PublishClusterStateAction$SendingController.waitForCommit(PublishClusterStateAction.java:574) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.PublishClusterStateAction.innerPublish(PublishClusterStateAction.java:202) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.PublishClusterStateAction.publish(PublishClusterStateAction.java:167) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.ZenDiscovery.publish(ZenDiscovery.java:311) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:741) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [elasticsearch-5.5.0.jar:5.5.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2017-08-04T15:11:07,579][DEBUG][o.e.a.a.i.t.p.TransportPutIndexTemplateAction] [es1] timed out while retrying [indices:admin/template/put] after failure (timeout [30s])
org.elasticsearch.discovery.Discovery$FailedToCommitClusterStateException: timed out while waiting for enough masters to ack sent cluster state. [1] left
    at org.elasticsearch.discovery.zen.PublishClusterStateAction$SendingController.waitForCommit(PublishClusterStateAction.java:574) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.PublishClusterStateAction.innerPublish(PublishClusterStateAction.java:202) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.PublishClusterStateAction.publish(PublishClusterStateAction.java:167) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.ZenDiscovery.publish(ZenDiscovery.java:311) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:741) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [elasticsearch-5.5.0.jar:5.5.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2017-08-04T15:11:07,587][DEBUG][o.e.c.s.ClusterService   ] [es1] processing [create-index-template [logstash], cause [api]]: took [30.5s] done applying updated cluster_state (version: 5, uuid: 44Lg-3eWQw6oN1lF6kfbBQ)
[2017-08-04T15:11:07,588][WARN ][o.e.c.s.ClusterService   ] [es1] cluster state update task [create-index-template [logstash], cause [api]] took [30.5s] above the warn threshold of 30s
[2017-08-04T15:11:07,588][DEBUG][o.e.c.s.ClusterService   ] [es1] processing [zen-disco-failed-to-publish]: execute
[2017-08-04T15:11:07,588][WARN ][o.e.d.z.ZenDiscovery     ] [es1] zen-disco-failed-to-publish, current nodes: nodes:
   {es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300}
   {es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300}, local, master
   {es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300}

[2017-08-04T15:11:07,589][DEBUG][o.e.c.s.ClusterService   ] [es1] cluster state updated, version [4], source [zen-disco-failed-to-publish]
[2017-08-04T15:11:07,589][DEBUG][o.e.c.s.ClusterService   ] [es1] applying cluster state version 4
[2017-08-04T15:11:07,589][DEBUG][o.e.c.s.ClusterService   ] [es1] set local cluster state to version 4
[2017-08-04T15:11:07,590][DEBUG][o.e.l.LicenseService     ] [es1] previous [{"uid":"4c1ea454-031b-4e1e-b50d-676cfc012f3b","type":"trial","issue_date_in_millis":1501856084707,"expiry_date_in_millis":1504448084707,"max_nodes":1000,"issued_to":"es-cluster","issuer":"elasticsearch","signature":"/////QAAAPCynqArHS76IEhLjg3dxaWbsDzKiSuaTSTaaq/ecm9rpGnvztb1ERevKoo2hnRTeuo074GopHnZNWoR80gyrvZlbXCxzq8YTt+zbs+ld5OxOZU+tz264/0dTZGpm4bAgx4mb7hPeKVPYXZ/WH6t088uGgJh8Y84T376tXpGlIHBpGEoZ/A0gToEBPCBBz5wqs2itiioE8Of+S/U17Iy9J24bgSV1UGq/dAS2vGxtwmDloQ+vq5NTkXKkegGGm5Bb5wbkxsS5nIJq9Y9pdJmFYSE2zmdNz52OZOm0UVf1gW7T8/JptXAkVmEQCbGMkcz7BA=","start_date_in_millis":-1}]
[2017-08-04T15:11:07,593][DEBUG][o.e.l.LicenseService     ] [es1] current [{"uid":"4c1ea454-031b-4e1e-b50d-676cfc012f3b","type":"trial","issue_date_in_millis":1501856084707,"expiry_date_in_millis":1504448084707,"max_nodes":1000,"issued_to":"es-cluster","issuer":"elasticsearch","signature":"/////QAAAPCynqArHS76IEhLjg3dxaWbsDzKiSuaTSTaaq/ecm9rpGnvztb1ERevKoo2hnRTeuo074GopHnZNWoR80gyrvZlbXCxzq8YTt+zbs+ld5OxOZU+tz264/0dTZGpm4bAgx4mb7hPeKVPYXZ/WH6t088uGgJh8Y84T376tXpGlIHBpGEoZ/A0gToEBPCBBz5wqs2itiioE8Of+S/U17Iy9J24bgSV1UGq/dAS2vGxtwmDloQ+vq5NTkXKkegGGm5Bb5wbkxsS5nIJq9Y9pdJmFYSE2zmdNz52OZOm0UVf1gW7T8/JptXAkVmEQCbGMkcz7BA=","start_date_in_millis":-1}]
[2017-08-04T15:11:07,596][DEBUG][o.e.c.s.ClusterService   ] [es1] processing [zen-disco-failed-to-publish]: took [7ms] done applying updated cluster_state (version: 4, uuid: wa16gWAEQ76p_bCu7rdugQ)
[2017-08-04T15:11:07,600][DEBUG][o.e.c.s.ClusterService   ] [es1] processing [create-index-template [logstash], cause [api]]: execute
[2017-08-04T15:11:07,605][DEBUG][o.e.c.s.ClusterService   ] [es1] failing [create-index-template [logstash], cause [api]]: local node is no longer master
[2017-08-04T15:11:07,611][DEBUG][o.e.a.a.i.t.p.TransportPutIndexTemplateAction] [es1] failed to put template [logstash]
org.elasticsearch.cluster.NotMasterException: no longer master. source: [create-index-template [logstash], cause [api]]
[2017-08-04T15:11:07,615][DEBUG][o.e.a.a.i.t.p.TransportPutIndexTemplateAction] [es1] master could not publish cluster state or stepped down before publishing action [indices:admin/template/put], scheduling a retry
org.elasticsearch.cluster.NotMasterException: no longer master. source: [create-index-template [logstash], cause [api]]
[2017-08-04T15:11:07,627][DEBUG][o.e.c.s.ClusterService   ] [es1] processing [create-index-template [logstash], cause [api]]: execute
[2017-08-04T15:11:07,627][DEBUG][o.e.c.s.ClusterService   ] [es1] failing [create-index-template [logstash], cause [api]]: local node is no longer master
[2017-08-04T15:11:07,627][DEBUG][o.e.a.a.i.t.p.TransportPutIndexTemplateAction] [es1] failed to put template [logstash]
org.elasticsearch.cluster.NotMasterException: no longer master. source: [create-index-template [logstash], cause [api]]
[2017-08-04T15:11:07,627][DEBUG][o.e.a.a.i.t.p.TransportPutIndexTemplateAction] [es1] master could not publish cluster state or stepped down before publishing action [indices:admin/template/put], scheduling a retry
org.elasticsearch.cluster.NotMasterException: no longer master. source: [create-index-template [logstash], cause [api]]
[2017-08-04T15:11:08,448][DEBUG][o.e.c.s.ClusterService   ] [es1] processing [master ping (from: {es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300})]: execute
[2017-08-04T15:11:08,448][DEBUG][o.e.c.s.ClusterService   ] [es1] failing [master ping (from: {es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300})]: local node is no longer master
[2017-08-04T15:11:08,513][DEBUG][o.e.c.s.ClusterService   ] [es1] processing [master ping (from: {es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300})]: execute
[2017-08-04T15:11:08,513][DEBUG][o.e.c.s.ClusterService   ] [es1] failing [master ping (from: {es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300})]: local node is no longer master
[2017-08-04T15:11:08,522][DEBUG][o.e.a.a.i.t.p.TransportPutIndexTemplateAction] [es1] timed out while retrying [indices:admin/template/put] after failure (timeout [30s])
org.elasticsearch.cluster.NotMasterException: no longer master. source: [create-index-template [logstash], cause [api]]
[2017-08-04T15:11:09,278][DEBUG][o.e.a.a.i.t.p.TransportPutIndexTemplateAction] [es1] timed out while retrying [indices:admin/template/put] after failure (timeout [30s])
org.elasticsearch.cluster.NotMasterException: no longer master. source: [create-index-template [logstash], cause [api]]
[2017-08-04T15:11:10,606][DEBUG][o.e.d.z.ZenDiscovery     ] [es1] filtered ping responses: (ignore_non_masters [false])
    --> ping_response{node [{es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300}], id[29], master [null],cluster_state_version [4], cluster_name[es-cluster]}
    --> ping_response{node [{es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300}], id[29], master [null],cluster_state_version [4], cluster_name[es-cluster]}
    --> ping_response{node [{es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300}], id[36], master [null],cluster_state_version [4], cluster_name[es-cluster]}
[2017-08-04T15:11:10,609][DEBUG][o.e.d.z.ZenDiscovery     ] [es1] elected as master, waiting for incoming joins ([1] needed)
[2017-08-04T15:11:11,513][DEBUG][o.e.c.s.ClusterService   ] [es1] processing [zen-disco-elected-as-master ([1] nodes joined)[{es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300}]]: execute
[2017-08-04T15:11:11,513][DEBUG][o.e.d.z.NodeJoinController] [es1] received a join request for an existing node [{es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300}]
[2017-08-04T15:11:11,513][DEBUG][o.e.c.s.ClusterService   ] [es1] cluster state updated, version [5], source [zen-disco-elected-as-master ([1] nodes joined)[{es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300}]]
[2017-08-04T15:11:11,513][INFO ][o.e.c.s.ClusterService   ] [es1] new_master {es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300}, reason: zen-disco-elected-as-master ([1] nodes joined)[{es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300}]
[2017-08-04T15:11:11,513][DEBUG][o.e.c.s.ClusterService   ] [es1] publishing cluster state version [5]
[2017-08-04T15:11:29,277][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [es1] no known master node, scheduling a retry
[2017-08-04T15:11:41,515][DEBUG][o.e.d.z.ZenDiscovery     ] [es1] failed to publish cluster state version [5] (not enough nodes acknowledged, min master nodes [2])
[2017-08-04T15:11:41,515][WARN ][o.e.c.s.ClusterService   ] [es1] failing [zen-disco-elected-as-master ([1] nodes joined)[{es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300}]]: failed to commit cluster state version [5]
org.elasticsearch.discovery.Discovery$FailedToCommitClusterStateException: timed out while waiting for enough masters to ack sent cluster state. [1] left
    at org.elasticsearch.discovery.zen.PublishClusterStateAction$SendingController.waitForCommit(PublishClusterStateAction.java:574) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.PublishClusterStateAction.innerPublish(PublishClusterStateAction.java:202) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.PublishClusterStateAction.publish(PublishClusterStateAction.java:167) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.ZenDiscovery.publish(ZenDiscovery.java:311) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:741) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [elasticsearch-5.5.0.jar:5.5.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2017-08-04T15:11:41,524][DEBUG][o.e.c.s.ClusterService   ] [es1] processing [zen-disco-elected-as-master ([1] nodes joined)[{es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300}]]: took [30s] done applying updated cluster_state (version: 5, uuid: 31UI45IzQnGX2ypMqCFchw)
[2017-08-04T15:11:41,526][WARN ][o.e.c.s.ClusterService   ] [es1] cluster state update task [zen-disco-elected-as-master ([1] nodes joined)[{es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300}]] took [30s] above the warn threshold of 30s
[2017-08-04T15:11:41,527][DEBUG][o.e.c.s.ClusterService   ] [es1] processing [zen-disco-failed-to-publish]: execute
[2017-08-04T15:11:41,527][WARN ][o.e.d.z.ZenDiscovery     ] [es1] zen-disco-failed-to-publish, current nodes: nodes:
   {es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300}
   {es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300}, local
   {es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300}

[2017-08-04T15:11:41,528][DEBUG][o.e.c.s.ClusterService   ] [es1] processing [zen-disco-failed-to-publish]: took [0s] no change in cluster_state
[2017-08-04T15:11:41,528][DEBUG][o.e.c.s.ClusterService   ] [es1] processing [zen-disco-node-join[{es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300}]]: execute
[2017-08-04T15:11:41,545][DEBUG][o.e.c.s.ClusterService   ] [es1] processing [zen-disco-node-join[{es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300}]]: took [16ms] no change in cluster_state
[2017-08-04T15:11:44,517][DEBUG][o.e.d.z.ZenDiscovery     ] [es1] filtered ping responses: (ignore_non_masters [false])
    --> ping_response{node [{es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300}], id[41], master [null],cluster_state_version [4], cluster_name[es-cluster]}
    --> ping_response{node [{es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300}], id[44], master [null],cluster_state_version [4], cluster_name[es-cluster]}
    --> ping_response{node [{es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300}], id[49], master [null],cluster_state_version [4], cluster_name[es-cluster]}
[2017-08-04T15:11:44,518][DEBUG][o.e.d.z.ZenDiscovery     ] [es1] elected as master, waiting for incoming joins ([1] needed)
[2017-08-04T15:11:44,519][DEBUG][o.e.c.s.ClusterService   ] [es1] processing [zen-disco-elected-as-master ([1] nodes joined)[{es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300}]]: execute
[2017-08-04T15:11:44,520][DEBUG][o.e.d.z.NodeJoinController] [es1] received a join request for an existing node [{es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300}]
[2017-08-04T15:11:44,522][DEBUG][o.e.c.s.ClusterService   ] [es1] cluster state updated, version [5], source [zen-disco-elected-as-master ([1] nodes joined)[{es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300}]]
[2017-08-04T15:11:44,523][INFO ][o.e.c.s.ClusterService   ] [es1] new_master {es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300}, reason: zen-disco-elected-as-master ([1] nodes joined)[{es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300}]
[2017-08-04T15:11:44,523][DEBUG][o.e.c.s.ClusterService   ] [es1] publishing cluster state version [5]
[2017-08-04T15:11:53,984][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [es1] no known master node, scheduling a retry

node2:

[2017-08-04T15:11:07,510][WARN ][r.suppressed             ] path: /_template/logstash, params: {name=logstash}
org.elasticsearch.transport.RemoteTransportException: [es1][10.10.0.3:9300][indices:admin/template/put]
Caused by: org.elasticsearch.discovery.MasterNotDiscoveredException: FailedToCommitClusterStateException[timed out while waiting for enough masters to ack sent cluster state. [1] left]
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:209) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:311) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:139) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:111) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.retry(TransportMasterNodeAction.java:194) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.access$500(TransportMasterNodeAction.java:107) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$1.onFailure(TransportMasterNodeAction.java:157) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.action.admin.indices.template.put.TransportPutIndexTemplateAction$1.onFailure(TransportPutIndexTemplateAction.java:101) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.metadata.MetaDataIndexTemplateService$2.onFailure(MetaDataIndexTemplateService.java:163) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService$SafeClusterStateTaskListener.onFailure(ClusterService.java:952) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService$TaskOutputs.lambda$publishingFailed$0(ClusterService.java:865) ~[elasticsearch-5.5.0.jar:5.5.0]
    at java.util.ArrayList.forEach(ArrayList.java:1249) ~[?:1.8.0_131]
    at org.elasticsearch.cluster.service.ClusterService$TaskOutputs.publishingFailed(ClusterService.java:865) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:751) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) ~[elasticsearch-5.5.0.jar:5.5.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.elasticsearch.discovery.Discovery$FailedToCommitClusterStateException: timed out while waiting for enough masters to ack sent cluster state. [1] left
    at org.elasticsearch.discovery.zen.PublishClusterStateAction$SendingController.waitForCommit(PublishClusterStateAction.java:574) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.PublishClusterStateAction.innerPublish(PublishClusterStateAction.java:202) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.PublishClusterStateAction.publish(PublishClusterStateAction.java:167) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.discovery.zen.ZenDiscovery.publish(ZenDiscovery.java:311) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:741) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) ~[elasticsearch-5.5.0.jar:5.5.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
[2017-08-04T15:11:08,433][DEBUG][o.e.d.z.MasterFaultDetection] [es2] [master] pinging a master {es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300} that is no longer a master
[2017-08-04T15:11:08,435][INFO ][o.e.d.z.ZenDiscovery     ] [es2] master_left [{es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300}], reason [no longer master]
org.elasticsearch.transport.RemoteTransportException: [es1][10.10.0.3:9300][internal:discovery/zen/fd/master_ping]
Caused by: org.elasticsearch.cluster.NotMasterException: local node is not master
[2017-08-04T15:11:08,441][DEBUG][o.e.c.s.ClusterService   ] [es2] processing [master_failed ({es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300})]: execute
[2017-08-04T15:11:08,442][WARN ][o.e.d.z.ZenDiscovery     ] [es2] master left (reason = no longer master), current nodes: nodes:
   {es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300}, master
   {es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300}
   {es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300}, local

[2017-08-04T15:11:08,442][DEBUG][o.e.d.z.MasterFaultDetection] [es2] [master] stopping fault detection against master [{es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300}], reason [master failure, no longer master]
[2017-08-04T15:11:08,443][DEBUG][o.e.c.s.ClusterService   ] [es2] cluster state updated, version [4], source [master_failed ({es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300})]
[2017-08-04T15:11:08,448][DEBUG][o.e.c.s.ClusterService   ] [es2] applying cluster state version 4
[2017-08-04T15:11:08,460][DEBUG][o.e.c.s.ClusterService   ] [es2] set local cluster state to version 4
[2017-08-04T15:11:08,464][DEBUG][o.e.l.LicenseService     ] [es2] previous [{"uid":"4c1ea454-031b-4e1e-b50d-676cfc012f3b","type":"trial","issue_date_in_millis":1501856084707,"expiry_date_in_millis":1504448084707,"max_nodes":1000,"issued_to":"es-cluster","issuer":"elasticsearch","signature":"/////QAAAPCynqArHS76IEhLjg3dxaWbsDzKiSuaTSTaaq/ecm9rpGnvztb1ERevKoo2hnRTeuo074GopHnZNWoR80gyrvZlbXCxzq8YTt+zbs+ld5OxOZU+tz264/0dTZGpm4bAgx4mb7hPeKVPYXZ/WH6t088uGgJh8Y84T376tXpGlIHBpGEoZ/A0gToEBPCBBz5wqs2itiioE8Of+S/U17Iy9J24bgSV1UGq/dAS2vGxtwmDloQ+vq5NTkXKkegGGm5Bb5wbkxsS5nIJq9Y9pdJmFYSE2zmdNz52OZOm0UVf1gW7T8/JptXAkVmEQCbGMkcz7BA=","start_date_in_millis":-1}]
[2017-08-04T15:11:08,468][DEBUG][o.e.l.LicenseService     ] [es2] current [{"uid":"4c1ea454-031b-4e1e-b50d-676cfc012f3b","type":"trial","issue_date_in_millis":1501856084707,"expiry_date_in_millis":1504448084707,"max_nodes":1000,"issued_to":"es-cluster","issuer":"elasticsearch","signature":"/////QAAAPCynqArHS76IEhLjg3dxaWbsDzKiSuaTSTaaq/ecm9rpGnvztb1ERevKoo2hnRTeuo074GopHnZNWoR80gyrvZlbXCxzq8YTt+zbs+ld5OxOZU+tz264/0dTZGpm4bAgx4mb7hPeKVPYXZ/WH6t088uGgJh8Y84T376tXpGlIHBpGEoZ/A0gToEBPCBBz5wqs2itiioE8Of+S/U17Iy9J24bgSV1UGq/dAS2vGxtwmDloQ+vq5NTkXKkegGGm5Bb5wbkxsS5nIJq9Y9pdJmFYSE2zmdNz52OZOm0UVf1gW7T8/JptXAkVmEQCbGMkcz7BA=","start_date_in_millis":-1}]
[2017-08-04T15:11:08,468][DEBUG][o.e.c.s.ClusterService   ] [es2] processing [master_failed ({es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300})]: took [27ms] done applying updated cluster_state (version: 4, uuid: wa16gWAEQ76p_bCu7rdugQ)
[2017-08-04T15:11:09,194][WARN ][r.suppressed             ] path: /_template/logstash, params: {name=logstash}
org.elasticsearch.transport.RemoteTransportException: [es1][10.10.0.3:9300][indices:admin/template/put]
Caused by: org.elasticsearch.discovery.MasterNotDiscoveredException: NotMasterException[no longer master. source: [create-index-template [logstash], cause [api]]]
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:209) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:311) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:238) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:1056) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) ~[elasticsearch-5.5.0.jar:5.5.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.elasticsearch.cluster.NotMasterException: no longer master. source: [create-index-template [logstash], cause [api]]
[2017-08-04T15:11:11,469][DEBUG][o.e.d.z.ZenDiscovery     ] [es2] filtered ping responses: (ignore_non_masters [false])
    --> ping_response{node [{es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300}], id[35], master [null],cluster_state_version [4], cluster_name[es-cluster]}
    --> ping_response{node [{es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300}], id[32], master [null],cluster_state_version [4], cluster_name[es-cluster]}
    --> ping_response{node [{es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300}], id[31], master [null],cluster_state_version [4], cluster_name[es-cluster]}
[2017-08-04T15:11:11,475][DEBUG][o.e.c.s.ClusterService   ] [es2] processing [zen-disco-election-stop [{es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300} elected]]: execute
[2017-08-04T15:11:11,476][DEBUG][o.e.c.s.ClusterService   ] [es2] processing [zen-disco-election-stop [{es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300} elected]]: took [0s] no change in cluster_state

node3:

[2017-08-04T15:11:08,426][DEBUG][o.e.d.z.MasterFaultDetection] [es3] [master] pinging a master {es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300} that is no longer a master
[2017-08-04T15:11:08,427][DEBUG][o.e.d.z.MasterFaultDetection] [es3] [master] stopping fault detection against master [{es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300}], reason [master failure, no longer master]
[2017-08-04T15:11:08,435][INFO ][o.e.d.z.ZenDiscovery     ] [es3] master_left [{es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300}], reason [no longer master]
org.elasticsearch.transport.RemoteTransportException: [es1][10.10.0.3:9300][internal:discovery/zen/fd/master_ping]
Caused by: org.elasticsearch.cluster.NotMasterException: local node is not master
[2017-08-04T15:11:08,447][DEBUG][o.e.c.s.ClusterService   ] [es3] processing [master_failed ({es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300})]: execute
[2017-08-04T15:11:08,450][WARN ][o.e.d.z.ZenDiscovery     ] [es3] master left (reason = no longer master), current nodes: nodes:
   {es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300}
   {es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300}, master
   {es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300}, local

[2017-08-04T15:11:08,457][DEBUG][o.e.c.s.ClusterService   ] [es3] cluster state updated, version [4], source [master_failed ({es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300})]
[2017-08-04T15:11:08,462][DEBUG][o.e.c.s.ClusterService   ] [es3] applying cluster state version 4
[2017-08-04T15:11:08,464][DEBUG][o.e.c.s.ClusterService   ] [es3] set local cluster state to version 4
[2017-08-04T15:11:08,469][DEBUG][o.e.l.LicenseService     ] [es3] previous [{"uid":"4c1ea454-031b-4e1e-b50d-676cfc012f3b","type":"trial","issue_date_in_millis":1501856084707,"expiry_date_in_millis":1504448084707,"max_nodes":1000,"issued_to":"es-cluster","issuer":"elasticsearch","signature":"/////QAAAPCynqArHS76IEhLjg3dxaWbsDzKiSuaTSTaaq/ecm9rpGnvztb1ERevKoo2hnRTeuo074GopHnZNWoR80gyrvZlbXCxzq8YTt+zbs+ld5OxOZU+tz264/0dTZGpm4bAgx4mb7hPeKVPYXZ/WH6t088uGgJh8Y84T376tXpGlIHBpGEoZ/A0gToEBPCBBz5wqs2itiioE8Of+S/U17Iy9J24bgSV1UGq/dAS2vGxtwmDloQ+vq5NTkXKkegGGm5Bb5wbkxsS5nIJq9Y9pdJmFYSE2zmdNz52OZOm0UVf1gW7T8/JptXAkVmEQCbGMkcz7BA=","start_date_in_millis":-1}]
[2017-08-04T15:11:08,477][DEBUG][o.e.l.LicenseService     ] [es3] current [{"uid":"4c1ea454-031b-4e1e-b50d-676cfc012f3b","type":"trial","issue_date_in_millis":1501856084707,"expiry_date_in_millis":1504448084707,"max_nodes":1000,"issued_to":"es-cluster","issuer":"elasticsearch","signature":"/////QAAAPCynqArHS76IEhLjg3dxaWbsDzKiSuaTSTaaq/ecm9rpGnvztb1ERevKoo2hnRTeuo074GopHnZNWoR80gyrvZlbXCxzq8YTt+zbs+ld5OxOZU+tz264/0dTZGpm4bAgx4mb7hPeKVPYXZ/WH6t088uGgJh8Y84T376tXpGlIHBpGEoZ/A0gToEBPCBBz5wqs2itiioE8Of+S/U17Iy9J24bgSV1UGq/dAS2vGxtwmDloQ+vq5NTkXKkegGGm5Bb5wbkxsS5nIJq9Y9pdJmFYSE2zmdNz52OZOm0UVf1gW7T8/JptXAkVmEQCbGMkcz7BA=","start_date_in_millis":-1}]
[2017-08-04T15:11:08,481][DEBUG][o.e.c.s.ClusterService   ] [es3] processing [master_failed ({es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300})]: took [30ms] done applying updated cluster_state (version: 4, uuid: wa16gWAEQ76p_bCu7rdugQ)
[2017-08-04T15:11:08,503][WARN ][r.suppressed             ] path: /_template/logstash, params: {name=logstash}
org.elasticsearch.transport.RemoteTransportException: [es1][10.10.0.3:9300][indices:admin/template/put]
Caused by: org.elasticsearch.discovery.MasterNotDiscoveredException: NotMasterException[no longer master. source: [create-index-template [logstash], cause [api]]]
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:209) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:311) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:238) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:1056) ~[elasticsearch-5.5.0.jar:5.5.0]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) ~[elasticsearch-5.5.0.jar:5.5.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.elasticsearch.cluster.NotMasterException: no longer master. source: [create-index-template [logstash], cause [api]]
[2017-08-04T15:11:11,478][DEBUG][o.e.d.z.ZenDiscovery     ] [es3] filtered ping responses: (ignore_non_masters [false])
    --> ping_response{node [{es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300}], id[33], master [null],cluster_state_version [4], cluster_name[es-cluster]}
    --> ping_response{node [{es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300}], id[29], master [null],cluster_state_version [4], cluster_name[es-cluster]}
    --> ping_response{node [{es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300}], id[33], master [null],cluster_state_version [4], cluster_name[es-cluster]}
[2017-08-04T15:11:11,479][DEBUG][o.e.c.s.ClusterService   ] [es3] processing [zen-disco-election-stop [{es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300} elected]]: execute
[2017-08-04T15:11:11,479][DEBUG][o.e.c.s.ClusterService   ] [es3] processing [zen-disco-election-stop [{es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300} elected]]: took [0s] no change in cluster_state
[2017-08-04T15:11:41,494][INFO ][o.e.d.z.ZenDiscovery     ] [es3] failed to send join request to master [{es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300}], reason [RemoteTransportException[[es1][10.10.0.3:9300][internal:discovery/zen/join]]; nested: FailedToCommitClusterStateException[timed out while waiting for enough masters to ack sent cluster state. [1] left]; ]
[2017-08-04T15:11:41,496][DEBUG][o.e.c.s.ClusterService   ] [es3] processing [finalize_join ({es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300})]: execute
[2017-08-04T15:11:41,496][DEBUG][o.e.c.s.ClusterService   ] [es3] processing [finalize_join ({es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300})]: took [0s] no change in cluster_state
[2017-08-04T15:11:44,497][DEBUG][o.e.d.z.ZenDiscovery     ] [es3] filtered ping responses: (ignore_non_masters [false])
    --> ping_response{node [{es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300}], id[48], master [null],cluster_state_version [4], cluster_name[es-cluster]}
    --> ping_response{node [{es2}{98BIV3ZuQOeOtfVj_2KJcQ}{TFiTKrACQLKJUNKnW1prCQ}{10.10.0.4}{10.10.0.4:9300}], id[43], master [null],cluster_state_version [4], cluster_name[es-cluster]}
    --> ping_response{node [{es3}{eeeJqvs-SYalWKLi59ltPQ}{JEKCM_ecREuvypXt4vmHEQ}{10.10.0.6}{10.10.0.6:9300}], id[46], master [null],cluster_state_version [4], cluster_name[es-cluster]}
[2017-08-04T15:11:44,499][DEBUG][o.e.c.s.ClusterService   ] [es3] processing [zen-disco-election-stop [{es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300} elected]]: execute
[2017-08-04T15:11:44,501][DEBUG][o.e.c.s.ClusterService   ] [es3] processing [zen-disco-election-stop [{es1}{5xCwug_7QduQ15lhb9Gmhw}{zxhvNLd4QpW6nPaw61eUGg}{10.10.0.2}{10.10.0.2:9300} elected]]: took [0s] no change in cluster_state
IvanBiv commented 7 years ago

Hi everybody! I can't deploy an Elasticsearch cluster. Tools: Docker version 17.06.1-ce, build 874a737

version: '3.3'  
services:  
  elasticsearch:
    image: elasticsearch:alpine
    ports:
      - '9200:9200'
      - '9300:9300'
    command: [ elasticsearch, -E, network.host=0.0.0.0, -E, discovery.zen.ping.unicast.hosts=elasticsearch, -E, discovery.zen.minimum_master_nodes=1, -E, cluster.name=mycluster ]
    networks:
      - esnet1
    environment:
        ES_JAVA_OPTS: "-Xmx512m -Xms512m"
    deploy:
      mode: replicated
      replicas: 2
      #endpoint_mode: dnsrr
      resources:
        limits:
          cpus: '2'
          memory: 1024M
        reservations:
          cpus: '0.50'
          memory: 512M

networks:
  esnet1:

(screenshot attached in the original comment)

Service log:

[2017-08-18T21:51:41,343][INFO ][o.e.n.Node               ] [] initializing ...
[2017-08-18T21:51:41,448][INFO ][o.e.e.NodeEnvironment    ] [KBs18kt] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/mapper/server2--vg-root)]], net usable_space [1.6tb], net total_space [1.7tb], spins? [possibly], types [ext4]
[2017-08-18T21:51:41,448][INFO ][o.e.e.NodeEnvironment    ] [KBs18kt] heap size [494.9mb], compressed ordinary object pointers [true]
[2017-08-18T21:51:41,449][INFO ][o.e.n.Node               ] node name [KBs18kt] derived from node ID [KBs18ktkTqCla61SlehVPA]; set [node.name] to override
[2017-08-18T21:51:41,449][INFO ][o.e.n.Node               ] version[5.5.1], pid[1], build[19c13d0/2017-07-18T20:44:24.823Z], OS[Linux/4.4.0-91-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_131/25.131-b11]
[2017-08-18T21:51:41,450][INFO ][o.e.n.Node               ] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Xmx512m, -Xms512m, -Des.path.home=/usr/share/elasticsearch]
[2017-08-18T21:51:42,749][INFO ][o.e.p.PluginsService     ] [KBs18kt] loaded module [aggs-matrix-stats]
[2017-08-18T21:51:42,749][INFO ][o.e.p.PluginsService     ] [KBs18kt] loaded module [ingest-common]
[2017-08-18T21:51:42,749][INFO ][o.e.p.PluginsService     ] [KBs18kt] loaded module [lang-expression]
[2017-08-18T21:51:42,749][INFO ][o.e.p.PluginsService     ] [KBs18kt] loaded module [lang-groovy]
[2017-08-18T21:51:42,749][INFO ][o.e.p.PluginsService     ] [KBs18kt] loaded module [lang-mustache]
[2017-08-18T21:51:42,749][INFO ][o.e.p.PluginsService     ] [KBs18kt] loaded module [lang-painless]
[2017-08-18T21:51:42,749][INFO ][o.e.p.PluginsService     ] [KBs18kt] loaded module [parent-join]
[2017-08-18T21:51:42,749][INFO ][o.e.p.PluginsService     ] [KBs18kt] loaded module [percolator]
[2017-08-18T21:51:42,749][INFO ][o.e.p.PluginsService     ] [KBs18kt] loaded module [reindex]
[2017-08-18T21:51:42,750][INFO ][o.e.p.PluginsService     ] [KBs18kt] loaded module [transport-netty3]
[2017-08-18T21:51:42,750][INFO ][o.e.p.PluginsService     ] [KBs18kt] loaded module [transport-netty4]
[2017-08-18T21:51:42,750][INFO ][o.e.p.PluginsService     ] [KBs18kt] no plugins loaded
[2017-08-18T21:51:44,869][INFO ][o.e.d.DiscoveryModule    ] [KBs18kt] using discovery type [zen]
[2017-08-18T21:51:45,900][INFO ][o.e.n.Node               ] initialized
[2017-08-18T21:51:45,901][INFO ][o.e.n.Node               ] [KBs18kt] starting ...
[2017-08-18T21:51:46,013][INFO ][o.e.t.TransportService   ] [KBs18kt] publish_address {10.0.4.2:9300}, bound_addresses {0.0.0.0:9300}
[2017-08-18T21:51:46,020][INFO ][o.e.b.BootstrapChecks    ] [KBs18kt] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-08-18T21:51:49,056][INFO ][o.e.c.s.ClusterService   ] [KBs18kt] new_master {KBs18kt}{KBs18ktkTqCla61SlehVPA}{7wr2u2qjT8i1pyugtDmviA}{10.0.4.2}{10.0.4.2:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-08-18T21:51:49,135][INFO ][o.e.h.n.Netty4HttpServerTransport] [KBs18kt] publish_address {10.0.4.2:9200}, bound_addresses {0.0.0.0:9200}
[2017-08-18T21:51:49,135][INFO ][o.e.n.Node               ] [KBs18kt] started
[2017-08-18T21:51:49,188][INFO ][o.e.g.GatewayService     ] [KBs18kt] recovered [0] indices into cluster_state
[2017-08-18T21:51:40,759][INFO ][o.e.n.Node               ] [] initializing ...
[2017-08-18T21:51:40,862][INFO ][o.e.e.NodeEnvironment    ] [xKEFl_q] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/sda1)]], net usable_space [78.8gb], net total_space [101.5gb], spins? [possibly], types [ext4]
[2017-08-18T21:51:40,863][INFO ][o.e.e.NodeEnvironment    ] [xKEFl_q] heap size [494.9mb], compressed ordinary object pointers [true]
[2017-08-18T21:51:40,865][INFO ][o.e.n.Node               ] node name [xKEFl_q] derived from node ID [xKEFl_q-Q7a4IKiF2NrXJw]; set [node.name] to override
[2017-08-18T21:51:40,865][INFO ][o.e.n.Node               ] version[5.5.1], pid[1], build[19c13d0/2017-07-18T20:44:24.823Z], OS[Linux/4.8.0-53-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_131/25.131-b11]
[2017-08-18T21:51:40,866][INFO ][o.e.n.Node               ] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Xmx512m, -Xms512m, -Des.path.home=/usr/share/elasticsearch]
[2017-08-18T21:51:42,763][INFO ][o.e.p.PluginsService     ] [xKEFl_q] loaded module [aggs-matrix-stats]
[2017-08-18T21:51:42,763][INFO ][o.e.p.PluginsService     ] [xKEFl_q] loaded module [ingest-common]
[2017-08-18T21:51:42,763][INFO ][o.e.p.PluginsService     ] [xKEFl_q] loaded module [lang-expression]
[2017-08-18T21:51:42,763][INFO ][o.e.p.PluginsService     ] [xKEFl_q] loaded module [lang-groovy]
[2017-08-18T21:51:42,763][INFO ][o.e.p.PluginsService     ] [xKEFl_q] loaded module [lang-mustache]
[2017-08-18T21:51:42,763][INFO ][o.e.p.PluginsService     ] [xKEFl_q] loaded module [lang-painless]
[2017-08-18T21:51:42,763][INFO ][o.e.p.PluginsService     ] [xKEFl_q] loaded module [parent-join]
[2017-08-18T21:51:42,764][INFO ][o.e.p.PluginsService     ] [xKEFl_q] loaded module [percolator]
[2017-08-18T21:51:42,764][INFO ][o.e.p.PluginsService     ] [xKEFl_q] loaded module [reindex]
[2017-08-18T21:51:42,764][INFO ][o.e.p.PluginsService     ] [xKEFl_q] loaded module [transport-netty3]
[2017-08-18T21:51:42,764][INFO ][o.e.p.PluginsService     ] [xKEFl_q] loaded module [transport-netty4]
[2017-08-18T21:51:42,764][INFO ][o.e.p.PluginsService     ] [xKEFl_q] no plugins loaded
[2017-08-18T21:51:45,470][INFO ][o.e.d.DiscoveryModule    ] [xKEFl_q] using discovery type [zen]
[2017-08-18T21:51:46,902][INFO ][o.e.n.Node               ] initialized
[2017-08-18T21:51:46,902][INFO ][o.e.n.Node               ] [xKEFl_q] starting ...
[2017-08-18T21:51:47,060][INFO ][o.e.t.TransportService   ] [xKEFl_q] publish_address {10.0.4.2:9300}, bound_addresses {0.0.0.0:9300}
[2017-08-18T21:51:47,072][INFO ][o.e.b.BootstrapChecks    ] [xKEFl_q] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-08-18T21:51:50,133][INFO ][o.e.c.s.ClusterService   ] [xKEFl_q] new_master {xKEFl_q}{xKEFl_q-Q7a4IKiF2NrXJw}{CnefaO9nTFaGjpg9EzI5xA}{10.0.4.2}{10.0.4.2:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-08-18T21:51:50,153][INFO ][o.e.h.n.Netty4HttpServerTransport] [xKEFl_q] publish_address {10.0.4.2:9200}, bound_addresses {0.0.0.0:9200}
[2017-08-18T21:51:50,154][INFO ][o.e.n.Node               ] [xKEFl_q] started
[2017-08-18T21:51:50,163][INFO ][o.e.g.GatewayService     ] [xKEFl_q] recovered [0] indices into cluster_state

(screenshot attached in the original comment)

Two instances of elasticsearch do not see each other. What am I doing wrong?

fcrisciani commented 7 years ago

@IvanBiv the VIP-on-loopback fix is not in the 17.06 train. Being a new feature, it will be in the next release.

IvanBiv commented 7 years ago

@fcrisciani I did not find a discussion of this problem at https://github.com/docker/docker-ce. How soon can we expect the release with this feature?

erdarun commented 7 years ago

@muresan, @fcrisciani I have implemented @muresan's dnsrr mode by following your post.

But the problem is that my requirement is for "discovery.zen.minimum_master_nodes=2".

Since I have set minimum master nodes to 2, even after 2 replica nodes have started they cannot identify each other. ZenDiscovery (org.elasticsearch.discovery.MasterNotDiscoveredException) on each node repeatedly pings for the second master and fails. Maybe due to network issues?

After adding network.host: 0.0.0.0, both nodes join the cluster. Is that the right way of doing it?

Even then I have issues: for all existing indexes I get a DanglingIndicesState warning, "can not be imported as a dangling index, as index with same name already exists in cluster metadata".

Please give your suggestions, @muresan, @fcrisciani.

muresan commented 7 years ago

@erdarun yes, you need network.host: 0.0.0.0 or something similar, because the default is network.host: _local_, which binds to loopback. As for DanglingIndicesState, I'm assuming it comes from volumes that are still present from previous stacks being created and deleted. You should not see it after a clean deploy (remove all volumes from all swarm nodes, unless you can ensure that the same container will always get the same volume).
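
To illustrate the volume point, here is a minimal compose sketch (illustrative only; the volume name esdata is made up) that keeps the data path on a named local volume, which is created per node and reused across redeploys of the same stack, so a task landing on the same node always gets back the data it wrote before:

version: "3.3"
services:
  elasticsearch:
    image: elasticsearch:alpine
    volumes:
      # Named volume: created locally on each swarm node and reattached across
      # redeploys of the same stack, as long as the task lands on the same node.
      - esdata:/usr/share/elasticsearch/data
volumes:
  esdata: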

fcrisciani commented 7 years ago

@IvanBiv most likely it will be the 17.09 train, which arrives in September; I will confirm as I have more details.

IvanBiv commented 7 years ago

@fcrisciani thank you, that is good news! If you have any updates please write here; an easy way to run an ES cluster is very much wanted!

fcrisciani commented 7 years ago

@IvanBiv just wanted to confirm that Docker 17.09.0-ce-rc1, available on the testing channel, already contains the fix and allows deploying the Elasticsearch cluster as a Docker Swarm service. I tested the example here again: https://github.com/elastic/elasticsearch-docker/issues/91#issuecomment-319698631 If you try the same example, use sudo docker stack deploy -c compose.yml dev; the name has to be dev to match tasks.dev_elasticsearch.

IvanBiv commented 7 years ago

@fcrisciani, thanks! It works! (screenshot attached in the original comment)

shawnpanda commented 7 years ago

@fcrisciani Thanks a lot for the update. The example seems to work for me locally, but once in swarm mode my elasticsearch service fails with the error "invalid mount config for type "bind": bind source path does not exist".

I am curious whether swarm mode works for you, @IvanBiv?

IvanBiv commented 7 years ago

@shawnpanda I think you need to update Docker on the hosts. I have ES running as a distributed cluster. (screenshots attached in the original comment)

IvanBiv commented 7 years ago

@fcrisciani If I use another stack name it doesn't work. Works:

version: "3.3"

services:
  elasticsearch:
    image: elasticsearch:alpine
    command: [ elasticsearch, -E, "network.host=_eth0:ipv4_", -E, discovery.zen.ping.unicast.hosts=tasks.dev_elasticsearch, -E, discovery.zen.minimum_master_nodes=2, -E, cluster.name=myclustersss ]
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    #volumes:
    #  - ./elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    networks:
      - backend
    deploy:
      replicas: 3
  kibana:
    image: kibana
    ports:
      - "5601:5601"
    networks:
      - backend

networks:
  backend:
    attachable: true

sudo docker stack deploy -c compose.yml dev

Doesn't work:

version: "3.3"

services:
  elasticsearch:
    image: elasticsearch:alpine
    command: [ elasticsearch, -E, "network.host=_eth0:ipv4_", -E, discovery.zen.ping.unicast.hosts=tasks.sss_elasticsearch, -E, discovery.zen.minimum_master_nodes=2, -E, cluster.name=myclustersss ]
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    #volumes:
    #  - ./elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    networks:
      - backend
    deploy:
      replicas: 3
  kibana:
    image: kibana
    ports:
      - "5601:5601"
    networks:
      - backend

networks:
  backend:
    attachable: true

sudo docker stack deploy -c compose.yml sss

fcrisciani commented 7 years ago

@shawnpanda most likely the problem you are hitting is that the elasticsearch.yml configuration needs to be present on all the nodes; as you can see in the compose file, for convenience I mount it as a volume, so for the task to be spawned correctly that file has to exist on the node. The other option is to do what @IvanBiv does and pass the config as arguments in the launch command.
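
For reference, this is the kind of bind mount in question (the same path as the commented-out line in the compose files above); because it is a host path, ./elasticsearch.yml has to exist on every swarm node that might run a task, otherwise scheduling fails with exactly the "bind source path does not exist" error reported above:

    volumes:
      # Host bind mount: the file must be present at this path on every node.
      - ./elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml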

@IvanBiv I replicated the same test, and the reason it is not working is that the overlay interface is no longer eth0 but eth2. What I did is the following: docker inspect <id container elasticsearch> -f {{.NetworkSettings.Networks.sss_backend}} and the output looks like:

{0xc42042e580 [] [4c72cefee3a3] 47w50i4m9m0kd3m1wo2dawwi5 8e0ebe6191b6440e2db2caddcbae0af2e56f4bd5bd71c64510071e3bdf56e7a3  10.0.0.3 24   0 02:42:0a:00:00:03 map[]}

You want to select the interface inside the container that has this IP and MAC: 10.0.0.3 or 02:42:0a:00:00:03. Today we don't have a real way to select the interface name or to guarantee the interface creation order; maybe that would be the next step to make things better.

IvanBiv commented 7 years ago

@fcrisciani Thanks, but I didn't understand: can I use another stack name, or only "dev"?

fcrisciani commented 7 years ago

@IvanBiv yes, try changing "network.host=_eth0:ipv4_" to "network.host=_eth2:ipv4_" and verify that eth2 is actually the interface of the overlay network sss_backend. If so, it will work.

Interfaces are created from the networks, and networks are ordered lexicographically, so sss is now the last one :) For this reason what used to be eth0 is now eth2; if you deployed the stack as aaa it would be eth0 again.
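
Concretely, for the sss stack above that means a command line like the following (a sketch; verify the actual interface of the sss_backend overlay inside a running task before relying on the name, since the ordering depends on the network names):

    command: [ elasticsearch, -E, "network.host=_eth2:ipv4_", -E, discovery.zen.ping.unicast.hosts=tasks.sss_elasticsearch, -E, discovery.zen.minimum_master_nodes=2, -E, cluster.name=myclustersss ]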

IvanBiv commented 7 years ago

@fcrisciani ok, I understood. I will try it.

muresan commented 7 years ago

@fcrisciani @IvanBiv you don't need to use discovery.zen.ping.unicast.hosts=tasks.dev_elasticsearch; it will work with just tasks.<servicename>, no need to add the stack name. That keeps the compose file independent of the name of the stack. Now the only remaining problem is the order of the interfaces. Hint: https://github.com/docker/libnetwork/issues/1888 :)

kladiv commented 7 years ago

@fcrisciani Hi, I tested your compose file (I only changed a few things to fit my needs) and everything works OK. But... if I insert the healthcheck block it stops working:

version: "3.3"
services:
  elasticsearch:
    image: elasticsearch:alpine
    command: [ elasticsearch, -E, "network.host=_eth0:ipv4_", -E, discovery.zen.ping.unicast.hosts=tasks.elasticsearch, -E, discovery.zen.minimum_master_nodes=2, -E, cluster.name=es-cluster ]
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    networks:
      - backend
    healthcheck:
        test: ping -c1 localhost >/dev/null 2>&1 || exit 1
        interval: 1m
        timeout: 10s
        retries: 3
    deploy:
      mode: global

networks:
  backend:
    attachable: true

Container log:

...
dev_elasticsearch.0.m1g998oi5rvo@docker3    | [2017-10-02T22:16:21,176][INFO ][o.e.n.Node               ] initialized
dev_elasticsearch.0.m1g998oi5rvo@docker3    | [2017-10-02T22:16:21,176][INFO ][o.e.n.Node               ] [qJKdehi] starting ...
dev_elasticsearch.0.m1g998oi5rvo@docker3    | [2017-10-02T22:16:21,254][INFO ][o.e.t.TransportService   ] [qJKdehi] publish_address {10.0.0.5:9300}, bound_addresses {10.0.0.5:9300}
dev_elasticsearch.0.m1g998oi5rvo@docker3    | [2017-10-02T22:16:21,260][INFO ][o.e.b.BootstrapChecks    ] [qJKdehi] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
dev_elasticsearch.0.m1g998oi5rvo@docker3    | [2017-10-02T22:16:21,280][WARN ][o.e.d.z.UnicastZenPing   ] [qJKdehi] failed to resolve host [tasks.elasticsearch]
dev_elasticsearch.0.m1g998oi5rvo@docker3    | java.net.UnknownHostException: tasks.elasticsearch: Name does not resolve
dev_elasticsearch.0.m1g998oi5rvo@docker3    |   at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[?:1.8.0_131]
dev_elasticsearch.0.m1g998oi5rvo@docker3    |   at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928) ~[?:1.8.0_131]
dev_elasticsearch.0.m1g998oi5rvo@docker3    |   at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323) ~[?:1.8.0_131]
dev_elasticsearch.0.m1g998oi5rvo@docker3    |   at java.net.InetAddress.getAllByName0(InetAddress.java:1276) ~[?:1.8.0_131]
dev_elasticsearch.0.m1g998oi5rvo@docker3    |   at java.net.InetAddress.getAllByName(InetAddress.java:1192) ~[?:1.8.0_131]
dev_elasticsearch.0.m1g998oi5rvo@docker3    |   at java.net.InetAddress.getAllByName(InetAddress.java:1126) ~[?:1.8.0_131]
dev_elasticsearch.0.m1g998oi5rvo@docker3    |   at org.elasticsearch.transport.TcpTransport.parse(TcpTransport.java:921) ~[elasticsearch-5.5.2.jar:5.5.2]
dev_elasticsearch.0.m1g998oi5rvo@docker3    |   at org.elasticsearch.transport.TcpTransport.addressesFromString(TcpTransport.java:876) ~[elasticsearch-5.5.2.jar:5.5.2]
dev_elasticsearch.0.m1g998oi5rvo@docker3    |   at org.elasticsearch.transport.TransportService.addressesFromString(TransportService.java:691) ~[elasticsearch-5.5.2.jar:5.5.2]
dev_elasticsearch.0.m1g998oi5rvo@docker3    |   at org.elasticsearch.discovery.zen.UnicastZenPing.lambda$null$0(UnicastZenPing.java:212) ~[elasticsearch-5.5.2.jar:5.5.2]
dev_elasticsearch.0.m1g998oi5rvo@docker3    |   at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_131]
dev_elasticsearch.0.m1g998oi5rvo@docker3    |   at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.5.2.jar:5.5.2]
dev_elasticsearch.0.m1g998oi5rvo@docker3    |   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
dev_elasticsearch.0.m1g998oi5rvo@docker3    |   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
dev_elasticsearch.0.m1g998oi5rvo@docker3    |   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
dev_elasticsearch.0.wjr53bkc7plt@docker2    | [2017-10-02T22:16:21,309][INFO ][o.e.n.Node               ] initialized
dev_elasticsearch.0.wjr53bkc7plt@docker2    | [2017-10-02T22:16:21,310][INFO ][o.e.n.Node               ] [c1rMKJf] starting ...
dev_elasticsearch.0.1b8miso3adcm@docker1    | [2017-10-02T22:16:21,385][INFO ][o.e.n.Node               ] initialized
dev_elasticsearch.0.1b8miso3adcm@docker1    | [2017-10-02T22:16:21,385][INFO ][o.e.n.Node               ] [gAPXef-] starting ...
dev_elasticsearch.0.wjr53bkc7plt@docker2    | [2017-10-02T22:16:21,401][INFO ][o.e.t.TransportService   ] [c1rMKJf] publish_address {10.0.0.4:9300}, bound_addresses {10.0.0.4:9300}
dev_elasticsearch.0.wjr53bkc7plt@docker2    | [2017-10-02T22:16:21,407][INFO ][o.e.b.BootstrapChecks    ] [c1rMKJf] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
dev_elasticsearch.0.wjr53bkc7plt@docker2    | [2017-10-02T22:16:21,427][WARN ][o.e.d.z.UnicastZenPing   ] [c1rMKJf] failed to resolve host [tasks.elasticsearch]
dev_elasticsearch.0.wjr53bkc7plt@docker2    | java.net.UnknownHostException: tasks.elasticsearch: Name does not resolve
dev_elasticsearch.0.wjr53bkc7plt@docker2    |   at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[?:1.8.0_131]
dev_elasticsearch.0.wjr53bkc7plt@docker2    |   at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928) ~[?:1.8.0_131]
dev_elasticsearch.0.wjr53bkc7plt@docker2    |   at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323) ~[?:1.8.0_131]
dev_elasticsearch.0.wjr53bkc7plt@docker2    |   at java.net.InetAddress.getAllByName0(InetAddress.java:1276) ~[?:1.8.0_131]
dev_elasticsearch.0.wjr53bkc7plt@docker2    |   at java.net.InetAddress.getAllByName(InetAddress.java:1192) ~[?:1.8.0_131]
dev_elasticsearch.0.wjr53bkc7plt@docker2    |   at java.net.InetAddress.getAllByName(InetAddress.java:1126) ~[?:1.8.0_131]
dev_elasticsearch.0.wjr53bkc7plt@docker2    |   at org.elasticsearch.transport.TcpTransport.parse(TcpTransport.java:921) ~[elasticsearch-5.5.2.jar:5.5.2]
dev_elasticsearch.0.wjr53bkc7plt@docker2    |   at org.elasticsearch.transport.TcpTransport.addressesFromString(TcpTransport.java:876) ~[elasticsearch-5.5.2.jar:5.5.2]
dev_elasticsearch.0.wjr53bkc7plt@docker2    |   at org.elasticsearch.transport.TransportService.addressesFromString(TransportService.java:691) ~[elasticsearch-5.5.2.jar:5.5.2]
dev_elasticsearch.0.wjr53bkc7plt@docker2    |   at org.elasticsearch.discovery.zen.UnicastZenPing.lambda$null$0(UnicastZenPing.java:212) ~[elasticsearch-5.5.2.jar:5.5.2]
dev_elasticsearch.0.wjr53bkc7plt@docker2    |   at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_131]
dev_elasticsearch.0.wjr53bkc7plt@docker2    |   at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.5.2.jar:5.5.2]
dev_elasticsearch.0.wjr53bkc7plt@docker2    |   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
dev_elasticsearch.0.wjr53bkc7plt@docker2    |   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
dev_elasticsearch.0.wjr53bkc7plt@docker2    |   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
...

Without the healthcheck block, the errors do not appear. Why? It looks similar to https://forums.docker.com/t/healthcheck-differences-between-docker-compose-and-docker-engine-swarm/29126, but the healthcheck should not fail in this case.

fcrisciani commented 7 years ago

@kladiv can you try using discovery.zen.ping.unicast.hosts=tasks.dev_elasticsearch, where dev is the stack name from docker stack deploy -c compose.yml dev?

kladiv commented 7 years ago

@fcrisciani the same kind of error occurs:

...
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    | [2017-10-02T23:22:27,915][WARN ][o.e.n.Node               ] [g-8rhE5] timed out while waiting for initial discovery state - timeout: 30s
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    | [2017-10-02T23:22:27,925][INFO ][o.e.h.n.Netty4HttpServerTransport] [g-8rhE5] publish_address {10.0.0.2:9200}, bound_addresses {0.0.0.0:9200}
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    | [2017-10-02T23:22:27,925][INFO ][o.e.n.Node               ] [g-8rhE5] started
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    | [2017-10-02T23:22:27,965][WARN ][o.e.d.z.ZenDiscovery     ] [g-8rhE5] not enough master nodes discovered during pinging (found [[Candidate{node={g-8rhE5}{g-8rhE5SQMmXKv5GmC7pHQ}{xfzLk5EgQCW7Ikm4-3nGdg}{10.0.0.3}{10.0.0.3:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    | [2017-10-02T23:22:27,965][WARN ][o.e.d.z.UnicastZenPing   ] [g-8rhE5] failed to resolve host [tasks.dev_elasticsearch]
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    | java.net.UnknownHostException: tasks.dev_elasticsearch
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    |   at java.net.InetAddress.getAllByName0(InetAddress.java:1280) ~[?:1.8.0_131]
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    |   at java.net.InetAddress.getAllByName(InetAddress.java:1192) ~[?:1.8.0_131]
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    |   at java.net.InetAddress.getAllByName(InetAddress.java:1126) ~[?:1.8.0_131]
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    |   at org.elasticsearch.transport.TcpTransport.parse(TcpTransport.java:921) ~[elasticsearch-5.5.2.jar:5.5.2]
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    |   at org.elasticsearch.transport.TcpTransport.addressesFromString(TcpTransport.java:876) ~[elasticsearch-5.5.2.jar:5.5.2]
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    |   at org.elasticsearch.transport.TransportService.addressesFromString(TransportService.java:691) ~[elasticsearch-5.5.2.jar:5.5.2]
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    |   at org.elasticsearch.discovery.zen.UnicastZenPing.lambda$null$0(UnicastZenPing.java:212) ~[elasticsearch-5.5.2.jar:5.5.2]
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    |   at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_131]
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    |   at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.5.2.jar:5.5.2]
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    |   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    |   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
dev_elasticsearch.0.mcgbp2tz9k7m@docker1    |   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
...
muresan commented 7 years ago

@kladiv I just tested your compose file and it works for me. True, there are DNS resolution errors at the beginning, while tasks.elasticsearch is not yet resolvable, but it becomes resolvable after ~20s or so and then the cluster forms. ES has a networkaddress.cache.negative.ttl=10 setting, so it caches the fact that tasks.elasticsearch doesn't resolve for 10s; combined with the retries, it takes some time. But in the end:

vagrant@m01:~$ curl  192.168.124.100:9200/_cat/nodes
10.0.0.3 30 56 0 0.00 0.14 0.26 mdi - wI0sOw1
10.0.0.4 23 55 1 0.08 0.21 0.30 mdi - SupF8Tv
10.0.0.5 29 60 1 0.01 0.19 0.35 mdi - ZA3dlr1
10.0.0.8 32 55 2 0.19 0.23 0.30 mdi * cvslCwk
10.0.0.7 28 63 0 0.13 0.24 0.33 mdi - V5h5Vxq
10.0.0.6 21 60 0 0.00 0.16 0.31 mdi - t4TrjmZ

My test swarm has 3 managers + 3 workers.

initially:

/usr/share/elasticsearch # ping tasks.elasticsearch
ping: bad address 'tasks.elasticsearch'

but then:

/usr/share/elasticsearch # ping tasks.elasticsearch
PING tasks.elasticsearch (10.0.0.7): 56 data bytes
64 bytes from 10.0.0.7: seq=0 ttl=64 time=0.056 ms
64 bytes from 10.0.0.7: seq=1 ttl=64 time=0.105 ms
^C
--- tasks.elasticsearch ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.056/0.080/0.105 ms
/usr/share/elasticsearch # dig elasticsearch
....
;; QUESTION SECTION:
;tasks.elasticsearch.       IN  A

;; ANSWER SECTION:
tasks.elasticsearch.    600 IN  A   10.0.0.8
tasks.elasticsearch.    600 IN  A   10.0.0.7
tasks.elasticsearch.    600 IN  A   10.0.0.3
tasks.elasticsearch.    600 IN  A   10.0.0.6
tasks.elasticsearch.    600 IN  A   10.0.0.4
tasks.elasticsearch.    600 IN  A   10.0.0.5
kladiv commented 7 years ago

@fcrisciani Okay, I will test it as soon as possible. What do you think causes this difference? It's strange that startup is faster only without the healthcheck.

fcrisciani commented 7 years ago

Thanks @muresan, I did not know about that caching Elastic does; it all makes sense now. @kladiv, the health check has the specific purpose of validating that the application inside the container is actually ready to do work, so until the container is marked as healthy it will not appear as a possible destination through the DNS or the internal load balancer. The reasoning behind this is to avoid that scaling up a service immediately adds tasks that are not yet capable of handling work. In this case the Elasticsearch container starts but takes some time before being marked as healthy, so if the others try to resolve it they will not see it immediately; and because of the negative caching, even once it becomes healthy the other Elasticsearch instances do not immediately retry the DNS resolution and keep seeing the same empty task list for a while. Let me know if you have any questions.
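
As a sketch of how that interacts with the compose file above: the first health probe typically only runs after one interval has elapsed, so with interval: 1m each task stays out of the tasks.<service> DNS entry for roughly a minute. Keeping the same liveness test but shortening the interval (the values below are illustrative) gets tasks marked healthy, and therefore resolvable, much sooner:

    healthcheck:
      # Same probe as above; shorter interval so the task is marked healthy
      # (and added to the tasks.<service> DNS entry) sooner after it starts.
      test: ping -c1 localhost >/dev/null 2>&1 || exit 1
      interval: 10s
      timeout: 5s
      retries: 3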

darklow commented 6 years ago

@fcrisciani thank you for your examples, they work well.

However, maybe someone can help me: I cannot figure out how to access this cluster from outside the Docker network. I tried changing the ports to - "0.0.0.0:9200:9200" and still cannot reach it. It is only available from inside any of the Docker containers.

curl http://localhost:9200
curl: (7) Failed to connect to localhost port 9200: Connection refused

Any ideas on how to expose it on the master server?

fcrisciani commented 6 years ago

@darklow you need to expose the port. If you run docker service create you need to specify -p 9200:9200; if you use the compose file that I added, it should already work. You can only have issues if the port is already in use by some other service.

darklow commented 6 years ago

@fcrisciani I use a docker-compose.yml, and the command I use to deploy is:

docker stack deploy --with-registry-auth -c deploy/swarm/docker-compose.yml cp

docker-compose.yml:

version: "3.3"

services:

  es:
    image: elasticsearch:5.6-alpine
    command: [ elasticsearch, -E, "network.host=_eth0:ipv4_", -E, discovery.zen.ping.unicast.hosts=tasks.es, -E, discovery.zen.minimum_master_nodes=2, -E, cluster.name=my-cluster ]
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
    networks:
      - cp
    deploy:
      replicas: 2

networks:
  cp:
    attachable: true

However, when I log in to any of the instances and try to access port 9200, I get Connection refused:

root@cp1:~# docker ps
CONTAINER ID        IMAGE                      COMMAND                  CREATED             STATUS              PORTS                NAMES
765cd85c977f        elasticsearch:5.6-alpine   "/docker-entrypoin..."   4 hours ago         Up 4 hours          9200/tcp, 9300/tcp   cp_es.2.ngv5kpos75m8hvpsmyux5gzzp

root@cp1:~# curl http://localhost:9200
curl: (7) Failed to connect to localhost port 9200: Connection refused

But if I exec into any of the Docker containers, I can see that ES is running:

root@cp1:~# docker exec -it 765 /usr/bin/wget http://tasks.es:9200/ -O-
Connecting to tasks.es:9200 (10.0.0.4:9200)
{
  "name" : "ISDgGMQ",
  "cluster_name" : "my-cluster",
  "cluster_uuid" : "1xY0qSqqSGSBAqzVltSVlw",
  "version" : {
    "number" : "5.6.4",
    "build_hash" : "8bbedf5",
    "build_date" : "2017-10-31T18:55:38.105Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}
fcrisciani commented 6 years ago

Can you try curl -4 to force the use of IPv4?

darklow commented 6 years ago

@fcrisciani Thanks for trying to help; unfortunately curl -4 didn't help. I figured it out, and it was a completely different issue; apparently it is by design. I tried changing the ports parameters in my docker-compose.yml and noticed that they don't affect the way ports are exposed outside the swarm cluster: even if I put 9205:9205 it would still show 9200/tcp in docker ps and docker inspect.

Until I found this: https://docs.docker.com/engine/swarm/services/#publish-a-services-ports-directly-on-the-swarm-node

So I specifically needed to set mode: host, and now it works!

    ports:
      - published: 9200
        target: 9200
        protocol: tcp
        mode: host

Not sure if this applies only to recent Swarm/Docker versions or has always been like that, but it finally works and I can access Elasticsearch within the swarm cluster and from any instance as well. Although, to be honest, I still don't get why I can't access it without using mode: host, which automatically exposes the port as 0.0.0.0:9200, while I would like to expose it on localhost only; "9200:9200" or "127.0.0.1:9200:9200" doesn't work and I receive Connection refused :/

fcrisciani commented 6 years ago

@darklow it really depends on what you want to do. If you want to expose a port at the cluster level, so that every swarm node exposes your service no matter which node the container runs on, then you can use the compose file I was using. The only thing to note on Linux machines is that if you have a dual IPv4/IPv6 stack, IPv6 is preferred, so the curl command can fail because it tries to use IPv6. If you instead want to expose a port only on a specific node, then yes, host mode is the way to go. You can also change the public port if the two numbers do not match, like -p 10000:5000: this exposes port 10000 on the host and hooks it up to port 5000 inside the container. So for Elastic you can do -p 10000:9200 and you should be able to reach it using the IP of the host and port 10000.
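
For example, a fragment of the ports section that publishes Elasticsearch on host port 10000 through the routing mesh (10000 is just an illustrative choice, matching the -p 10000:9200 example above):

    ports:
      # Ingress (routing-mesh) publishing: host port 10000 on every swarm node
      # is forwarded to port 9200 inside the container, wherever the task runs.
      - "10000:9200"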

darklow commented 6 years ago

What I was trying to achieve is apparently not possible at the moment (mapping 127.0.0.1:9200:9200 so that I could use an nginx outside of the swarm cluster to proxy port 9200 without exposing 9200 to the public); here is the specific issue on Docker: https://github.com/moby/moby/issues/32299 Now that I know this, I have decided to run nginx inside, as part of the swarm cluster, and proxy port 9200 to the public through nginx's exposed port with some basic auth (which actually makes more sense).
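
A rough sketch of that setup, reusing the service name es and network cp from the compose file above (the nginx service and the nginx.conf path are hypothetical, not from this thread): stop publishing 9200/9300 on the es service, attach an nginx service to the same overlay network, and publish only nginx, with basic auth and the upstream configured in the mounted nginx.conf:

  nginx:
    image: nginx:alpine
    ports:
      - "9200:9200"   # only nginx is published; es stays internal to the overlay
    volumes:
      # nginx.conf (maintained separately) would proxy_pass to http://es:9200
      # and enable auth_basic; like any bind mount, it must exist on every node
      # that can run this task.
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    networks:
      - cp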