elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

/_nodes/ endpoint omits data key from attributes hash for data nodes #17354

Closed CVTJNII closed 8 years ago

CVTJNII commented 8 years ago

While querying the /_nodes/ endpoint to categorize nodes for a test I've noticed my data node does not include the 'data' key in the attributes hash:

[29] pry(main)> nodes_data['nodes']['r0YScyXZTX-hUhwWnseHsA']['attributes']
=> {"master"=>"false"}

The master and client, however, do have this key set. The following shows the attributes hashes for all 3 nodes in my test cluster, with the data node first:

[30] pry(main)> nodes_data['nodes'].values.map { |n| n['attributes'] }
=> [{"master"=>"false"}, {"data"=>"false", "master"=>"false"}, {"data"=>"false", "master"=>"true"}]

Here is the complete output for the data node:

[34] pry(main)> nodes_data['nodes']['r0YScyXZTX-hUhwWnseHsA']
=> {"name"=>"bbb3a55cacc1",                                  
 "transport_address"=>"192.168.253.160:9301",
 "host"=>"192.168.253.160",
 "ip"=>"192.168.253.160",
 "version"=>"2.2.1",
 "build"=>"d045fc2",
 "http_address"=>"es-cluster-ubuntu-1404/192.168.253.160:9201",
 "attributes"=>{"master"=>"false"},
 "settings"=>
  {"cluster"=>{"name"=>"test"},
   "node"=>{"data"=>"true", "name"=>"bbb3a55cacc1", "master"=>"false"},
   "path"=>{"logs"=>"/usr/share/elasticsearch/logs", "plugins"=>"/usr/share/elasticsearch/plugins", "home"=>"/usr/share/elasticsearch"},
   "discovery"=>{"zen"=>{"ping"=>{"multicast"=>{"enabled"=>"false"}, "unicast"=>{"hosts"=>["172.17.0.1:9300"]}}}},
   "name"=>"bbb3a55cacc1",
   "client"=>{"type"=>"node"},
   "http"=>{"port"=>"9201", "cors"=>{"allow-origin"=>"\"/.*/\"", "enabled"=>"true"}},
   "transport"=>{"tcp"=>{"port"=>"9301"}},
   "config"=>{"ignore_system_properties"=>"true"},
   "network"=>{"host"=>"_site_", "publish_host"=>"es-cluster-ubuntu-1404"}},
 "os"=>{"refresh_interval_in_millis"=>1000, "name"=>"Linux", "arch"=>"amd64", "version"=>"3.13.0-24-generic", "available_processors"=>1, "allocated_processors"=>1},
 "process"=>{"refresh_interval_in_millis"=>1000, "id"=>1, "mlockall"=>false},
 "jvm"=>
  {"pid"=>1,
   "version"=>"1.8.0_72-internal",
   "vm_name"=>"OpenJDK 64-Bit Server VM",
   "vm_version"=>"25.72-b15",
   "vm_vendor"=>"Oracle Corporation",
   "start_time_in_millis"=>1458929284547,
   "mem"=>{"heap_init_in_bytes"=>268435456, "heap_max_in_bytes"=>1065025536, "non_heap_init_in_bytes"=>2555904, "non_heap_max_in_bytes"=>0, "direct_max_in_bytes"=>1065025536},
   "gc_collectors"=>["ParNew", "ConcurrentMarkSweep"],
   "memory_pools"=>["Code Cache", "Metaspace", "Compressed Class Space", "Par Eden Space", "Par Survivor Space", "CMS Old Gen"],
   "using_compressed_ordinary_object_pointers"=>"true"},
 "thread_pool"=>
  {"force_merge"=>{"type"=>"fixed", "min"=>1, "max"=>1, "queue_size"=>-1},
   "percolate"=>{"type"=>"fixed", "min"=>1, "max"=>1, "queue_size"=>1000},
   "fetch_shard_started"=>{"type"=>"scaling", "min"=>1, "max"=>2, "keep_alive"=>"5m", "queue_size"=>-1},
   "listener"=>{"type"=>"fixed", "min"=>1, "max"=>1, "queue_size"=>-1},
   "index"=>{"type"=>"fixed", "min"=>1, "max"=>1, "queue_size"=>200},
   "refresh"=>{"type"=>"scaling", "min"=>1, "max"=>1, "keep_alive"=>"5m", "queue_size"=>-1},
   "suggest"=>{"type"=>"fixed", "min"=>1, "max"=>1, "queue_size"=>1000},
   "generic"=>{"type"=>"cached", "keep_alive"=>"30s", "queue_size"=>-1},
   "warmer"=>{"type"=>"scaling", "min"=>1, "max"=>1, "keep_alive"=>"5m", "queue_size"=>-1},
   "search"=>{"type"=>"fixed", "min"=>2, "max"=>2, "queue_size"=>1000},
   "flush"=>{"type"=>"scaling", "min"=>1, "max"=>1, "keep_alive"=>"5m", "queue_size"=>-1},
   "fetch_shard_store"=>{"type"=>"scaling", "min"=>1, "max"=>2, "keep_alive"=>"5m", "queue_size"=>-1},
   "management"=>{"type"=>"scaling", "min"=>1, "max"=>5, "keep_alive"=>"5m", "queue_size"=>-1},
   "get"=>{"type"=>"fixed", "min"=>1, "max"=>1, "queue_size"=>1000},
   "bulk"=>{"type"=>"fixed", "min"=>1, "max"=>1, "queue_size"=>50},
   "snapshot"=>{"type"=>"scaling", "min"=>1, "max"=>1, "keep_alive"=>"5m", "queue_size"=>-1}},
 "transport"=>{"bound_address"=>["172.17.0.6:9301"], "publish_address"=>"192.168.253.160:9301", "profiles"=>{}},
 "http"=>{"bound_address"=>["172.17.0.6:9201"], "publish_address"=>"192.168.253.160:9201", "max_content_length_in_bytes"=>104857600},
 "plugins"=>[],
 "modules"=>
  [{"name"=>"lang-expression", "version"=>"2.2.1", "description"=>"Lucene expressions integration for Elasticsearch", "jvm"=>true, "classname"=>"org.elasticsearch.script.expression.ExpressionPlugin", "isolated"=>true, "site"=>false},
   {"name"=>"lang-groovy", "version"=>"2.2.1", "description"=>"Groovy scripting integration for Elasticsearch", "jvm"=>true, "classname"=>"org.elasticsearch.script.groovy.GroovyPlugin", "isolated"=>true, "site"=>false}]}

Elasticsearch version: 2.2.1 build d045fc2
JVM version: 1.8.0_72
OS version: elasticsearch:2 Docker container

Steps to reproduce:

  1. curl -XGET 'http://127.0.0.1:9200/_nodes?pretty=true'

javanna commented 8 years ago

Hi @CVTJNII, do you mean that you would like to see data: true for the data node? We don't output it because it's the default, just as we don't output master when it is true. This will improve with #16963, where we are separating node roles from attributes: nodes info will always output the roles, but they will no longer be part of attributes. What do you think of that?
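
To illustrate the point about defaults, here is a minimal Ruby sketch that fills in omitted keys before inspecting a node's attributes. It assumes, per the comment above, that an omitted 'data' or 'master' key means "true" in 2.x. The constant ATTRIBUTE_DEFAULTS and the helper effective_attributes are hypothetical names used for illustration only; they are not part of any Elasticsearch client.

```ruby
# Assumed 2.x defaults, per the comment above: an omitted key means "true".
ATTRIBUTE_DEFAULTS = { 'data' => 'true', 'master' => 'true' }.freeze

# Hypothetical helper: returns the attributes hash with omitted defaults
# filled in. Explicitly reported values always win over the defaults.
def effective_attributes(attributes)
  ATTRIBUTE_DEFAULTS.merge(attributes || {})
end

# Attributes hashes copied from the report above (data, client, master).
reported = [
  { 'master' => 'false' },
  { 'data' => 'false', 'master' => 'false' },
  { 'data' => 'false', 'master' => 'true' }
]

reported.each { |attrs| p effective_attributes(attrs) }
```

With the defaults merged in, the first node from the report reads as a data node even though its 'data' key was omitted.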

CVTJNII commented 8 years ago

Yes, I believe #16963 will satisfy the spirit of the request, as I'm using this output to determine what the cluster type is. However, I'd still prefer the values to be explicitly set. Personally, I always prefer explicit over implicit, as it avoids confusion. I also find 'true' to be an unexpected default; for booleans I assume the default will be 'false', though I suppose if that's documented and I missed it then it's okay.

I'd also like to point out that the master is returning 'master: true' and not omitting it as a default. As I mentioned above, I prefer this behavior and think it would be beneficial if the data node behaved the same way. Thanks.

javanna commented 8 years ago

If you rely on attributes to determine the node types, you need to take default values into account (in 2.x). All node types default to true (including node.ingest). With #16963 you can rely on the returned roles array instead, which will always contain the roles that each node fulfils in the cluster (whether or not you set them explicitly). Roles won't be part of attributes anymore, though, so you will need to switch to reading from roles from 5.0 on. I think we can consider this fixed.
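
For a test that has to work across both versions, one possible approach is to prefer the roles array when it is present and fall back to attributes otherwise. The sketch below is hedged: node_roles is a hypothetical helper, the shape of the 5.0 entry is illustrative rather than real output, and the fallback covers only 'master' and 'data', assuming an omitted key means "true" as described above.

```ruby
# Hypothetical helper: lists the roles a node fulfils.
# Assumes 5.0+ node info carries a 'roles' array (per the comment above) and
# that in 2.x an omitted 'data' or 'master' attribute means "true".
def node_roles(node)
  return node['roles'] if node.key?('roles')

  attributes = node['attributes'] || {}
  %w[master data].select { |role| attributes.fetch(role, 'true') == 'true' }
end

# 2.x data node entry, reduced to the attributes from the report above.
data_node_2x = { 'attributes' => { 'master' => 'false' } }

# Illustrative 5.0-style entry; the exact shape is an assumption.
node_5x = { 'roles' => %w[master data ingest] }

p node_roles(data_node_2x)
p node_roles(node_5x)
```

Keeping the version check inside one helper means the calling test only has to ask whether a role name is included in the returned list.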