dragosrosculete closed this 8 years ago
Nothing further on this ticket. Closing
I have the same problem, and I am out of ideas. The riding socket is out of the question, since we use the latest kernel (3.16.7-ckt20-1+deb8u1). ES version 1.7.4. Debian Jessie. Java build 1.8.0_66-internal-b17.
Here is the debug log:
[2016-01-18 21:38:27,599][TRACE][transport.tracer ] [graylog-es-1-vm] [121786447][cluster:monitor/stats[n]] sent to [[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[10.107.61.96/10.107.61.96:9300]]{client=true, data=false, master=false}] (timeout: [10m])
[2016-01-18 21:38:27,619][TRACE][transport.tracer ] [graylog-es-1-vm] [121786447][cluster:monitor/stats[n]] received response from [[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[10.107.61.96/10.107.61.96:9300]]{client=true, data=false, master=false}]
[2016-01-18 21:38:39,979][TRACE][transport.tracer ] [graylog-es-1-vm] [121788793][cluster:monitor/stats[n]] sent to [[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[10.107.61.96/10.107.61.96:9300]]{client=true, data=false, master=false}] (timeout: [10m])
[2016-01-18 21:38:39,999][TRACE][transport.tracer ] [graylog-es-1-vm] [121788793][cluster:monitor/stats[n]] received response from [[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[10.107.61.96/10.107.61.96:9300]]{client=true, data=false, master=false}]
[2016-01-18 21:38:50,991][DEBUG][transport.netty ] [graylog-es-1-vm] disconnecting from [[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[10.107.61.96/10.107.61.96:9300]]{client=true, data=false, master=false}], channel closed event
[2016-01-18 21:38:50,991][TRACE][transport.netty ] [graylog-es-1-vm] disconnected from [[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[10.107.61.96/10.107.61.96:9300]]{client=true, data=false, master=false}], channel closed event
[2016-01-18 21:38:50,998][INFO ][cluster.service ] [graylog-es-1-vm] removed {[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[10.107.61.96/10.107.61.96:9300]]{client=true, data=false, master=false},}, reason: zen-disco-node_failed([graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[10.107.61.96/10.107.61.96:9300]]{client=true, data=false, master=false}), reason transport disconnected
[2016-01-18 21:38:55,044][DEBUG][transport.netty ] [graylog-es-1-vm] connected to node [[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[/10.107.61.96:9300]]{client=true, data=false, master=false}]
[2016-01-18 21:38:55,044][TRACE][transport.tracer ] [graylog-es-1-vm] [121790018][internal:discovery/zen/join/validate] sent to [[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[/10.107.61.96:9300]]{client=true, data=false, master=false}] (timeout: [null])
[2016-01-18 21:38:55,061][TRACE][transport.tracer ] [graylog-es-1-vm] [121790018][internal:discovery/zen/join/validate] received response from [[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[/10.107.61.96:9300]]{client=true, data=false, master=false}]
[2016-01-18 21:38:55,061][INFO ][cluster.service ] [graylog-es-1-vm] added {[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[/10.107.61.96:9300]]{client=true, data=false, master=false},}, reason: zen-disco-receive(join from node[[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[/10.107.61.96:9300]]{client=true, data=false, master=false}])
[2016-01-18 21:38:55,093][TRACE][transport.tracer ] [graylog-es-1-vm] [121790022][internal:discovery/zen/publish] sent to [[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[/10.107.61.96:9300]]{client=true, data=false, master=false}] (timeout: [null])
[2016-01-18 21:38:55,261][TRACE][transport.tracer ] [graylog-es-1-vm] [121790022][internal:discovery/zen/publish] received response from [[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[/10.107.61.96:9300]]{client=true, data=false, master=false}]
[2016-01-18 21:39:04,854][TRACE][transport.tracer ] [graylog-es-1-vm] [121791213][cluster:monitor/stats[n]] sent to [[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[10.107.61.96/10.107.61.96:9300]]{client=true, data=false, master=false}] (timeout: [10m])
[2016-01-18 21:39:04,874][TRACE][transport.tracer ] [graylog-es-1-vm] [121791213][cluster:monitor/stats[n]] received response from [[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[10.107.61.96/10.107.61.96:9300]]{client=true, data=false, master=false}]
[2016-01-18 21:39:17,381][TRACE][transport.tracer ] [graylog-es-1-vm] [121793559][cluster:monitor/stats[n]] sent to [[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[10.107.61.96/10.107.61.96:9300]]{client=true, data=false, master=false}] (timeout: [10m])
[2016-01-18 21:39:17,401][TRACE][transport.tracer ] [graylog-es-1-vm] [121793559][cluster:monitor/stats[n]] received response from [[graylog2-server][tzklxLueQ8OApiHUMdK0og][glog-o-master2][inet[10.107.61.96/10.107.61.96:9300]]{client=true, data=false, master=false}]
It is true that we have a rather big cluster, but only the master disconnects, not the nodes. They communicate through a VPN tunnel; maybe somebody has another idea how to improve this.
Is it normal that the nodes are queried for stats so often, i.e. every few seconds?
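To put numbers on that, here is a rough Python sketch that measures the spacing of the `cluster:monitor/stats[n]` requests and the time between a disconnect and the next reconnect. The function names (`parse`, `stats_request_gaps`, `reconnect_gaps`) are made up for illustration, and `SAMPLE` is an abbreviated copy of the lines above with the node descriptor replaced by `[...]`; it assumes the log keeps the same line layout.

```python
import re
from datetime import datetime

# Timestamp, level, logger and message of lines shaped like the trace above.
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]\[(\w+)\s*\]\[([\w.]+)\s*\]\s*(.*)$"
)

def parse(line):
    """Return (timestamp, level, logger, message) or None if the line does not match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
    return ts, m.group(2), m.group(3), m.group(4)

def stats_request_gaps(lines):
    """Seconds between consecutive 'cluster:monitor/stats[n]' requests that were sent."""
    sent = []
    for line in lines:
        parsed = parse(line)
        if parsed and "cluster:monitor/stats[n]] sent to" in parsed[3]:
            sent.append(parsed[0])
    return [(b - a).total_seconds() for a, b in zip(sent, sent[1:])]

def reconnect_gaps(lines):
    """Seconds between each 'disconnecting from' event and the next 'connected to node'."""
    gaps = []
    down_since = None
    for line in lines:
        parsed = parse(line)
        if not parsed:
            continue
        ts, _level, _logger, msg = parsed
        if "disconnecting from" in msg:
            down_since = ts
        elif "connected to node" in msg and down_since is not None:
            gaps.append((ts - down_since).total_seconds())
            down_since = None
    return gaps

# Abbreviated from the log above; "[...]" stands in for the full node descriptor.
SAMPLE = [
    "[2016-01-18 21:38:27,599][TRACE][transport.tracer ] [graylog-es-1-vm] [121786447][cluster:monitor/stats[n]] sent to [...] (timeout: [10m])",
    "[2016-01-18 21:38:39,979][TRACE][transport.tracer ] [graylog-es-1-vm] [121788793][cluster:monitor/stats[n]] sent to [...] (timeout: [10m])",
    "[2016-01-18 21:38:50,991][DEBUG][transport.netty ] [graylog-es-1-vm] disconnecting from [...], channel closed event",
    "[2016-01-18 21:38:55,044][DEBUG][transport.netty ] [graylog-es-1-vm] connected to node [...]",
    "[2016-01-18 21:39:04,854][TRACE][transport.tracer ] [graylog-es-1-vm] [121791213][cluster:monitor/stats[n]] sent to [...] (timeout: [10m])",
    "[2016-01-18 21:39:17,381][TRACE][transport.tracer ] [graylog-es-1-vm] [121793559][cluster:monitor/stats[n]] sent to [...] (timeout: [10m])",
]

if __name__ == "__main__":
    print("stats request gaps (s):", stats_request_gaps(SAMPLE))
    print("reconnect gaps (s):", reconnect_gaps(SAMPLE))
```

On the excerpt above this gives stats requests roughly 12 to 25 seconds apart (the longer gap spans the disconnect) and about 4 seconds between the disconnect and the reconnect.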
Thank you!
Hey,
I have been having trouble for a while. I am getting random node disconnects and I cannot explain why. There is no increase in traffic (search or index) when this happens; it feels completely random to me. I first thought it could be the AWS cloud plugin, so I removed it, switched to unicast and pointed directly at my nodes' IPs, but that didn't seem to be the problem. I changed the instance type (now m3.2xlarge), added more instances and made many modifications to the ES yml config, and still nothing. I changed Java from Oracle 1.7 to 1.8 and switched from the CMS collector to G1GC, and still nothing.
I am out of ideas... how can I get more info on what is going on?
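One option I am considering, as an unverified sketch: raising the transport loggers at runtime through the cluster settings endpoint. The logger names `transport.tracer` and `transport.netty` are the ones that appear in the debug log elsewhere in this thread. The host `http://localhost:9200`, the helper name `build_logging_request`, and the assumption that this ES 1.x version accepts `logger.*` keys as transient cluster settings are mine and need checking against the docs for your version. The snippet only builds the request and does not send it.

```python
import json
import urllib.request

def build_logging_request(base_url="http://localhost:9200", tracer_level="TRACE"):
    """Build (but do not send) a PUT to _cluster/settings that raises transport logging.

    Assumption: the cluster accepts 'logger.*' keys as transient settings.
    """
    body = {
        "transient": {
            "logger.transport.tracer": tracer_level,
            "logger.transport.netty": "DEBUG",
        }
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

if __name__ == "__main__":
    req = build_logging_request()
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To apply it against a real node: urllib.request.urlopen(req)
```

Trace logging on the transport layer is noisy, so it is probably best switched back once a disconnect has been captured.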
Here are the logs I can see from master node and the data node http://pastebin.com/GhKfRkaa