Graylog2 / graylog2-images

Ready to run machine images
Apache License 2.0

sudo graylog2-ctl reconfigure - error #28

Closed adfrancis closed 9 years ago

adfrancis commented 9 years ago

I've deployed the latest official OVA file and changed the IP address to a static address, and everything had been fine.

I've since added a dns-nameservers string to the graylog2 VM via /etc/network/interfaces and changed the hostname. After rebooting the appliance and re-running sudo graylog2-ctl reconfigure, it now errors when it reaches “ruby_block[add node to cluster list] action run”. It hangs there and eventually fails. I'm wondering if it's due to the DNS servers being added or the hostname change. I'll add this to the bug tracker.

-------------------output of error-----------------------------------------------------------------------------------------

* execute[/opt/graylog2/embedded/bin/graylog2-ctl start elasticsearch] action run
    - execute /opt/graylog2/embedded/bin/graylog2-ctl start elasticsearch
  * ruby_block[add node to cluster list] action run

    ================================================================================
    Error executing action `run` on resource 'ruby_block[add node to cluster list]'
    ================================================================================

    NameError
    ---------
    undefined local variable or method `name' for #<Graylog2Registry:0x0000000411cc50>

    Cookbook Trace:
    ---------------
    /opt/graylog2/embedded/cookbooks/graylog2/libraries/registry.rb:94:in `rescue in add_node'
    /opt/graylog2/embedded/cookbooks/graylog2/libraries/registry.rb:91:in `add_node'
    /opt/graylog2/embedded/cookbooks/graylog2/libraries/registry.rb:44:in `add_es_node'
    /opt/graylog2/embedded/cookbooks/graylog2/recipes/elasticsearch.rb:48:in `block (2 levels) in from_file'

    Resource Declaration:
    ---------------------
    # In /opt/graylog2/embedded/cookbooks/graylog2/recipes/elasticsearch.rb

     46: ruby_block "add node to cluster list" do
     47:   block do
     48:     $registry.add_es_node(node['ipaddress'])
     49:   end
     50: end

    Compiled Resource:
    ------------------
    # Declared in /opt/graylog2/embedded/cookbooks/graylog2/recipes/elasticsearch.rb:46:in `from_file'

    ruby_block("add node to cluster list") do
      action "run"
      retries 0
      retry_delay 2
      default_guard_interpreter :default
      block_name "add node to cluster list"
      declared_type :ruby_block
      cookbook_name :graylog2
      recipe_name "elasticsearch"
      block #<Proc:0x0000000404e008@/opt/graylog2/embedded/cookbooks/graylog2/recipes/elasticsearch.rb:47>
    end

Recipe: timezone-ii::debian
  * bash[dpkg-reconfigure tzdata] action run
    - execute "bash"  "/tmp/chef-script20150211-1782-ty5ypx"

Running handlers:
[2015-02-11T11:54:03-05:00] ERROR: Running exception handlers
Running handlers complete
[2015-02-11T11:54:03-05:00] ERROR: Exception handlers complete
[2015-02-11T11:54:03-05:00] FATAL: Stacktrace dumped to /opt/graylog2/embedded/cookbooks/cache/chef-stacktrace.out
Chef Client failed. 5 resources updated in 308.503219999 seconds
[2015-02-11T11:54:03-05:00] ERROR: ruby_block[add node to cluster list] (graylog2::elasticsearch line 46) had an error: NameError: undefined local variable or method `name' for #<Graylog2Registry:0x0000000411cc50>
[2015-02-11T11:54:03-05:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
mariussturm commented 9 years ago

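Thanks for the report! Unfortunately I can't reproduce this issue. Could you please provide the output of the following commands:

  • cat /etc/hostname
  • cat /etc/hosts
  • /opt/graylog2/embedded/bin/etcdctl ls /elasticsearch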

adfrancis commented 9 years ago

Sure, see below

--------------------------------------output--------------------------------------------------------------------------

administrator@ash-vsrv-net01:~$ cat /etc/hostname
ash-vsrv-net01
administrator@ash-vsrv-net01:~$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 ash-vsrv-net01

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
administrator@ash-vsrv-net01:~$ /opt/graylog2/embedded/bin/etcdctl ls /elasticsearch
/elasticsearch/10.201.51.126
administrator@ash-vsrv-net01:~$


mariussturm commented 9 years ago

This all looks good. The output of these commands could also help:

  • ifconfig
  • pgrep -a etcd
  • cat /etc/graylog2/graylog2-settings.json

The error is an indicator that etcd is not running or cannot be reached. But the last command shows that the service actually works. Does the error still persist, or was it just a single appearance?
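
For what it's worth, the Cookbook Trace in the original report ends in `rescue in add_node` (registry.rb:94), so the NameError appears to come from the error handler itself, most likely while it was trying to report a failed etcd request. A minimal Ruby sketch of that masking effect follows. `DemoRegistry` and the simulated `IOError` are hypothetical stand-ins, not the cookbook's actual code:

```ruby
# Hypothetical stand-in for the registry class; NOT the real Graylog2 cookbook code.
class DemoRegistry
  def add_node(context, address)
    # Simulate the request to etcd failing.
    raise IOError, "etcd unreachable"
  rescue IOError => e
    # Bug: `name` is not defined anywhere, so building this message raises
    # NameError, and the caller never sees the original IOError directly.
    puts "Can not add node #{name}: #{e.message}"
  end
end

def describe_failure
  DemoRegistry.new.add_node("elasticsearch", "10.201.51.126")
  "no error"
rescue NameError => e
  # Exception#cause holds the exception that was being handled
  # when the NameError was raised.
  "#{e.class} (caused by #{e.cause.class}: #{e.cause.message})"
end

puts describe_failure
```

If the cookbook behaves like this sketch, the NameError is only a symptom, and the underlying failure is whatever made the etcd call raise in the first place.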

adfrancis commented 9 years ago

See output below. I can reproduce the same original error with every execution of 'sudo graylog2-ctl reconfigure' on this box.

------------------------------output-----------------------------------

root@ash-vsrv-net01:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:50:56:b3:77:8f
          inet addr:10.201.51.126  Bcast:10.201.51.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feb3:778f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:77579 errors:0 dropped:10 overruns:0 frame:0
          TX packets:17657 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8440389 (8.4 MB)  TX bytes:18782922 (18.7 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:840769 errors:0 dropped:0 overruns:0 frame:0
          TX packets:840769 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:132974723 (132.9 MB)  TX bytes:132974723 (132.9 MB)

root@ash-vsrv-net01:~# pgrep -a etcd
892 /opt/graylog2/embedded/sbin/etcd -listen-client-urls=http://0.0.0.0:2379,http://0.0.0.0:4001 -data-dir=/var/opt/graylog2/data/etcd
root@ash-vsrv-net01:~# cat /etc/graylog2/graylog2-settings.json
{
  "timezone": "EST",
  "smtp_server": "smtp-relay.wilresearch.com",
  "smtp_port": 587,
  "smtp_user": "",
  "smtp_password": "",
  "master_node": "127.0.0.1",
  "local_connect": false,
  "current_address": "10.201.51.126",
  "last_address": "10.201.51.126"
}
root@ash-vsrv-net01:~#


mariussturm commented 9 years ago

All your settings are perfectly fine, and when I try to reproduce with the same settings I don't get that error :/

I have improved the output of the reconfigure run a little bit. Could you please replace the file /opt/graylog2/embedded/cookbooks/graylog2/libraries/registry.rb with this one https://raw.githubusercontent.com/Graylog2/omnibus-graylog2/0.92/files/graylog2-cookbooks/graylog2/libraries/registry.rb

Afterwards re-run sudo graylog2-ctl reconfigure and then check the whole output for stack traces or errors. Maybe there is some more information.

Did you change anything in the /var/opt/graylog2/data directory, like user rights, or mount a bigger drive there?

adfrancis commented 9 years ago

I made a backup of the registry.rb file, added your updated code to a new registry.rb file, and re-ran 'sudo graylog2-ctl reconfigure'. I still received the same errors. Here is the full output of the command:

P.S. Functionally the Graylog2 appliance seems to be working OK. I've been using it since it was initially deployed yesterday and I have no errors in the web interface, only when executing 'sudo graylog2-ctl reconfigure'.

---------------------output--------------------

root@ash-vsrv-net01:/# sudo graylog2-ctl reconfigure
Starting Chef Client, version 12.0.3
Compiling Cookbooks...
Recipe: graylog2::default

Error executing action `run` on resource 'ruby_block[add node to server list]'

Net::HTTPFatalError
-------------------
500 "Internal Server Error"

Cookbook Trace:
---------------
/opt/graylog2/embedded/cookbooks/graylog2/libraries/registry.rb:15:in `set_master'

/opt/graylog2/embedded/cookbooks/graylog2/recipes/graylog2-server.rb:71:in `block (2 levels) in from_file'

Resource Declaration:
---------------------
# In /opt/graylog2/embedded/cookbooks/graylog2/recipes/graylog2-server.rb

 69: ruby_block "add node to server list" do
 70:   block do
 71:     $registry.set_master
 72:     $registry.add_gl2_server(node['ipaddress'])
 73:     $registry.add_es_node(node['ipaddress'])
 74:   end
 75: end

Compiled Resource:
------------------
# Declared in /opt/graylog2/embedded/cookbooks/graylog2/recipes/graylog2-server.rb:69:in `from_file'

ruby_block("add node to server list") do
  action "run"
  retries 0
  retry_delay 2
  default_guard_interpreter :default
  block_name "add node to server list"
  declared_type :ruby_block
  cookbook_name :graylog2
  recipe_name "graylog2-server"
  block #<Proc:0x00000004474650@/opt/graylog2/embedded/cookbooks/graylog2/recipes/graylog2-server.rb:70>
end

Recipe: timezone-ii::debian

Running handlers:
[2015-02-12T10:42:58-05:00] ERROR: Running exception handlers
Running handlers complete
[2015-02-12T10:42:58-05:00] ERROR: Exception handlers complete
[2015-02-12T10:42:58-05:00] FATAL: Stacktrace dumped to /opt/graylog2/embedded/cookbooks/cache/chef-stacktrace.out
Chef Client failed. 10 resources updated in 608.559446104 seconds
[2015-02-12T10:42:58-05:00] ERROR: ruby_block[add node to server list] (graylog2::graylog2-server line 69) had an error: Net::HTTPFatalError: 500 "Internal Server Error"
[2015-02-12T10:42:58-05:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
root@ash-vsrv-net01:/#


mariussturm commented 9 years ago

I guess I can't really solve this problem. From the error message it is now pretty clear that the etcd service is not running properly. That is a small daemon that stores the IP addresses of the Graylog cluster. You talk to the daemon via HTTP, and exactly this doesn't work on your box.

You can check whether there are any errors in the etcd logs with sudo graylog2-ctl tail etcd. If so, try restarting the service with sudo graylog2-ctl restart etcd.

If in doubt, you can delete and re-create the VM, but as you said, Graylog is running and only the reconfigure throws an error.
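
As a rough way to test the HTTP side directly, something like the Ruby sketch below could be run on the appliance. It assumes the two client ports from the pgrep output earlier in the thread (2379 and 4001) and an etcd `/version` endpoint. The helper names `etcd_probe_uri` and `probe` are made up for illustration and are not part of graylog2-ctl or the cookbook:

```ruby
require "net/http"
require "uri"

# Client ports taken from the pgrep output earlier in the thread.
ETCD_PORTS = [2379, 4001].freeze

# Build the URL to probe. The /version path is an assumption about the
# etcd build shipped in the appliance.
def etcd_probe_uri(host, port, path = "/version")
  URI::HTTP.build(host: host, port: port, path: path)
end

# Return a short status string instead of raising, so that a failure on
# one port does not stop the other port from being probed.
def probe(uri, timeout: 2)
  http = Net::HTTP.new(uri.host, uri.port)
  http.open_timeout = timeout
  http.read_timeout = timeout
  response = http.get(uri.path)
  "#{uri} -> HTTP #{response.code}"
rescue StandardError => e
  "#{uri} -> #{e.class}: #{e.message}"
end

if __FILE__ == $PROGRAM_NAME
  ETCD_PORTS.each { |port| puts probe(etcd_probe_uri("127.0.0.1", port)) }
end
```

A 2xx status on either port would suggest etcd is at least answering HTTP requests. A 5xx status or a connection error would be consistent with the 500 seen in the reconfigure run.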

adfrancis commented 9 years ago

Not a problem, and thanks for all the feedback you have provided. It's very much appreciated.
