Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

[ContentPackLoaderPeriodical] Couldn't list content packs #1683

Closed tr31z closed 8 years ago

tr31z commented 8 years ago

I have a setup with one master node, one slave node, and a 3-node ES cluster. After updating from Graylog 1.2.2 to 1.3.2, the master node won't start, while the slave is working fine. I tried cloning the slave machine and turning the clone into a master to replace the failing one, but the problem remained the same. Here are the logs from the master:

2016-01-11T10:17:44.581+01:00 INFO  [node] [graylog-server1] starting ...
2016-01-11T10:17:44.601+01:00 INFO  [Periodicals] Starting [org.graylog2.periodical.AlertScannerThread] periodical in [10s], polling every [60s].
....
2016-01-11T10:17:44.629+01:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexRangesCleanupPeriodical] periodical in [15s], polling every [3600s].
2016-01-11T10:17:44.674+01:00 INFO  [IndexerClusterCheckerThread] Indexer not fully initialized yet. Skipping periodic cluster check.
**2016-01-11T10:17:44.682+01:00 ERROR [ContentPackLoaderPeriodical] Couldn't list content packs**
java.nio.file.NoSuchFileException: data/contentpacks
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
    at java.nio.file.Files.newDirectoryStream(Files.java:525)
    at org.graylog2.periodical.ContentPackLoaderPeriodical.getFiles(ContentPackLoaderPeriodical.java:214)
    at org.graylog2.periodical.ContentPackLoaderPeriodical.doRun(ContentPackLoaderPeriodical.java:121)
    at org.graylog2.plugin.periodical.Periodical.run(Periodical.java:83)
    at java.lang.Thread.run(Thread.java:745)
2016-01-11T10:17:44.783+01:00 INFO  [PeriodicalsService] Not starting [org.graylog2.periodical.UserPermissionMigrationPeriodical] periodical. Not configured to run on this node.
2016-01-11T10:17:44.783+01:00 INFO  [Periodicals] Starting [org.graylog2.periodical.AlarmCallbacksMigrationPeriodical] periodical, running forever.
....
2016-01-11T10:17:44.849+01:00 INFO  [transport] [graylog-server1] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/*.*.*.*:9300]}
2016-01-11T10:17:44.869+01:00 INFO  [discovery] [graylog-server1] graylog2/28xX4q5BTCSiZErvdI3bsw
2016-01-11T10:17:45.009+01:00 INFO  [RestApiService] Enabling CORS for REST API
2016-01-11T10:17:47.870+01:00 WARN  [discovery] [graylog-server1] waited for 3s and no initial state was set by the discovery
2016-01-11T10:17:47.870+01:00 INFO  [node] [graylog-server1] started
2016-01-11T10:17:47.967+01:00 INFO  [service] [graylog-server1] detected_master [elasticsearch1][dtgrlxmvTlaVBQqyaBi1yA][localhost][inet[/*.*.*.*:9300]]{master=true}, added {[elasticsearch2][JkqnqxdPRku1dbG94DGgLA][elasticsearch2][inet[/*.*.*.*:9300]]{master=true},[graylog-server2][TJlWIP3tQ1OmrcdkDPvQpQ][graylog-server2][inet[/*.*.*.*:9300]]{client=true, data=false, master=false},[elasticsearch3][r7gR1D8bTTqEudndBh7STg][elasticsearch3][inet[/*.*.*.*:9300]]{master=false},[elasticsearch1][dtgrlxmvTlaVBQqyaBi1yA][localhost][inet[/*.*.*.*:9300]]{master=true},}, reason: zen-disco-receive(from master [[elasticsearch1][dtgrlxmvTlaVBQqyaBi1yA][localhost][inet[/*.*.*.*:9300]]{master=true}])
2016-01-11T10:17:49.695+01:00 INFO  [RestApiService] Adding security context factory: <org.graylog2.security.ShiroSecurityContextFactory@487696a1>
2016-01-11T10:17:49.720+01:00 ERROR [ServiceManager] Service RestApiService [FAILED] has failed in the STARTING state.
org.jboss.netty.channel.ChannelException: Failed to bind to: graylog-server2/*.*.*.*:12900
    at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
    at org.graylog2.shared.initializers.RestApiService.startUp(RestApiService.java:252)
    at com.google.common.util.concurrent.AbstractIdleService$2$1.run(AbstractIdleService.java:54)
    at com.google.common.util.concurrent.Callables$3.run(Callables.java:95)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.BindException: Cannot assign requested address
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
    at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more
2016-01-11T10:17:49.726+01:00 ERROR [InputSetupService] Not starting any inputs because lifecycle is: Uninitialized [LB:DEAD]
2016-01-11T10:17:49.734+01:00 INFO  [LogManager] Shutting down.

On the other nodes of the cluster, I see this in a loop:

[2016-01-11 10:23:43,414][INFO ][cluster.service          ] [elasticsearch1] added {[graylog-server1][8IGyUBELQyOlq9AdbP1VcA][graylog-server1][inet[/*.*.*.*:9300]]{client=true, data=false, master=false},}, reason: zen-disco-receive(join from node[[graylog-server1][8IGyUBELQyOlq9AdbP1VcA][graylog-server1][inet[/*.*.*.*:9300]]{client=true, data=false, master=false}])
[2016-01-11 10:23:45,340][INFO ][cluster.service          ] [elasticsearch1] removed {[graylog-server1][8IGyUBELQyOlq9AdbP1VcA][graylog-server1][inet[/*.*.*.*:9300]]{client=true, data=false, master=false},}, reason: zen-disco-node_left([graylog-server1][8IGyUBELQyOlq9AdbP1VcA][graylog-server1][inet[/*.*.*.*:9300]]{client=true, data=false, master=false})

joschi commented 8 years ago

2016-01-11T10:17:49.720+01:00 ERROR [ServiceManager] Service RestApiService [FAILED] has failed in the STARTING state.
org.jboss.netty.channel.ChannelException: Failed to bind to: graylog-server2/*.*.*.*:12900

This looks like either another (stale) instance of the Graylog server node is running on that machine, or the process isn't allowed to bind to this network interface (e.g. because of an SELinux, AppArmor, or grsecurity/RBAC policy).

The message from ContentPackLoaderPeriodical is rather informational and can be fixed by either disabling the content pack loader (https://github.com/Graylog2/graylog2-server/blob/1.3.2/misc/graylog2.conf#L411-L412) or by creating that directory.
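
For the "creating that directory" option, note that the path in the exception (data/contentpacks) is relative, so it is resolved against the working directory of the server process, which this thread does not state. A minimal Python sketch under that assumption; the function name and the `working_dir` argument are hypothetical and only illustrate the idea:

```python
from pathlib import Path


def ensure_content_pack_dir(working_dir: str,
                            relative_dir: str = "data/contentpacks") -> Path:
    """Create the content pack directory below working_dir if it is missing.

    working_dir is an assumption: it must be the directory the Graylog
    server process is actually started from, which the thread does not give.
    """
    target = Path(working_dir) / relative_dir
    # parents=True also creates "data"; exist_ok=True makes the call repeatable.
    target.mkdir(parents=True, exist_ok=True)
    return target
```

Calling it a second time with the same argument is harmless, and the directory should be readable by the user the server runs as.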

tr31z commented 8 years ago

SELinux is disabled:

sestatus 
SELinux status:                 disabled

And only one instance is running:

# ps aux |grep graylog
avahi      578  0.0  0.0  27944  1452 ?        Ss   11:47   0:00 avahi-daemon: running [graylog-server1.local]
graylog   3334  0.0  0.0 115212  1452 ?        Ss   11:50   0:00 /bin/sh /usr/share/graylog-server/bin/graylog-server
graylog   3335  144 19.4 3758704 756092 ?      Sl   11:50   0:20 /usr/bin/java -Xms1g -Xmx1g -XX:NewRatio=1 -XX:PermSize=128m -XX:MaxPermSize=256m -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -jar -Dlog4j.configuration=file:///etc/graylog/server/log4j.xml -Djava.library.path=/usr/share/graylog-server/lib/sigar -Dgraylog2.installation_source=rpm /usr/share/graylog-server/graylog.jar server -f /etc/graylog/server/server.conf -np
root      3487  0.0  0.0 112664   944 pts/0    S+   11:50   0:00 grep --color=auto graylog

If I stop graylog-server and run 'nc -l 12900', I am able to connect with 'telnet graylog-server1 12900' from another node.

joschi commented 8 years ago

If you can connect to *:12900 after you've stopped the Graylog server, then there's definitely some other process listening on that port. Stop the Graylog server and check with lsof -i :12900.

Which type of installation of Graylog are you running and on which operating system?

tr31z commented 8 years ago

I can connect because I ran 'nc -l 12900' :). If I stop both graylog-server and the netcat listener, I cannot connect to port 12900 from the outside, and lsof -i :12900 doesn't return anything. I think that running 'nc -l 12900' (logged in as the graylog user) shows that the problem doesn't come from a misconfiguration of the system.

I installed Graylog 1.3.2 from the repositories and I am running CentOS 7.
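
One caveat about the netcat test: 'nc -l 12900' listens on the wildcard address, whereas the log shows the server trying to bind to a specific host name (graylog-server2) on port 12900. A closer reproduction would bind to that exact host and port and look at the error code. The Python sketch below does this; `probe_bind` and its return labels are hypothetical names chosen for illustration, and the mapping of the BindException in the log to EADDRNOTAVAIL is an assumption based on the wording of the message:

```python
import errno
import socket


def probe_bind(host: str, port: int) -> str:
    """Try to bind a TCP socket to host:port and classify the outcome."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolvable"      # the host name does not resolve at all
    family, socktype, proto, _canonname, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        try:
            sock.bind(sockaddr)
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return "in-use"    # another process already holds the port
            if exc.errno == errno.EADDRNOTAVAIL:
                return "not-local" # the address is not assigned to this machine
            if exc.errno in (errno.EACCES, errno.EPERM):
                return "denied"    # a permission or policy problem
            return f"error:{exc.errno}"
    return "ok"
```

Run as the graylog user on the failing node with the server stopped, `probe_bind("graylog-server2", 12900)` would separate a port conflict ("in-use") from a host name that resolves to an address the machine does not own ("not-local"); lsof only helps with the first case.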

joschi commented 8 years ago

So does the error message persist if you start the Graylog server when lsof -i :12900 doesn't print anything?

tr31z commented 8 years ago

yes :/

[root@graylog-server1 ~]# lsof -i :12900
[root@graylog-server1 ~]# /bin/sh /usr/share/graylog-server/bin/graylog-server
OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
Exception in thread "RestApiService STARTING" org.jboss.netty.channel.ChannelException: Failed to bind to: graylog-server2/*.*.*.*:12900
    at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)

joschi commented 8 years ago

Is *.*.*.*:12900 the literal setting you've written into the configuration file? Please post the relevant parts of your Graylog server configuration file.
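
To pull out "the relevant parts" without posting the whole file, a small filter over the key = value lines may be enough. The Python sketch below is only an illustration: it assumes the listen addresses are stored as URI values under keys ending in `_listen_uri` (for example `rest_listen_uri`), which should be checked against the actual server.conf, and the function name is hypothetical:

```python
from urllib.parse import urlsplit


def listen_settings(config_text: str, suffix: str = "_listen_uri") -> dict:
    """Return {key: (host, port)} for active settings whose key ends with suffix."""
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.endswith(suffix):
            parts = urlsplit(value)
            found[key] = (parts.hostname, parts.port)
    return found
```

With a hypothetical line such as `rest_listen_uri = http://graylog-server2:12900/`, the result maps `rest_listen_uri` to `("graylog-server2", 12900)`, which makes it easy to see whether the configured host name belongs to the node the file is installed on.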

tr31z commented 8 years ago

Well, I finally got it working by reinstalling 1.2.2 and then upgrading to 1.3.2 again. I kept the same config files all along, so I couldn't figure out what the source of the problem was... Anyway, thanks for your help and your patience :)