Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

Fresh install of 1.0.0 - streams always getting auto paused #1122

Closed neilferreira closed 9 years ago

neilferreira commented 9 years ago

Hello

I have had numerous problems with my installation of 1.0.0 stable: its streams are being auto-paused all the time. I fired up a brand new m3.medium EC2 instance hosting only Graylog server (blank install) and the web interface, and my streams are still always being paused.

When I log in to the control panel, I see this:

Nodes with too long GC pauses 2 days ago
There are Graylog nodes on which the garbage collector runs too long. Garbage collection runs should be as short as possible. Please check whether those nodes are healthy. (Node: e760fded-2611-4eb1-9864-c354c65ce368, GC duration: 1627 ms, GC threshold: 1000 ms)

Also getting this:

Processing of a stream has been disabled due to excessive processing time. 2 days ago
The processing of stream Reasons has taken too long for 3 times. To protect the stability of message processing, this stream has been disabled. Please correct the stream rules and reenable the stream. Check this article for more details.

I have a few streams that all check the level (for errors) and then use a regex to check a few hostnames, e.g.:

level must be smaller than 5  
source must match regular expression myhost1.com.au|myhost2.com.au|myhost3.com.au
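
Editor's note on the second rule: in the pattern as written, each `.` matches any character and nothing anchors the alternatives to the whole `source` value. The sketch below illustrates this with Python's `re` module. Graylog itself runs on Java, so this only models the pattern semantics, and it assumes the rule is applied as a search rather than a full match. The `stricter` pattern and the sample values are hypothetical, and this looseness does not by itself explain the timeouts reported above.

```python
import re

# Pattern exactly as written in the stream rule above.
as_written = re.compile(r"myhost1.com.au|myhost2.com.au|myhost3.com.au")

# Hypothetical stricter variant: dots escaped, value anchored at both ends.
stricter = re.compile(r"^myhost[123]\.com\.au$")

# Hypothetical source values, chosen to show where the two patterns differ.
samples = [
    "myhost1.com.au",                 # the intended hostname
    "myhost1xcomxau",                 # unescaped dots match the "x" characters
    "notmyhost2.com.au.example.org",  # unanchored search finds a substring
]

# Map each sample to (matches as written, matches stricter variant).
results = {
    s: (bool(as_written.search(s)), bool(stricter.search(s)))
    for s in samples
}

for value, (loose, strict) in results.items():
    print(f"{value!r}: as_written={loose} stricter={strict}")
```

Only the first sample matches both patterns; the other two match the pattern as written but not the stricter variant.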

What is lacking in the default setup of Graylog that is causing it to behave and perform so badly? Can anyone provide any tips?

Thank you.

bernd commented 9 years ago

It is hard to say what the limiting factor on your machines is. That depends entirely on the message load (messages per second, message size), configured extractors, configured streams, Java heap settings, CPU cores, graylog2.conf settings, and Elasticsearch performance.

You can tune when streams get disabled with the following two settings:

# Every message is matched against the configured streams and it can happen that a stream contains rules which
# take an unusual amount of time to run, for example if it's using regular expressions that perform excessive backtracking.
# This will impact the processing of the entire server. To keep such misbehaving stream rules from impacting other
# streams, Graylog limits the execution time for each stream.
# The default values are noted below, the timeout is in milliseconds.
# If the stream matching for one stream took longer than the timeout value, and this happened more than "max_faults" times
# that stream is disabled and a notification is shown in the web interface.
stream_processing_timeout = 2000
stream_processing_max_faults = 3

You can increase either the timeout or the max faults setting. Setting the max faults to 0 disables stream pausing completely.
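
To make the interaction between the two settings concrete, here is a minimal Python sketch of the fault counting described in the config comments. It is an illustrative model, not Graylog's actual implementation: the class `StreamFaultTracker` and its method names are hypothetical. It also assumes a stream is paused once its fault count reaches `max_faults`, which fits the "taken too long for 3 times" notification quoted earlier in the thread, although the config comment says "more than".

```python
from collections import defaultdict

# Default values quoted from graylog2.conf in this thread.
STREAM_PROCESSING_TIMEOUT_MS = 2000
STREAM_PROCESSING_MAX_FAULTS = 3


class StreamFaultTracker:
    """Hypothetical model of per-stream fault counting (not Graylog code)."""

    def __init__(self,
                 timeout_ms=STREAM_PROCESSING_TIMEOUT_MS,
                 max_faults=STREAM_PROCESSING_MAX_FAULTS):
        self.timeout_ms = timeout_ms
        self.max_faults = max_faults
        self.faults = defaultdict(int)  # stream name -> number of slow runs
        self.paused = set()             # names of disabled streams

    def record(self, stream, elapsed_ms):
        """Record one stream-matching run; return True if the stream is paused."""
        if stream in self.paused:
            return True
        if elapsed_ms > self.timeout_ms:
            self.faults[stream] += 1
            # Assumption: max_faults == 0 turns pausing off entirely,
            # as stated in the reply above.
            if self.max_faults > 0 and self.faults[stream] >= self.max_faults:
                self.paused.add(stream)
        return stream in self.paused


tracker = StreamFaultTracker()
for elapsed_ms in (2500, 1500, 2500, 2500):  # three runs exceed the timeout
    paused = tracker.record("Reasons", elapsed_ms)
print("Reasons paused:", paused)
```

In this model, raising `timeout_ms` makes fewer runs count as faults, while raising `max_faults` tolerates more slow runs before the stream is paused.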

I hope that helps. Please re-open the issue if you still have problems after playing with the settings. Thank you!