elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

logstash 1.5 stops working after a few events #3275

Closed pvanderlinden closed 9 years ago

pvanderlinden commented 9 years ago

The only error I see is this line, repeated almost constantly:

config LogStash::Codecs::Plain/@charset = "UTF-8" {:level=>:debug, :file=>"logstash/config/mixin.rb", :line=>"112", :method=>"config_init"}

pvanderlinden commented 9 years ago

My config:

input {
  lumberjack {
    port => 5043
    ssl_certificate => "/etc/logstash/lumberjack.crt"
    ssl_key => "/etc/logstash/lumberjack.key"
  }
}

filter {
  # logstash-forwarder does not support tags array, the tags then have
  # to be shipped as a csv string;
  # before any other thing happens, filter application etc., the tags
  # array must be constructed from the csv string that comes in.
  mutate {
     split => ["tags", ","]
  }
}
filter {
    if [type] == "nginx" {
        grok {
            match => {
                "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{URIHOST:urihost}%{SPACE}\"%{URIPATH:path}\"%{SPACE}\"%{WORD:method}%{SPACE}%{URIPATHPARAM:request}%{SPACE}HTTP/%{NUMBER:httpversion}\"%{SPACE}%{NUMBER:response}%{SPACE}%{NUMBER:time}%{SPACE}%{NUMBER:bytes}%{SPACE}%{QS:referrer}%{SPACE}%{IP:remote_address}%{SPACE}%{QS:agent}%{SPACE}%{QS:device_string}"
            }
        }

        # Convert some types
        mutate {
            convert => {
                "response" => "integer"
                "time" => "float"
                "bytes" => "integer"
            }
        }

        # Tag internal requests
        cidr {
            add_tag => "internal_request"
            address => ["%{remote_address}"]
            network => [ "10.0.0.0/8" ]
        }
        # Tag loopback requests as internal too
        cidr {
            add_tag => "internal_request"
            address => ["%{remote_address}"]
            network => [ "127.0.0.1/32" ]
        }

        # Add location info
        geoip {
            source => "remote_address"
            target => "geoip"
        }

        # Parse user agent
        useragent {
            source => "agent"
            target => "user_agent"
        }

        # Parse device_string
        kv {
            source => "device_string"
            target => "device"
            field_split => ";"
            value_split => ":"
            trim => " \""
            trimkey => " \""
        }

        # Tag app requests
        if [device] {
            mutate { add_tag => "request" }
        }
    }
}
filter {
    if [type] == "postgresql" {
        # This prevents Logstash from running with multiple filter workers: is there any way to move this to logstash-forwarder?
        multiline {
            pattern => "^%{TIMESTAMP_ISO8601}.*"
            what => "previous"
            negate => true
        }

        csv {
            columns => ["log_time", "user_name", "database_name", "process_id", "connection_from", "session_id", "session_line_num", "command_tag", "session_start_time", "virtual_transaction_id", "transaction_id", "error_severity", "sql_state_code", "message", "detail", "hint", "internal_query", "internal_query_pos", "context", "query", "query_pos", "location"]
        }

        mutate {
            gsub => [ "message", "[\n\t]+", " "]
        }

        # use timestamp from log file
        date {
            match => ["log_time", "YYYY-MM-dd HH:mm:ss.SSS z"]
        }

        grok {
            match => ["message", "duration: %{DATA:duration:int} ms"]
            tag_on_failure => []
            add_tag => "sql_message"
        }

        grok  {
            match => ["message", "statement: %{GREEDYDATA:statement}"]
            tag_on_failure => []
            add_tag => "slow_statement"
        }

    }
}
filter {
    if [type] == "cloud_init_user_data" {
        # Really generic multi line filter
        multiline {
            pattern => "^\s"
            what => "previous"
        }

        # Some lines are prefixed with a level
        grok {
            match => {
                "message" => "^\[(?<levelname>[^ \]]+)\s*\]\s+(?<message>.*)$"
            }
            overwrite => ["message"]
            # Ignore failures
            tag_on_failure => []
        }
    }

    if [type] == "cloud_init_user_data_json" {
        json {
            source => "message"
        }

        # Assume it is a salt call if the key "local" is present
        if [local] {
            ruby {
                code => "
begin
    # Flatten the 'local' hash of salt state results into a 'salt' array,
    # keeping each state's key on its entry so the split filter below can
    # emit one event per state.
    event['salt'] = []
    event['local'].each do |key, value|
        value['key'] = key
        event['salt'] << value
    end
    event.remove('local')
    event.remove('message')
rescue Exception => e
    event['ruby_exception'] = 'salt_result: ' + e.message
end
"
            }

            split {
                add_tag => ["salt_state"]
                field => "salt"
            }
        }
    }
}
filter {
    if [type] == "syslog" {
        json {
            source => "message"
        }
    }
}
filter {
    if [type] == "python" {
        json {
            source => "message"
        }

        mutate {
            rename => {
                "@message" => "message"
            }
        }
    }
}
output {
    file {
        gzip => true
        path => "/var/log/logstash/%{+YYYY-MM-dd}/%{type}.json.gz"
    }
}
output {
    elasticsearch {
        host => "es.vpc.app.com"
        workers => 2
    }
}
pvanderlinden commented 9 years ago

Also: https://github.com/elastic/logstash/issues/3276 is probably related

colinsurprenant commented 9 years ago

@pvanderlinden the log you reported in the first comment does not look like an error but just a debug log, so I am not sure it is relevant.

When you say "after a few events", how many is that? Under 10? Is it possible for you to systematically reproduce the problem with a sample log file? If so, could you share it with us so we can try to reproduce it here? You could also try to reproduce it by feeding the sample log file to a stdin input.
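As a sketch (sample.log and repro.conf are placeholder names), something like this would feed events in via stdin and print what comes out:

# Hypothetical repro config: swap the lumberjack input for stdin and
# print every processed event to the console, e.g.:
#   cat sample.log | bin/logstash agent -f repro.conf
input {
  stdin {
    type => "nginx"
  }
}
output {
  stdout {
    codec => rubydebug
  }
}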

Also, as a test, could you try removing the workers option on the elasticsearch output to see if it helps?
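That is, keeping the output from your config but dropping the workers setting:

output {
    elasticsearch {
        host => "es.vpc.app.com"
    }
}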

pvanderlinden commented 9 years ago

I completely removed logstash 1.5 from production because we were missing all our logs. I haven't been able to reproduce it so far on my local development machine, but it happened constantly on production and staging. Also, some people I spoke to in the IRC channel, like @ph, think the log line is actually a plugin constantly rebooting.

ph commented 9 years ago

@jsvd is talking to @pvanderlinden on IRC to find more debug info.

ph commented 9 years ago

If I remember correctly from our discussion on IRC, we didn't have any errors in the LSF logs.

pvanderlinden commented 9 years ago

@ph Yes, no errors, but nothing in elasticsearch. I tried in different ways, but I haven't managed to reproduce it on my local machine (and I don't want to risk breaking production again).

pvanderlinden commented 9 years ago

Is there anything being done with this bug? I'm happy to run some tests. But so far 1.5 seems to be unusable.

jsvd commented 9 years ago

Yes @pvanderlinden, we're still investigating and trying to replicate, piecing together the different reports, which seem related to each other. If you're able to replicate this in a non-production environment, the best test you can do is gradually remove stuff from the configuration and see where the problem appears again.

Suggested iteration steps (one at a time), ending at a stripped-down config like the sketch after this list:

  1. removing all filters
  2. removing file output
  3. removing elasticsearch output
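
A fully stripped sketch based on the config posted above (a stdout output is substituted here so events stay visible; re-add filters and outputs from this point one at a time):

input {
  lumberjack {
    port => 5043
    ssl_certificate => "/etc/logstash/lumberjack.crt"
    ssl_key => "/etc/logstash/lumberjack.key"
  }
}
output {
  stdout {
    codec => rubydebug
  }
}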
pvanderlinden commented 9 years ago

I saw on a different stall issue that the hang problems are solved as of 1.5.1. I will wait for that release and test it on staging to see if it solves the problem. Upgrading/downgrading logstash is a bit too painful to do often (the only solution with an upgrade seems to be restarting the whole machine).

pvanderlinden commented 9 years ago

Seems to be actually solved

ph commented 9 years ago

@pvanderlinden Thx for updating this!