elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

Possible memory issues with logstash 1.5.1 and higher (1.5.0 is OK) #3754

Closed: mzubal closed this issue 8 years ago

mzubal commented 9 years ago

Hi, there have been several other bug reports about Logstash taking too much memory, and I have encountered a similar issue. Environment:

Running 1.5.1 and higher (I have tried all of them) causes the Java process to slowly consume more and more memory (as can be seen in the screenshots below), ending up with GBs of memory used. The Java heap itself is fine, as the screenshots show, so I suspect native memory usage is the problem here. I have tried removing all filters and the elasticsearch output (keeping just the inputs and the stdout output) with the same result (a sketch of that stripped-down variant follows the full config below). On 1.5.0 this does not happen (again, see the screenshot below). This leads me to the conclusion that something changed in 1.5.1 that causes this behavior. I would be very glad if you could check it. Thanks!

[Screenshots: memory usage over time with logstash-1.5.0 and with logstash-1.5.3]

logstash config:

input {
  file {
    path => [
      "C:/inetpub/wwwroot/**/logs/**/ALL.log*",
      "C:/inetpub/wwwroot/**/logs/**/FATAL.log*",
      "C:/DAE_JMS/**/logs/**/ALL.log*",
      "C:/DAE_JMS/**/logs/**/FATAL.log*"          
    ]
    type => damas_log
  }
  file {
    path => [
      "C:/inetpub/wwwroot/**/logs/**/RequestPerformance.csv*"
    ]
    type => damas_perf_log
  }
}

filter {
    mutate {
      add_field => [ "environment", "NBS10" ]
      add_field => [ "application", "DAMAS" ]
    }

    if [type] == "damas_log" {

        if ([message] == "[Header]") {
            drop{}
        }
        if ([message] == "[Footer]") {
            drop{}
        }

        # pull the OID and the Message ID out of the message when present; failures are not tagged
        grok {
            match => { "message" => "%{DATA}OID:\s*%{NUMBER:oid} %{DATA}" }     
            tag_on_failure => []
        }

        grok {
            match => { "message" => "%{DATA}Message ID:\s*%{WORD:messageId} %{DATA}" }      
            tag_on_failure => []
        }

        csv{
            columns => ["dateTime", "threadOrJob", "requestId", "logLevel", "messageType", "message", "UC", "RequestId", "DurationMs", "oid"]
            separator => "; "
        }   

        if ([message] == "BSS_INTERFACE: Perf") {
            mutate {
                add_field => [ "meter", "METER_DF_BO" ]
            }
        }

        # fold exception messages and stack-trace continuation lines into the previous event
        multiline {
            pattern => "(^.+Exception: .+)|(^...at.+)|(^...---.+---)"
            what => "previous"
        }

        date {
          match => [ "dateTime", "yyyy-MM-dd HH:mm:ss,SSS" ]
        }
    } else {

        mutate {
            add_field => [ "meter", "METER_GUI_BO" ]
        }
        csv {
            columns => ["userLogin", "userName", "requestDateBegin", "requestTimeBegin", "serverName", "requestID", "ucCode", "pageUrl", "pageUrlQuery", "dbDuration", "serverDuration", "clientDuration", "clientDuration1", "clientDuration2", "clientDuration3", "clientDuration4", "column17", "column18", "column19", "column20", "userCode", "subjectCode", "objectOID", "objectFullCode", "isAsync", "isError", "viewAction", "column28", "column29", "column30", "column31", "column32", "column33", "column34"]
            separator => ";"
        }
        mutate {
            add_field => [ "dateTime", "%{requestDateBegin} %{requestTimeBegin}"]
        }

        ruby { code => "
            # normalize the comma decimal separator in the raw duration columns and scale them by 1000
            event['dbDuration'] = event['dbDuration'].to_s.gsub(',','.').to_f * 1000;
            event['serverDuration'] = event['serverDuration'].to_s.gsub(',','.').to_f * 1000;
            event['clientDuration1'] = event['clientDuration1'].to_s.gsub(',','.').to_f * 1000;
            event['clientDuration2'] = event['clientDuration2'].to_s.gsub(',','.').to_f * 1000;
            event['clientDuration3'] = event['clientDuration3'].to_s.gsub(',','.').to_f * 1000;
            event['clientDuration4'] = event['clientDuration4'].to_s.gsub(',','.').to_f * 1000;

            # total client time is the sum of the four partial client durations;
            # DurationMs is client time plus server time
            event['clientDuration'] = (event['clientDuration1'] || 0) + (event['clientDuration2'] || 0) + (event['clientDuration3'] || 0) + (event['clientDuration4'] || 0);

            event['DurationMs'] = (event['clientDuration'] || 0) + (event['serverDuration'] || 0);
        "
        }

        # rows that carry a viewAction: stash their timing and object fields in the aggregate map (keyed by requestID) and drop the event
        if ([viewAction]) {
             aggregate {
                 task_id => "%{requestID}"
                 code => "
                    map['dbDuration'] = event['dbDuration'];
                    map['serverDuration'] = event['serverDuration'];
                    map['objectOID'] = event['objectOID'];
                    map['objectFullCode'] = event['objectFullCode'];
                    map['viewAction'] = event['viewAction'];
                 "
             }
             drop{}
        }

        # rows without a viewAction: merge the stashed fields for the same requestID back in and close the aggregate task
        if (![viewAction]) {
             aggregate {
                 task_id => "%{requestID}"
                 code => "
                    event['DurationMs'] = (map['serverDuration'] || 0) + (event['clientDuration'] || 0);

                    event['dbDuration'] = map['dbDuration'];
                    event['serverDuration'] = map['serverDuration'];
                    event['objectOID'] = map['objectOID'];
                    event['objectFullCode'] = map['objectFullCode'];
                    event['viewAction'] = map['viewAction'];
                 "
                 end_of_task => true
                 timeout => 120
             }
        }

        mutate {
            convert => { "DurationMs" => "integer" }
        }

        date {
          match => [ "dateTime", "dd.MM.yyyy HH:mm:ss,SSS" ]
        }
    }
}

output {
  elasticsearch {   
    host => "192.168.1.7"  
    cluster => "nbselasticcluster"
    protocol => "http" 
    index => "logstash-nbs-damas-%{+YYYY.MM.dd}" 
  }
  stdout {
    codec => rubydebug
  }
}
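
For reference, the stripped-down variant mentioned above (all filters and the elasticsearch output removed, keeping only the file inputs and stdout) would look roughly like this; this is just a sketch, with the path list shortened to a single pattern:

input {
  file {
    path => [ "C:/inetpub/wwwroot/**/logs/**/ALL.log*" ]
    type => damas_log
  }
}
output {
  stdout {
    codec => rubydebug
  }
}

Even with that minimal pipeline, the memory growth on 1.5.1 and higher was the same.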
elvarb commented 9 years ago

Could you check how the total thread count behaves over time?

mzubal commented 9 years ago

It is visible in the bottom-right corner of both screenshots: the thread count remains practically constant over time.

coffeepac commented 9 years ago

I'm seeing the same behavior on win2k8r2 + Logstash 1.5.4 + JRE 1.8.0_60, with a file input and a kafka output. Going to try downgrading to Logstash 1.5.0 and see if that fixes things.

EDIT: it's better, but I'm still seeing an inexorable climb in memory usage as reported by perfmon.

jsvd commented 9 years ago

This problem has been identified as a leak in JRuby >= 1.7.20, which started shipping with Logstash 1.5.1. More info here: https://github.com/jruby/jruby/issues/3446

mzubal commented 9 years ago

Thanks to all of you for digging this out. Are there any plans to fix this in 1.x, or only in 2.1?

suyograo commented 9 years ago

@mzubal this will be backported to 1.5

suyograo commented 8 years ago

Fixed via https://github.com/jruby/jruby/issues/3446. The fix ships in LS versions 1.5.6 and 2.1.0.