elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

Multiline filter failure on output to ElasticSearch - seems to be location dependent within an input file #3743

Open stevedearl opened 9 years ago

stevedearl commented 9 years ago

Hi Logstash,

First off, I'm relatively new to Logstash/Elasticsearch so apologies if this is either a known issue or a stupid error on my part (more likely to be the latter, I think ;)).

I've got an odd issue with Logstash/Elasticsearch. I have an input file containing 3,000 log entries, each of which spans multiple lines. When I read the file with Logstash into Elasticsearch with the multiline filter enabled, I get a _grokparsefailure on one of the records, essentially because Logstash has split one of the multiline records in half (the second half doesn't conform to the grok pattern, so I get an error).
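To illustrate with a hypothetical sketch (based on the sample data further down, not the actual failing record): a record that should arrive as a single event, e.g.

2015 Aug 16 09:35:52:000 GMT +1 BW.APPLICATION-ENGINE Info [BW-User] Job-123456789 Log Entry 1
11
12
13
14
15
16
17

instead arrives as two events: one containing the header line plus the first few continuation lines, and a second containing only the remaining continuation lines. The second event has no leading timestamp, so it fails the grok match and gets tagged _grokparsefailure.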

I don't know why I get this error, but it happens consistently. From further testing I have noticed the following:

  1. The error always occurs for the record at a particular point in the file (the record at line 185). This happens whatever record sits at that line; reordering the data within the test file doesn't change it.
  2. If I disable the output to Elasticsearch and simply log to an output file, the error does not occur at all. In that case all 3,000 records are processed correctly.

My Logstash configuration file is shown below:


input {
    file {
        path => "C:/Applications/Elastic/BWLogReader_Test/test_logs/test_multiline_3.log"
        type => "tibco_bw"  # a type to identify these logs (we'll need this later)
        codec => plain { charset => "ISO-8859-1" }
        start_position => "beginning"
        sincedb_path => "C:\Applications\Elastic\BWLogReader_Test\logstash_sincedb"
    }
}

filter {
    multiline {
        patterns_dir => "C:\Applications\Elastic\BWLogReader_Test\logstash_patterns"
        pattern => "^%{TIBCO_DATESTAMP} "
        negate => true
        what => "previous"
    }

    if [type] == "tibco_bw" {
        # Extract main fields
        grok {
            patterns_dir => "C:\Applications\Elastic\BWLogReader_Test\logstash_patterns"
            match => [ "message",
                       "%{TIBCO_DATESTAMP:log_time}(\sGMT\s%{INT})?\s%{BW_APP_NAME:application}-%{WORD:engine}\s%{WORD:log_level}\s\[%{NOTSPACE:log_type}\]\s%{GREEDYDATA:log_msg}" ]
        }

        # Extract BW Job ID
        grok {
            patterns_dir => "C:\Applications\Elastic\BWLogReader_Test\logstash_patterns"
            match => [ "log_msg",
                       "%{GREEDYDATA}Job-%{POSINT:job_id}\s%{GREEDYDATA}" ]
        }

        # If the Job ID was not found by the previous grok (this can happen for log
        # lines such as RV advisory messages), generate a default field value.
        if [job_id] {
            mutate {
                add_tag => ["job_id_found"]
            }
        } else {
            mutate {
                add_field => {"job_id" => "None"}
                add_tag => ["job_id_not_found"]
            }
        }
    }

    # Remove the original (and now parsed) message field. Only leave it in place if
    # we've had a parsing failure, as it then allows us to look at the input line
    # which failed - useful for debugging.
    if "_grokparsefailure" not in [tags] {
        mutate {
            remove_field => [ "message" ]
        }
    }

    # Update the @timestamp field to contain the timestamp from the BW engine or
    # Adapter log event.
    date {
        match => ["log_time", "YYYY MMM dd HH:mm:ss:SSS", "YYYY-MM-dd HH:mm:ss,SSS", "ISO8601"]
        locale => "en"
        target => "@timestamp"
    }
}

output {
    elasticsearch {
        host => "localhost"
    }

    file {
        path => "C:/Applications/Elastic/BWLogReader_Test/logstash_output.txt"
    }
}
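As an aside for anyone trying to reproduce this: a quick way to eyeball the events the multiline filter emits, independent of the two outputs above, is a plain stdout output (a generic debugging sketch, not part of my actual config):

output {
    # Print each event in full, including any tags such as _grokparsefailure
    stdout { codec => rubydebug }
}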


A sample of the input file I was using is shown below. I couldn't find a way to attach the actual file to this issue (perhaps someone can tell me how to do that):


2015 Aug 16 09:35:52:000 GMT +1 BW.APPLICATION-ENGINE Info [BW-User] Job-123456789 Log Entry 1
11
12
13
14
15
16
17
2015 Aug 16 09:35:52:001 GMT +1 BW.APPLICATION-ENGINE Info [BW-User] Job-123456789 Log Entry 2
21
22
23
24
25
26
27
2015 Aug 16 09:35:52:002 GMT +1 BW.APPLICATION-ENGINE Info [BW-User] Job-123456789 Log Entry 3
31
32
33
34
35
36
37
2015 Aug 16 09:35:52:003 GMT +1 BW.APPLICATION-ENGINE Info [BW-User] Job-123456789 Log Entry 4
41
42
43
44
45
46
47


I didn't see the error until I got up to Log Entry #24.

I've also attached an image from BeyondCompare which shows the comparison of the output with the Elasticsearch output enabled and disabled.

[image: BeyondCompare comparison of the output file with the Elasticsearch output enabled vs. disabled]

Hopefully someone can tell me if this is a known issue, or what I'm doing wrong. For information, I'm using Logstash 1.5.3 and Elasticsearch 1.7.1.

Cheers, Steve

stevedearl commented 9 years ago

Sorry - I should probably have attached my custom patterns too, for completeness. They're provided below:

###############################
TIBCO_DATESTAMP ((%{YEAR}[\s/-]%{MONTH}[\s/-]%{MONTHDAY}[\s]%{TIME})|(%{TIMESTAMP_ISO8601}))

# BW Application Name
BW_APP_NAME (BW.%{WORD})
###############################
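For anyone lining the pattern up against the sample data: the first alternative in TIBCO_DATESTAMP is the one that matches the BW log format above. An informal breakdown, assuming the stock grok definitions of YEAR, MONTH, MONTHDAY and TIME:

# 2015 Aug 16 09:35:52:000
# %{YEAR}     -> 2015
# %{MONTH}    -> Aug
# %{MONTHDAY} -> 16
# %{TIME}     -> 09:35:52:000  (grok's SECOND sub-pattern permits the trailing ":000")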

stevedearl commented 9 years ago

Hi,

A quick update on this. I retested with the multiline codec rather than the filter and could not recreate this issue.

Unfortunately, the multiline codec doesn't seem capable of flushing the last entry in the input file through to Elasticsearch/the output file - but this seems to be a known limitation of the codec (and people suggest switching to the filter instead... ;().
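For reference, the retest simply moved the multiline handling from the filter section into the file input, along these lines (a sketch reconstructed from the config above, not copied verbatim from what I ran):

input {
    file {
        path => "C:/Applications/Elastic/BWLogReader_Test/test_logs/test_multiline_3.log"
        type => "tibco_bw"
        start_position => "beginning"
        sincedb_path => "C:\Applications\Elastic\BWLogReader_Test\logstash_sincedb"
        codec => multiline {
            charset => "ISO-8859-1"
            patterns_dir => "C:\Applications\Elastic\BWLogReader_Test\logstash_patterns"
            pattern => "^%{TIBCO_DATESTAMP} "
            negate => true
            what => "previous"
        }
    }
}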

Cheers, Steve

henrik-nilsson commented 8 years ago

Hi, any updates or work-arounds for this bug?

I have just started using Logstash today and have encountered this exact symptom. Pretty disappointing.

Everything works fine except when using Elasticsearch as the output - then the single grok parse failure appears. I'm using Logstash version 1.5.4.

-Henrik