elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

logstash 1.5.0 not parsing @timestamp correctly #2929

TinLe closed this issue 8 years ago

TinLe commented 9 years ago

Yes, I know I am playing with fire. This is Logstash v2.0, freshly pulled from git today (03/31/15, around 11am PST). Configs that work in LS v1.4.2 fail with errors like these in LS v2.0.0-dev.

{:timestamp=>"2015-04-01T00:40:40.867000+0000", :message=>"Unrecognized @timestamp value, setting current time to @timestamp, original in _@timestampfield", :value=>"1427848229276", :level=>:warn}

{:timestamp=>"2015-04-01T00:40:40.868000+0000", :message=>"Failed parsing date from field", :field=>"@timestamp", :value=>"2015-04-01T00:40:40.866Z", :exception=>#<RuntimeError: Invalid UNIX epoch value '2015-04-01T00:40:40.866Z'>, :level=>:warn}

Here is the json record that caused the above error, mildly scrubbed:

{"header":{"EventHeader":{"container":"node.js","environment":"ei3","eventType":"","eventVersion":0,"guid":"ZZZB2ABF43421DAAA2705A92CCEFDA7A","instance":"11","nano":276000000,"server":"abc1-app1250.stg.example.com","service":"tablet-fe","time":1427848229276,"version":""}},"@fields":{"level":"DEBUG","memberId":0,"requestId":"no_rid","osName":"no_name","osVersion":"0.0.0","clientVersion":"0.0.0","ts":"2015-04-01T00:30:29.276Z","network":null},"@message":"LCache: Updating L Cache...","@server_host":"abc1-app1250.stg.example.com","@source":"tab-fe","@tags":[],"@timestamp":"2015-04-01T00:40:40.867Z","@version":"1","tags":["timestampparsefailure"],"@timestamp":1427848229276,"type":"mob-server","logstash_node":"abc1-app904.stg.example.com"}

Here is the filter:

filter {
  date {
    match => [ "@timestamp", "UNIX" ]
    remove_field => [ "@timestamp" ]
  }

  mutate {
    remove_field => ["command"]
    rename => ['host', 'logstash_node']
    convert => [ "[header][EventHeader][guid]", "string" ]
  }
}

untergeek commented 9 years ago

I am confused by your date filter, and this JSON record you included.

match => [ "@timestamp", "UNIX" ]

...tells the date filter to overwrite the @timestamp field with the @timestamp field's value, provided that it's a UNIX (epoch) value. But then you delete the @timestamp field. What was accomplished by this? It's the remove_field bit that makes no sense. I presume that you're ingesting JSON records, and the @timestamp field is one of them, and it's not in ISO8601, but rather epoch time, hence this filter. Your JSON record can't even be valid JSON if it has two fields with @timestamp in the same level within it.

I also see fields that indicate a Logstash V0 message schema, so I'm curious as to why that is.

TinLe commented 9 years ago

I've tried the filter with and without the match, and also without the remove_field. All four combinations ;)

I still get the same error messages.

Once I get on VPN, I'll do another run, with all date filters and remove_field gone.

My incoming json data only has one @timestamp field. It is in epoch time, hence the date filter. It looks like (from the error message) that logstash does not recognize the incoming @timestamp field as epoch, renames it to _@timestamp, and then inserts its own @timestamp field. Hence my bug report.

The @timestamp field that logstash v2.0.0 inserts seems to be in ISO8601 format (see the error messages I included above).

As for the Logstash V0 message schema, this is the json I get out of logstash output { stdout {} }. I am guessing that Logstash is the one responsible for creating that format. I am pretty sure I did not create it.

untergeek commented 9 years ago

You included this error first. I didn't look at it, but it confirms what I said:

{:timestamp=>"2015-04-01T00:40:40.867000+0000", :message=>"Unrecognized @timestamp value, setting current time to @timestamp, original in _@timestampfield", :value=>"1427848229276", :level=>:warn}

original in _@timestampfield means you are sending a value to Logstash with an @timestamp field already, but because it's not in ISO8601, Logstash is automatically using the current time for @timestamp, and moves the @timestamp field you sent to _@timestampfield.

The next message you included explains the result:

{:timestamp=>"2015-04-01T00:40:40.868000+0000", :message=>"Failed parsing date from field", :field=>"@timestamp", :value=>"2015-04-01T00:40:40.866Z", :exception=>#, :level=>:warn}

Since Logstash moved your @timestamp field to _@timestampfield, your date match statement doesn't work because it's trying to match a UNIX timestamp, but @timestamp is now ISO8601. This is why the date match did not work, no matter what you tried.

Also, the number you sent (which is now in _@timestampfield) is too big for a UNIX match. It looks like UNIX_MS (epoch with milliseconds).

Based on what I'm seeing, if your config worked in 1.4.2, it was probably a bug that has since been fixed.

Regarding the schema, Logstash hasn't used @fields, @message, @tags, or @source since before Logstash 1.2. This implies that whatever is sending your original JSON is generating these fields. Is it coming from a generator that is using an old schema?

TinLe commented 9 years ago

These are the same filters I've been using since ES v0.20 and whatever LS version was released around the same time.

So they have been working up until ES v1.4.4 and LS v1.4.2. I think it is bad to break something that has been working for so long.

I have simplified my config to the bare minimum.

logstash config:

input {
  pipe {
    type => "mo-server"
    command => "/export/bin/kafka-console-consumer.sh \
      --formatter com.linkedin.avro.KafkaMessageJsonWithHexFormatter \
      --property schema.registry.url=http://schema-vip.example.com:10252/schemaRegistry/schemas \
      --autocommit.interval.ms=60000 \
      --zookeeper zk-kafka.stg:12913/kafka-cluster-aggregate \
      --topic log_stash_event \
      --group log_stash_event"
    codec => json
  }
}

filter {
}

output {
   stdout { codec => json }
}

and here is the log output:

{:timestamp=>"2015-04-01T05:13:53.336000+0000", :message=>"Unrecognized @timestamp value, setting current time to @timestamp, original in _@timestampfield", :value=>"1427865166763", :level=>:warn}

lots of the above lines....

the json record output to stdout

{"header":{"EventHeader":{"container":"node.js","environment":"ei2","eventType":"","eventVersion":0,"guid":"C91C8394F999984B4A0D6D1387AFAC7F","instance":"5","nano":13000000,"server":"app674.example.com","service":"table-fe","time":1427865198013,"version":""}},"@fields":{"level":"DEBUG","memberId":0,"requestId":"no_rid","osName":"no_name","osVersion":"0.0.0","clientVersion":"0.0.0","ts":"2015-04-01T05:13:18.013Z","network":null},"@message":"MoServer request qZOTvn3ES2aHsHBTM05F : { route: '/heartbeat',\n  controller: 'health',\n  action: 'heartbeat',\n  query: {},\n  path: '/heartbeat' }","@server_host":"app674.example.com","@source":"table-fe","@tags":[],"@timestamp":"2015-04-01T05:13:53.340Z","@version":"1","tags":["_timestampparsefailure"],"_@timestamp":1427865198013,"type":"mo-server","logstash_node":"app904.example.com"}

As you said, I realized the incoming JSON has an @timestamp field that is in UNIX_MS epoch format, so I tried the following filter:

   date {
     match => [ "@timestamp", "UNIX_MS" ]
     remove_field => [ "@timestamp" ]     # <<< with and without this line
   }

Still got the above error messages in the output log.

The behavior has changed between Logstash v1.4.2 and v2.0.0 in a way that breaks existing functionality.

TinLe commented 9 years ago

Digging a little further, it seems that the date filter never had a chance: Logstash's @timestamp parser gets to go first. It looks like the rest of the data does get parsed by LS correctly, but now my log file is full of these warning messages. That of course also hurts performance. So annoying.

Poking around in logstash/event.rb, I managed to fix this warning. It's a hack though, since the filters still run after logstash's @timestamp parsing, so filters are useless for certain fields that logstash has claimed.

untergeek commented 9 years ago

The @ sign indicates a protected field name for Logstash. Your sender should never have sent a non-ISO8601 timestamp in the @timestamp field. The fact that this didn't break in previous versions of Logstash is a bug, as mentioned previously.

untergeek commented 9 years ago

And I do not recommend hacking event.rb to get your desired result. You should change the format being sent instead.

TinLe commented 9 years ago

I consider it a bug when filters do not work as expected. I have not looked at the logstash code for a while, so I will need to familiarize myself with it before tackling a fix.

I have no control over the data being sent to me, so I depend on filters to "fix" things. That was the whole premise of using logstash: the ability to transform data before ingesting it into ES. If I can't do that, logstash becomes less useful. A long time ago, I was using sed in the input pipeline to fix these issues. I can do so again, but once more, logstash becomes less useful and I'll have to look at using something else for ingesting data into ES.

Fixing bugs is good and I agree with that. But restricting what the software can do to the point of allowing only one method, and only that one, makes it much less useful.

untergeek commented 9 years ago

A Workaround (without sed)

Since you have no control over the data being sent, you will be obliged to ingest not with the JSON codec, but with the plain codec. After that you could use the mutate filter's gsub operation, and then use the JSON filter to import the fields. With the gsub, you could rename @timestamp to timestamp via a fairly straightforward regular expression, no sed necessary.
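
Roughly, that pipeline might look like the following (a sketch, untested; the pipe input and its consumer command are the ones from earlier in this thread, elided here):

input {
  pipe {
    # same kafka-console-consumer.sh command as shown earlier in the thread
    command => "/export/bin/kafka-console-consumer.sh <options as above>"
    codec => plain    # take the raw line; do not decode JSON at the input stage
  }
}

filter {
  # Rename the raw "@timestamp" key in the text before JSON decoding, so the
  # event never arrives with a non-ISO8601 @timestamp field.
  mutate {
    gsub => [ "message", '"@timestamp"', '"timestamp"' ]
  }
  # Now decode the JSON held in the message field into event fields.
  json {
    source => "message"
  }
}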

Once done, you could use a filter like this:

date {
  match => [ 'timestamp', 'UNIX' ]
  remove_field => 'timestamp'
}

A Better Workaround

Of course, you could just roll with the changes to the @timestamp field and do this:

date {
  match => [ '_@timestampfield', 'UNIX' ]
  remove_field => '_@timestampfield'
}

This is why the field renaming behavior exists: so you can work around improperly formatted datestamps. It is a work-around for the previous, broken behavior.

Changes

With regards to changes in Logstash, the syntax as you had it should not have worked. Just because it worked in the past doesn't mean it wasn't broken, nor that we should have left broken behavior in place as some kind of grandfathered-in, acceptable bug. We provided a work-around instead.

Besides this, your syntax for the date filter would have removed the @timestamp field altogether, which is not supposed to happen.

I agree that changes should break as little as possible, but your configuration—which should never have worked, but did because of coding mistakes on our part—became dependent upon software bugs we did not know existed, but have since fixed, and provided a work-around for. To be clear: If sending JSON to Logstash, if you include an @timestamp field, it must be in ISO8601 format. In the past this was not enforced by Logstash, but it is now, with the result being the incorrectly formatted @timestamp field being moved to _@timestampfield, and a new @timestamp field being generated based on the current time.

As mentioned already, this is why instead of just throwing an error and dying, Logstash warns you and does the @timestamp shuffle, so you can pick up the changes and go with it.
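
For illustration, a JSON event that passes this check carries an ISO8601 string in @timestamp, for example (field values borrowed from the data earlier in this thread):

{"@timestamp":"2015-04-01T00:30:29.276Z","@version":"1","message":"LCache: Updating L Cache..."}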

Kafka

With the advent of the kafka plugins (input and output) for Logstash, why are you using a dedicated shell script with the pipe input?

TinLe commented 9 years ago

Thank you for the suggested workaround, I've tried that. It leaves me with an ever-growing log full of warnings :-( (related topic: LS needs a log rotator, similar to ES).

I'll see about using the plain codec. I was using the plain codec a year or so back before switching to the json codec. This is so frustrating, it just feels like the tool is fighting me rather than helping me.

Kafka

The kafka input plugin does not work with the Avro schema server we use, and it does not have the performance we need. I've also written my own kafka plugin, and that works fine, but it again hit a performance bottleneck. We've found that using kcc via the pipe input gives us very high throughput, higher than a single ES node can handle.

One of the reasons I am testing LS v2.0 is to try out the kafka output plugin. If that does not work or is not performant enough, I'll go the route of writing our own. We have already written such a tool, and I am trying to decide whether we should stay with logstash or go with our tool.

So far, testing the kafka output plugin on v1.5rc shows that its performance is not good, which is another reason I am trying v2.0dev.

untergeek commented 9 years ago

You should know that the master branch, while labeled v2.0dev, is pretty much 1.5.0RC3 right now.

suyograo commented 9 years ago

@TinLe re: kafka performance can you provide us with numbers? /cc @talevy

TinLe commented 9 years ago

I'll post the numbers in the next few days. We've completed our internal perf tests and have decided to go with our in-house log2kafka tool instead of logstash. It was at least 10x faster, smaller, and lighter weight.

suyograo commented 9 years ago

@TinLe thanks, we'd be very interested in those numbers and a sample LS configuration. This will help us make the kafka output faster.

TinLe commented 9 years ago

The following is only for the kafka output plugin. I do not have recent perf numbers for the input plugin, but our previous testing showed that the default plugin was too slow for our needs.

Here is the config I used for testing logstash-output-kafka vs our own log2kafka

https://gist.github.com/TinLe/bcd53438ce7cb4131ec3

The common CLI test commands are:

datagen | pv | logstash agent --filterworkers 10 --config /export/logstash-perf-config
datagen | pv | log2kafka

I used logstash v1.5.0rc3 since kafka output plugin was introduced in v1.5.

Tests were run on a 12-core, 64GB RAM system. We tested with and without using taskset to set CPU affinity on the process.

For logstash using the kafka output plugin (JDK v1.7.0.51): 60% of one core used, RSS ~322M, throughput 2.2MB/s

For logstash using the kafka output plugin (JDK v1.8.0.40): 58% of one core used, RSS ~486M, throughput 2.0MB/s

For kafkacat (using pipe output plugin to kafkacat): 40% of one core used, RSS of ~347M, throughput of 2.25MB/s

For log2kafka (no logstash, read from stdin, output to kafka server): 9% of one core used, RSS ~15M, throughput 2.2MB/s

NOTE: we removed -XX:+UseConcMarkSweepGC from JAVA_OPTS as recommended in https://github.com/elastic/logstash/issues/2859

Also tested with and without G1GC in JDK8, not enough difference to matter for this test.

colinsurprenant commented 9 years ago

I'd like to go back to the @timestamp problem and recap the issue to make sure we all understand it, see what the expectations are, and decide what we should do.

Now, as you noted, in 1.4.2 (and earlier versions) setting @timestamp to epoch in the input data would produce this:

echo "{\"@timestamp\":1427848229276}" | bin/logstash -e 'input {stdin {codec => json_lines{}}} output{stdout {codec => rubydebug{}}}'
{
    "@timestamp" => 1427848229276,
      "@version" => "1",
          "host" => "colin-mbp13r-2.local"
}

As you can see, the internal logstash Event object has a numeric @timestamp value. The fact that the Event object could end up with a non-standardized @timestamp field value, instead of a Timestamp object (previously a Time object, and now in 1.5+ a Timestamp object), was a potential cause of other problems downstream in the filters/outputs.

Starting in 1.5 we decided to make sure that any new Event created contains a proper Timestamp object value in the @timestamp field, and to accept only ISO8601-format strings for the @timestamp field in input data.

And as you reported, this is what we have now:

echo "{\"@timestamp\":1427848229276}" | bin/logstash -e 'input {stdin {codec => json_lines{}}} output{stdout {codec => rubydebug{}}}'
Logstash startup completed
Unrecognized @timestamp value, setting current time to @timestamp, original in _@timestampfield {:value=>"1427848229276", :level=>:warn}
{
     "@timestamp" => "2015-04-10T11:41:56.027Z",
       "@version" => "1",
           "tags" => [
        [0] "_timestampparsefailure"
    ],
    "_@timestamp" => 1427848229276,
           "host" => "colin-mbp13r-2.local"
}
Logstash shutdown completed

Now, since it was previously possible to specify a numeric (epoch) value in the @timestamp input data field, I would not be against adding this condition in the Logstash::Timestamp#coerce method: if a numeric value is passed, we could treat it as an epoch timestamp and convert it to a proper Timestamp.
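
A standalone Ruby illustration of that idea (the names and structure here are mine, not the actual Logstash::Timestamp code; it simply shows the proposed numeric branch alongside the existing ISO8601 string path):

require "time"

# Hypothetical helper: coerce an incoming @timestamp value to a UTC time.
def coerce_timestamp(value)
  case value
  when Numeric
    # proposed addition: treat a numeric value as a UNIX epoch in seconds
    Time.at(value).utc
  when String
    # existing behavior: only ISO8601 strings are accepted
    Time.iso8601(value).utc
  else
    raise ArgumentError, "cannot coerce #{value.inspect} to a timestamp"
  end
end

coerce_timestamp(1427848229)              # => 2015-04-01 00:30:29 UTC
coerce_timestamp("2015-04-01T00:30:29Z")  # => 2015-04-01 00:30:29 UTC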

Thoughts?

TinLe commented 9 years ago

For our needs, I would like to have the previous functionality back. Or a way to tell logstash that yes, we really, really want to override an internal field with our own. Make it obvious that we are deliberately choosing to do something and that it's not accidental.

Maybe an override filter.

filter { override { } }

That way, the default is as before, but there is flexibility for special cases.

untergeek commented 9 years ago

An override filter would not fix this issue, as the @timestamp collision happens in the input stage.

I do appreciate that this breaks your usage, but the reason we went with this solution was to fix larger problems. The coerce idea put forward by @colinsurprenant is an excellent compromise that would automatically convert the epoch time to the appropriate ISO8601 value, saving you the step of having to do that in the date filter anyway. I think this should be acceptable because it removes the need for you to do anything at all, except remove the date filter.

colinsurprenant commented 9 years ago

Dealing consistently with timestamps is central to logstash, so I do not support the idea of allowing @timestamp to be overridden with just anything. I do not see any benefit in doing that; a lot depends on @timestamp being a proper Logstash::Timestamp object.

@TinLe would allowing the coercion of numeric values in @timestamp as epoch into a proper Logstash::Timestamp object solve your problem?

TinLe commented 9 years ago

@colinsurprenant Yes, coercion of numeric values in @timestamp as epoch will work for me. Thanks.

talevy commented 9 years ago

@TinLe @colinsurprenant I went ahead and added this coercion to Timestamp. One thing to note is that this PR adds the ability to coerce Numerics into Timestamps while assuming that the input is in seconds.

I notice that the _@timestampfield field passed in the example is actually in milliseconds.

{:timestamp=>"2015-04-01T00:40:40.867000+0000", :message=>"Unrecognized @timestamp value, setting current time to @timestamp, original in _@timestampfield", :value=>"1427848229276", :level=>:warn}

Does this sound like the expected solution? I apologize if I missed anything in the discussion; I tried to catch up with all of it.

reference: https://github.com/elastic/logstash/pull/2998

please let me know! thanks!

jordansissel commented 9 years ago

@TinLe First, some background: For many years, Logstash has had a standard schema that defines the @timestamp field as an ISO8601 time string (with subsecond precision optional and the timezone required). As part of a recent behavior change, we added stronger validation of @timestamp when users set it, so passing a numeric value as @timestamp is not correct, and you will see the error message that you are seeing.

Can you help us understand why you chose @timestamp as your field name? Logstash picked the '@' prefix to avoid conflicts with schemas others may have used. Today is the first time I've ever seen a conflict like this, where someone using Logstash has a conflicting data type for the @timestamp field. We want to help you make this work, but knowing why your field name was chosen the way it is, and what your constraints are, will help us decide how to improve this.

untergeek commented 9 years ago

@jordansissel if you scroll back to https://github.com/elastic/logstash/issues/2929#issuecomment-88539168, it appears that @TinLe has no control over what's being sent. It appears to be a hold-over from the v0 schema of Logstash. @TinLe was handed this and is doing his best with it, from what I can tell. We're trying to maintain our @timestamp changes while making the coercion change as a bridge. If there's ever any chance he can get the people on his side to change/fix the v0 schema that's being sent, that'd be better, of course.

TinLe commented 9 years ago

@jordansissel I don't have control over this data. It's a really long story how it came about. I am hoping to get the app that generates the data retired this year, but that's owned by another team. Until then I am stuck with what it generates :-(

@talevy Glad you asked! The time is indeed in UNIX ms and not seconds. Will this be a problem?

jordansissel commented 9 years ago

@TinLe ok that's great information! I think we can figure out a way to patch Logstash (or maybe better, a custom plugin solution) that will help you survive happily until that app is retired :)

jordansissel commented 9 years ago

(for context, my gut feeling is that you could use a custom codec, instead of the json codec, to kind of work around these problems)

TinLe commented 9 years ago

@jordansissel Funny enough, I was using the oldlogstashjson codec for a while before moving to the json codec. I had some parsing problems with the old codec. If those were fixed, I could move back to using it again.

Let's see, it was this: https://logstash.jira.com/browse/LOGSTASH-1727

Or, as you suggest, should I create my own codec based on one of the existing ones?

colinsurprenant commented 9 years ago

@talevy maybe we could use a pragmatic approach and bet on the fact that any epoch-style timestamp not too far back or ahead in time will be in milliseconds if it is bigger than, let's say, 9000000000? If we do that we can just divide by 1000 and be done with it... no?
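
A minimal sketch of that heuristic in plain Ruby (threshold taken from the comment above; this is not Logstash code):

# 9,000,000,000 seconds after 1970 is already in the 23rd century, so any
# plausible epoch value above that threshold is almost certainly milliseconds.
MS_THRESHOLD = 9_000_000_000

def epoch_to_time(value)
  value > MS_THRESHOLD ? Time.at(value / 1000.0).utc : Time.at(value).utc
end

epoch_to_time(1427848229)     # epoch seconds      => 2015-04-01 00:30:29 UTC
epoch_to_time(1427848229276)  # epoch milliseconds => 2015-04-01 00:30:29.276 UTC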

talevy commented 9 years ago

@colinsurprenant I thought about that, but held off because what if someone WANTED to record times in the future for some weird case.

colinsurprenant commented 9 years ago

@TinLe @jordansissel a custom codec is totally possible; here's a modified json codec decode method that should work with your data (I haven't tested it, just an on-the-fly suggestion/example):

def decode(data)
  data = @converter.convert(data)
  begin
    hash = LogStash::Json.load(data)
    # Convert the numeric epoch-milliseconds @timestamp into a proper Timestamp
    # object before building the event (divide by 1000.0 to keep the
    # millisecond precision).
    hash["@timestamp"] = LogStash::Timestamp.new(Time.at(hash["@timestamp"] / 1000.0))
    yield LogStash::Event.new(hash)
  rescue LogStash::Json::ParserError => e
    @logger.info("JSON parse failure. Falling back to plain-text", :error => e, :data => data)
    yield LogStash::Event.new("message" => data)
  end
end # def decode

colinsurprenant commented 9 years ago

@talevy right, but there's future and FUTURE :P Also keep in mind that this is a special condition for supporting json data generated by an older logstash version...

colinsurprenant commented 9 years ago

@TinLe do you have a working solution for the @timestamp issue?

TinLe commented 9 years ago

@colinsurprenant I have not had a chance to work on this. I'll test out the codec you suggested and see if that works for me, with appropriate adjustments as needed, of course.

TinLe commented 9 years ago

Not sure how to use the above codec. I just tried upgrading to the logstash v1.5.2 release and bam! I ran into the same problems.

So.... I ended up going back to my old hacks.

pipe input | sed 's/@timestamp/timestamp/g'
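
Roughly, that means embedding the sed in the pipe command, something like this (sketch only; the consumer options shown earlier in the thread are elided here):

input {
  pipe {
    type => "mo-server"
    # sed rewrites the raw field name before the json codec sees it, so the
    # event no longer arrives with a non-ISO8601 @timestamp field.
    command => "/export/bin/kafka-console-consumer.sh <options as above> | sed 's/@timestamp/timestamp/g'"
    codec => json
  }
}

A date filter matching the renamed timestamp field (UNIX_MS) can then convert it, as earlier in the thread.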

Sigh.

untergeek commented 8 years ago

@TinLe have you looked at this with recent Logstash versions?

wiibaa commented 8 years ago

@suyograo based on the stability of 2.x in handling @timestamp from senders (beats or the json codec) and the specific use-case configurations provided above, I would be in favor of closing this for the same reason you raised in the related PR #2998.