fluent / fluentd

Fluentd: Unified Logging Layer (project under CNCF)
https://www.fluentd.org
Apache License 2.0
12.91k stars 1.34k forks source link

Unable to send correct timestamp to ES #2718

Closed flowl closed 4 years ago

flowl commented 4 years ago

Having times like those in the logs: 02/Dec/2019:12:59:56 +0100 I am not able to turn it into a proper elastic search format.

According to the docs, time_format is the input format and types with the declaration below is the paramter which fluentd converts to (output).

# Apache log DateTime
time_format %d/%b/%Y:%H:%M:%S %z

# ES DateTime
types @timestamp:time:%Y-%m-%dT%H:%M:%S.%L%z

Which produces wrong results in ES:

@timestamp | Jan 19, 1970 @ 06:34:24.410

Whereas this works:

index_name fluentd.${tag}.%Y%m%d

The index are in the past (time from the logs), which is as expected.

Full config, using fluentd 1.7, ES7.x, elastic plugin and kibana in Docker

<source>
  @type tail
  path /logs/error.log
  pos_file /logs/error.log.pos
  tag apache.error
  <parse>
    @type regexp
    # IPv4 only!
    expression ^\[(?<@timestamp>.*?)\] \[(?<level>.*?)\] \[pid (?<pid>\d*?)\](?: \[client (?<client>.*?)\:.*?\])? (?<message>.*)$
    time_key @timestamp
    keep_time_key true
    time_format %a %b %d %H:%M:%S.%N %Y
    types message:string, pid:integer, @timestamp:time:%Y-%m-%dT%H:%M:%S.%L%z
  </parse>
</source>

<match apache.**>
  @type elasticsearch
  host elasticsearch
  port 9200
  user elastic
  password changeme

  # This works:
  index_name fluentd.${tag}.%Y%m%d

  include_tag_key true
  tag_key @log_name

   <buffer tag, time>
     @type memory
     timekey 3600
   </buffer>
</match>

starting fluentd-1.7.4 pid=6 ruby="2.6.5" spawn command to main: cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/local/bundle/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--under-supervisor"] gem 'fluent-plugin-elasticsearch' version '3.7.1' gem 'fluent-plugin-multi-format-parser' version '1.0.0' gem 'fluentd' version '1.7.4'

flowl commented 4 years ago

I must say, there are several functions that don't work.

Example:

Feed a log line with the DateTime 08/Dec/2019:11:00:00 +0100.

This config

<match apache.**>
  @type stdout
  <format>
    @type json
    time_type string
    time_format %Y-%m-%dT%H:%M:%S.%L%z
  </format>
</match>

Has a JSON output of ... "@timestamp":"08/Dec/2019:11:00:00 +0100" ...

Which apparently is different from what I have declared in the formatter section.

Then, the types thing is not working either and the docs have different syntax regarding time:

https://docs.fluentd.org/configuration/parse-section

date:time:%d/%b/%Y:%H:%M:%S %z # for string with time format

https://docs.fluentd.org/configuration/config-file#supported-data-types-for-values

time type: the field is parsed as a time **duration**.    # (A number with a single letter appended?!)

When setting time_type to float, the output is still "08/Dec/2019:11:00:00 +0100.

When setting types to @timestamp:time:%Y-%m-%dT%H:%M:%S.%L%z, the output is null. When setting types to @timestanp:time:float the output is 8. It doesn't make any sense at all.

It is totally unclear which configuration is for reading/input and which configuration is for writing/output.

Please kindly guide me how to format the output datetime correctly.

ganmacs commented 4 years ago

First, Please follow the contribution guide. please do not delete the template https://github.com/fluent/fluentd/blob/bfc39b5dd847253efeb849bf5e67836eaf3ef428/CONTRIBUTING.md

Which produces wrong results in ES: @timestamp | Jan 19, 1970 @ 06:34:24.410

it happens when time is null. can you check the regex in types is correct?

Has a JSON output of ... "@timestamp":"08/Dec/2019:11:00:00 +0100" ...

This is what we expected. time_format in format section only affects time which is used in fluentd internally not field value(in this example @timestamp).

flowl commented 4 years ago

@ganmacs Thank you for your reply.

There is no regexp in types it's a time formatting from https://docs.ruby-lang.org/en/2.4.0/Time.html#method-i-strftime as described in https://docs.fluentd.org/configuration/parse-section

Can you tell me how to format the output datetime, or the datetime getting sent to ES? I could prepare a full docker composed environment if it helps.

flowl commented 4 years ago

Tried the following from https://groups.google.com/forum/#!topic/fluentd/kkLLYkeCfTI with the following error, maybe that helps:

<filter apache.**>
  @type record_transformer
  enable_ruby
  <record>
    formatted_time ${ require 'date'; DateTime.parse(@timestamp).iso8601(3) }
  </record>
</filter>
[warn]: #0 dump an error event: error_class=RuntimeError error="failed to expand ` require 'date'; DateTime.parse(@timestamp).iso8601(3) ` : error = no implicit conversion of nil into String" location="/usr/local/lib/ruby/gems/2.6.0/gems/fluentd-1.7.4/lib/fluent/plugin/filter_record_transformer.rb:310:in `rescue in expand'" tag="apache.access" time=2019-12-08 10:00:00.000000000 +0000 record={"serveraddress"=>"xxx", "clientaddress"=>"xxx", "user"=>"-", "@timestamp"=>"08/Dec/2019:11:00:00 +0100", "method"=>"GET", "url"=>"/TEST", "status"=>401, "bytes"=>2375, "referrer"=>"-", "agent"=>"TEST"}

At least it proves the reading of the timestamp and the patterns is working (see "time" in the warn message). It's just the transforming/outputting. I need ElasticSearch Datetime-format.

flowl commented 4 years ago

The error message above also lead me to try the variable time instead of @timestamp.

I got it working with this.

<filter apache.**>
  @type record_transformer
  enable_ruby
  <record>
    @timestamp ${time.strftime('%Y-%m-%dT%H:%M:%S%z')}
  </record>
</filter>

It is unclear why to use the variable time when in my record it's called @timestamp.

ganmacs commented 4 years ago

Doesn't this work? https://github.com/uken/fluent-plugin-elasticsearch/tree/db5d0d3b376865608b8fba3d77b11d90a01036ba#time_key_format

And elasticsearch plugin adds @timestamp field dynamically(https://github.com/uken/fluent-plugin-elasticsearch/tree/db5d0d3b376865608b8fba3d77b11d90a01036ba#time_key). so it seems that you don't need to handle it by yourself.

flowl commented 4 years ago

@ganmacs

time_key_format ... This parameter only has an effect when logstash_format is true.

I am not using logstash format.

I don't want the ingest time added to @timestamp, I want the timestamp from the logs themselves.

f0o commented 3 years ago

@flowl did you solve this?

I'm having a similar issue where I want the event-timestamp from the forward input to be used as time-key for ES instead of having the output generate a new one on flush as that falsifies the logs

flowl commented 3 years ago

Hi @f0o , in 2019 fluentd had a lot of bugs and a very bad documentation, especially regarding the different timestamps and formats. There is a great chance in things having improved now. Unfortunately, I can't tell because I stopped working for this specific project and we use different setups today.

jkulak commented 2 years ago

I was able to solve (kind of) similar issue with fluentd-plugin-elasticsearch and it's option to include_timestamp: https://github.com/uken/fluent-plugin-elasticsearch#include_timestamp.

<match something.**>
    @type elasticsearch
    scheme https
    ca_file /ca.crt
    user elastic
    password *******
    host es01
    port 9200
    include_timestamp true # <--- here
    index_name grabtracksp-listeners
    type_name fluentd
</match>