fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Allow multiple time_format keys #385

Open epcim opened 7 years ago

epcim commented 7 years ago

It would be worth supporting multiple Time_Format keys. The reasoning is that on a complex system it can be quite hard to force all components on all systems to use one standardized format. Sure, that is the goal. But even if you assume configuration management, you will have to deal with multiple time formats, at a minimum on the metadata side.

Time_Format %b %d %H:%M:%S
Time_Format %Y-%m-%dT%H:%M:%S

Additionally, in practice it would be handy to be able to strip part of the time value. In the example below, I basically can't encode the '+00:00' with strftime format options (IMHO). This can be handled in the regexp, but usually you want the regexp only to find the field that holds the time, not to strip it partially.

Time_Format %Y-%m-%dT%H:%M:%S+00:00
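For comparison, here is how Python's own strptime treats such a suffix. This is only a sketch of the `%z` directive in Python 3.7+, not a statement about Fluent Bit's parser, and the sample value is hypothetical since the comment only shows the format string.

```python
from datetime import datetime

# Hypothetical sample value; the thread only gives the format string.
raw = "2019-02-15T16:30:15+00:00"

# In Python 3.7+, %z consumes a "+HH:MM" offset (colon included) and
# yields a timezone-aware datetime, so nothing has to be stripped.
parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S%z")

print(parsed.isoformat())
print(parsed.utcoffset())
```

A later comment in this thread uses `%z` in a Fluent Bit Time_Format against a value ending in `+00:00`, which suggests the same directive may cover this case there too.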

A common pattern in golang apps (the first two lines come from 3rd-party libraries the app is using; the last line is the uber/ZAP log format the application uses natively):

2019-02-15 16:30:15.549659 I | etcd watcher: sent err {Type:ERROR ResourceVersion:0 Object:etcdserver: mvcc: required revision has been compacted}
2019-02-15 16:30:15.550082 I | watchserver: failed to receive watch request from gRPC stream ("rpc error: code = Canceled desc = context canceled")
2019-02-15T16:30:15.550Z        INFO    server/audit_handler.go

epcim commented 7 years ago

BTW, what would be a correct regex for this "2017-07-03 13:06:27.606" date?

edsiper commented 7 years ago

@epcim

%Y-%m-%d %H:%M:%S.%L
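As a quick cross-check outside Fluent Bit, the same layout can be parsed in Python. This sketch assumes Python's `%f` as a stand-in for the `%L` fractional-seconds specifier; the two are not the same directive.

```python
from datetime import datetime

raw = "2017-07-03 13:06:27.606"

# %f is Python's fractional-seconds directive (1 to 6 digits),
# used here in place of Fluent Bit's %L.
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f")

print(parsed)
```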

epcim commented 6 years ago

Update: pacemaker logs, for example, can look like the sample below (I inserted a blank line between the first log line and the others). Test:

http://rubular.com/r/SicOPNkBmj

^((?<component2>\w+):){0,1}\s*(?<log_time>[^ ]* {1}([^ ]*){0,1} [^ ][\d:]+)\s*((?<severity2>\w+):\s+(?<process_name>\w+)){0,1}\s*\[(?<pid>\d+)\](:){0,1}\s*((?<node>[\-\w]*)\s*(?<component>\w*):\s+(?<severity>\w+):){0,1}\s+(?<message>.*)$

Log example (a bit simplified)

lrmd: 2018/01/25 09:24:57 INFO: rabbitmq[49262]: get_monitor(): get_monitor function ready to return 8

Jan 25 09:25:00 [4645] d98-f2-b3-c9-6f-41 cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to all (origin=local/crm_attribute/4)
Jan 25 09:25:00 [4645] d98-f2-b3-c9-6f-41 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=d98-f2-b3-c9-6f-41/crm_attribute/4, version=0.312.0)

Note: the above issue could be resolved by reading the log file with two independent inputs/parsers (possible without the multiline feature). However, a single parser would look much better, as component2 is junk anyway.
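To try the pattern above without a Fluent Bit instance, it can be ported to Python. In this sketch only the named-group syntax changes, from `(?<name>...)` to `(?P<name>...)`. The helper name `parse_line` is made up for illustration, and Python's `re` engine is not the engine Fluent Bit uses, so edge cases may differ.

```python
import re

# Same pattern as in the comment, split across lines for readability.
PACEMAKER_RE = re.compile(
    r"^((?P<component2>\w+):){0,1}\s*"
    r"(?P<log_time>[^ ]* {1}([^ ]*){0,1} [^ ][\d:]+)\s*"
    r"((?P<severity2>\w+):\s+(?P<process_name>\w+)){0,1}\s*"
    r"\[(?P<pid>\d+)\](:){0,1}\s*"
    r"((?P<node>[\-\w]*)\s*(?P<component>\w*):\s+(?P<severity>\w+):){0,1}"
    r"\s+(?P<message>.*)$"
)

def parse_line(line):
    """Return the named groups that took part in the match, or None."""
    m = PACEMAKER_RE.match(line)
    if m is None:
        return None
    return {k: v for k, v in m.groupdict().items() if v is not None}

# The two line shapes from the log example.
lrmd = parse_line(
    "lrmd: 2018/01/25 09:24:57 INFO: rabbitmq[49262]: get_monitor(): "
    "get_monitor function ready to return 8"
)
cib = parse_line(
    "Jan 25 09:25:00 [4645] d98-f2-b3-c9-6f-41 cib: info: cib_process_request: "
    "Forwarding cib_modify operation for section nodes to all "
    "(origin=local/crm_attribute/4)"
)

print(lrmd)
print(cib)
```

Both shapes match, but `log_time` comes back in two different layouts (`2018/01/25 09:24:57` and `Jan 25 09:25:00`), which is exactly where a single Time_Format falls short.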

epcim commented 5 years ago

@edsiper may I ask you again to review whether it would be possible, at the place where the time format is evaluated and fails, to evaluate against an alternative time format? I expect we could allow multiple Time_Format records in the parser config.
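The requested behaviour, trying the next format when the previous one fails, can be sketched in a few lines of Python. This is not Fluent Bit code. `parse_with_fallback` and `FORMATS` are hypothetical names, and the two formats are Python approximations of the Go log layouts quoted in the first comment (with `%f` for fractional seconds and a literal `Z`).

```python
from datetime import datetime

# Python equivalents of the two layouts in the Go sample from the first comment.
FORMATS = [
    "%Y-%m-%d %H:%M:%S.%f",   # 2019-02-15 16:30:15.549659
    "%Y-%m-%dT%H:%M:%S.%fZ",  # 2019-02-15T16:30:15.550Z
]

def parse_with_fallback(value, formats=FORMATS):
    """Return the first successful parse, or None if no format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    return None

print(parse_with_fallback("2019-02-15 16:30:15.549659"))
print(parse_with_fallback("2019-02-15T16:30:15.550Z"))
print(parse_with_fallback("not a timestamp"))
```

The order of the list matters: the first matching format wins, so more specific formats should come first.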

HontoNoRoger commented 3 years ago

I thought that would already be the case? I see there are two Time_Format entries for the default syslog-rfc3164 parser already. But according to my tests in #2967 only the first one is actually used and the second one gets silently ignored.

This functionality would be an easy way to solve my issue above.

payam54 commented 3 years ago

Hi, any update for this? Having this feature would be useful in many cases. For example, we have a Jetty server that runs multiple WARs from other parties. Some support ISO8601 e.g:

2021-03-23 23:48:56,742 INFO  Thread: qtp1497973285-14 - blah blah

And some others, including Jetty itself, have a slightly different timestamp (notice the dot . instead of the comma , before the milliseconds):

2021-03-23 23:48:04.989:INFO:oejs.Server:main: Started @31566ms
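Where the two layouts differ only in the separator, one workaround outside Fluent Bit is to normalize the value before parsing. A minimal sketch in Python, assuming the timestamp has already been extracted from the line; `parse_jetty_time` is a hypothetical helper name.

```python
from datetime import datetime

def parse_jetty_time(value):
    """Parse 'YYYY-MM-DD HH:MM:SS,mmm' or 'YYYY-MM-DD HH:MM:SS.mmm'."""
    # Normalize the first comma to a dot so one format covers both layouts.
    normalized = value.replace(",", ".", 1)
    return datetime.strptime(normalized, "%Y-%m-%d %H:%M:%S.%f")

print(parse_jetty_time("2021-03-23 23:48:56,742"))
print(parse_jetty_time("2021-03-23 23:48:04.989"))
```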

bungoume commented 3 years ago

Are there any plans to introduce this feature?

My case is general. I would like to handle the problem of decimal places being omitted when microseconds are zero.

    Time_Format %Y-%m-%dT%H:%M:%S.%L%z
    Time_Format %Y-%m-%dT%H:%M:%S%z

For example, Python's standard datetime.isoformat() method produces the following.

>>> import datetime
>>> datetime.datetime(2021,11,22,3,44,55,microsecond=6789,tzinfo=datetime.timezone.utc).isoformat()
'2021-11-22T03:44:55.006789+00:00'
>>> datetime.datetime(2021,11,22,3,44,55,microsecond=0,tzinfo=datetime.timezone.utc).isoformat()
'2021-11-22T03:44:55+00:00'

%Y-%m-%dT%H:%M:%S.%L%z will throw an error in this particular case.

$ echo "time:2021-11-22T03:44:55.006789+00:00       method:GET      status:200" >>  access.log
$ echo "time:2021-11-22T03:44:55+00:00       method:GET      status:200" >> access.log

[0] access.apache: [1637552695.006789000, {"method"=>"GET", "status"=>"200"}]
[2021/10/15 04:33:42] [error] [parser] cannot parse '2021-11-22T03:44:55+00:00'
[2021/10/15 04:33:42] [error] [parser:ltsv_iso8601_parser] Invalid time format %Y-%m-%dT%H:%M:%S.%L%z
[0] access.apache: [1634272422.520534812, {"log"=>"time:2021-11-22T03:44:55+00:00   method:GET  status:200"}]
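When the producer is under your control, one way around this is to emit the fraction unconditionally so that a single format always matches. A sketch using the `timespec` argument of `datetime.isoformat()` (available since Python 3.6); it assumes you can change the application's logging code, which is not always the case.

```python
from datetime import datetime, timezone

whole_second = datetime(2021, 11, 22, 3, 44, 55, microsecond=0, tzinfo=timezone.utc)

# Default: the fractional part is dropped when microsecond == 0.
default_form = whole_second.isoformat()

# timespec="microseconds" always writes six fractional digits.
fixed_form = whole_second.isoformat(timespec="microseconds")

print(default_form)  # 2021-11-22T03:44:55+00:00
print(fixed_form)    # 2021-11-22T03:44:55.000000+00:00
```

The fixed-width form has the same shape as the first access.log line above, which the `%Y-%m-%dT%H:%M:%S.%L%z` format parsed successfully.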

BertelBB commented 2 years ago

What I've been doing is applying multiple parsers to my time field, but that causes flb to log a lot of warnings whenever a parser's time format does not match the value of time.

Is this bad practice? Should I instead be more verbose with my matching and only apply the correct parser to each match?

[FILTER]
    name         parser
    alias        unparsed_time_field
    match        *
    key_name     time
    parser       utc-date-time
    parser       iso8601-date-time
    parser       iso8601-date-time-offset
    parser       float-time
    reserve_data on

Warning messages I get from flb

[2021/12/14 10:36:20] [ warn] [parser:iso8601-date-time] invalid time format %Y-%m-%dT%H:%M:%S for '1638665221.836'

artheus commented 1 year ago

This is not actually needed, as you can do this:

# fluent-bit.conf

# other config goes here

[FILTER]
    name         parser
    alias        message-format-parser
    match        *
    key_name     log
    reserve_data on
    parser       json
    parser       java
    # and probably more parsers

[FILTER]
    name         parser
    alias        time-format-parser
    match        *
    key_name     time
    preserve_key on
    reserve_data on
    parser       time-iso8601
    parser       time-java

# other config goes here too

# parsers.conf

# Message format parsers

[PARSER]
    Name        json
    Format      json
    Time_Keep   On
    Time_Key    time
    # no Time_Format key

[PARSER]
    Name        java
    Format      regex
    # Probably change the time regexp to something less strict
    Regex       /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\.\d+)\s+(?<level>[A-Z]+)\s+.*?\s+\[(?<thread>[^\s]+)\].*/
    Time_Keep   On
    Time_Key    time
    # no Time_Format key

# Time format parsers

[PARSER]
    Name        time-iso8601
    Format      regex
    Regex       /^(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{1,2}:\d{1,2}:\d{1,2}\.\d+(Z|[\+\-]\d{2}(:\d{2})?))/
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z

[PARSER]
    Name        time-java
    Format      regex
    Regex       /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\.\d{,9})/
    Time_Key    time
    Time_Format %Y-%m-%d %H:%M:%S.%L

This way you won't get any error logs about the time format not being suitable, and you can add as many time formats as you'd like. That is how I've done it, and it works well.
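The idea behind the two time parsers, a regex that gates each Time_Format so a non-matching parser is skipped quietly, can be sketched in Python. This is an approximation rather than Fluent Bit's implementation: `parse_time` and `TIME_PARSERS` are hypothetical names, the regexes are copied from the config above with `(?P<time>...)` group syntax, and `%f` replaces `%L` (Python's `%f` accepts at most 6 fractional digits).

```python
import re
from datetime import datetime

# (name, gate regex, time format) for each time parser in the config above.
TIME_PARSERS = [
    (
        "time-iso8601",
        re.compile(r"^(?P<time>\d{4}-\d{1,2}-\d{1,2}T\d{1,2}:\d{1,2}:\d{1,2}\.\d+(Z|[\+\-]\d{2}(:\d{2})?))"),
        "%Y-%m-%dT%H:%M:%S.%f%z",
    ),
    (
        "time-java",
        re.compile(r"^(?P<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\.\d{,9})"),
        "%Y-%m-%d %H:%M:%S.%f",
    ),
]

def parse_time(value):
    """Return (parser_name, datetime) from the first parser that applies, else None."""
    for name, pattern, fmt in TIME_PARSERS:
        m = pattern.match(value)
        if m is None:
            continue  # regex gate: this parser does not apply, skip it quietly
        try:
            return name, datetime.strptime(m.group("time"), fmt)
        except ValueError:
            continue  # regex matched but the format did not; try the next parser
    return None

print(parse_time("2021-11-22T03:44:55.006789+00:00"))
print(parse_time("2017-07-03 13:06:27.606"))
print(parse_time("1638665221.836"))
```

Values that match no gate, such as the epoch float in the last call, simply fall through and return None instead of producing a format error.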