Open epcim opened 7 years ago
BTW, what would be a correct regex for this "2017-07-03 13:06:27.606" date?
@epcim
%Y-%m-%d %H:%M:%S.%L
Update: pacemaker logs can, for example, contain lines like the ones below. I have put a newline between the first log line and the others. Test:
http://rubular.com/r/SicOPNkBmj
^((?<component2>\w+):){0,1}\s*(?<log_time>[^ ]* {1}([^ ]*){0,1} [^ ][\d:]+)\s*((?<severity2>\w+):\s+(?<process_name>\w+)){0,1}\s*\[(?<pid>\d+)\](:){0,1}\s*((?<node>[\-\w]*)\s*(?<component>\w*):\s+(?<severity>\w+):){0,1}\s+(?<message>.*)$
Log example (a bit simplified):
lrmd: 2018/01/25 09:24:57 INFO: rabbitmq[49262]: get_monitor(): get_monitor function ready to return 8
Jan 25 09:25:00 [4645] d98-f2-b3-c9-6f-41 cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to all (origin=local/crm_attribute/4)
Jan 25 09:25:00 [4645] d98-f2-b3-c9-6f-41 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=d98-f2-b3-c9-6f-41/crm_attribute/4, version=0.312.0)
Note: the above issue could be resolved by reading the log file with two independent inputs/parsers (possible without the multiline feature). It would, however, look much better, as component2 is junk anyway.
@edsiper may I ask you again to review whether it would be possible, at the place where the time format is evaluated and fails, to evaluate against an alternative time format? I expect we could allow multiple Time_Format records in the parser config.
I thought that would already be the case?
I see there are two Time_Format entries for the default syslog-rfc3164 parser already.
But according to my tests in #2967 only the first one is actually used and the second one gets silently ignored.
This functionality would be an easy way to solve my issue above.
Hi, any update for this? Having this feature would be useful in many cases. For example, we have a Jetty server that runs multiple WARs from other parties. Some support ISO8601 e.g:
2021-03-23 23:48:56,742 INFO Thread: qtp1497973285-14 - blah blah
And some others, including Jetty itself, have a slightly different timestamp (notice that there is a dot . instead of a comma , before the milliseconds):
2021-03-23 23:48:04.989:INFO:oejs.Server:main: Started @31566ms
Are there any plans to introduce this feature?
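Until something like this exists, one workaround is to normalize the separator before parsing. The following is only a Python sketch of that idea, not Fluent Bit code; the function name parse_leading_timestamp and the exact regex are illustrative assumptions, and it assumes exactly three millisecond digits as in the two sample lines above.

```python
import re
from datetime import datetime

# Leading timestamp with either ',' or '.' in front of the milliseconds.
# Assumption: exactly three millisecond digits, as in the sample lines.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})[.,](\d{3})")


def parse_leading_timestamp(line: str) -> datetime:
    """Parse the timestamp at the start of `line`, accepting ',' or '.'."""
    match = TIMESTAMP.match(line)
    if match is None:
        raise ValueError(f"no timestamp at start of line: {line!r}")
    # Rebuild the value with one fixed separator so a single format suffices.
    normalized = f"{match.group(1)}.{match.group(2)}"
    return datetime.strptime(normalized, "%Y-%m-%d %H:%M:%S.%f")


if __name__ == "__main__":
    print(parse_leading_timestamp(
        "2021-03-23 23:48:56,742 INFO Thread: qtp1497973285-14 - blah blah"))
    print(parse_leading_timestamp(
        "2021-03-23 23:48:04.989:INFO:oejs.Server:main: Started @31566ms"))
```

The design choice here is to keep one time format and push the variation into the regex, which is roughly what a per-source regex parser has to do today.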
My case is general. I would like to handle the problem of decimal places being omitted when microseconds are zero.
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
Time_Format %Y-%m-%dT%H:%M:%S%z
For example, Python's standard datetime function isoformat results in the following.
>>> import datetime
>>> datetime.datetime(2021,11,22,3,44,55,microsecond=6789,tzinfo=datetime.timezone.utc).isoformat()
'2021-11-22T03:44:55.006789+00:00'
>>> datetime.datetime(2021,11,22,3,44,55,microsecond=0,tzinfo=datetime.timezone.utc).isoformat()
'2021-11-22T03:44:55+00:00'
%Y-%m-%dT%H:%M:%S.%L%z will throw an error in this particular case.
$ echo "time:2021-11-22T03:44:55.006789+00:00 method:GET status:200" >> access.log
$ echo "time:2021-11-22T03:44:55+00:00 method:GET status:200" >> access.log
[0] access.apache: [1637552695.006789000, {"method"=>"GET", "status"=>"200"}]
[2021/10/15 04:33:42] [error] [parser] cannot parse '2021-11-22T03:44:55+00:00'
[2021/10/15 04:33:42] [error] [parser:ltsv_iso8601_parser] Invalid time format %Y-%m-%dT%H:%M:%S.%L%z
[0] access.apache: [1634272422.520534812, {"log"=>"time:2021-11-22T03:44:55+00:00 method:GET status:200"}]
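To make the requested behaviour concrete, here is a minimal Python sketch of "try each Time_Format in order and use the first one that parses". It is not how Fluent Bit is implemented; parse_with_fallback and CANDIDATE_FORMATS are hypothetical names, Python's %f stands in for %L, and it assumes Python 3.7+ so that %z accepts an offset written as +00:00.

```python
from datetime import datetime

# Candidate formats, tried in order. %f is Python's fractional-seconds
# directive and stands in for %L from the Time_Format lines above.
CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%f%z",
    "%Y-%m-%dT%H:%M:%S%z",
)


def parse_with_fallback(value: str, formats=CANDIDATE_FORMATS) -> datetime:
    """Return the result of the first format that parses `value`."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            # This format does not fit; fall through to the next one.
            continue
    raise ValueError(f"no candidate format matches {value!r}")


if __name__ == "__main__":
    for raw in ("2021-11-22T03:44:55.006789+00:00",
                "2021-11-22T03:44:55+00:00"):
        print(raw, "->", parse_with_fallback(raw).isoformat())
```

With a fallback like this, both lines written to access.log above would get their own timestamp instead of the second one falling back to the ingestion time.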
What I've been doing is applying multiple parsers to my time field, but that causes flb to log a lot of warnings whenever a time format does not match the value of time.
Is this bad practice? Should I instead be more verbose with my matching and only apply the correct parser to each match?
[FILTER]
name parser
alias unparsed_time_field
match *
key_name time
parser utc-date-time
parser iso8601-date-time
parser iso8601-date-time-offset
parser float-time
reserve_data on
Warning messages I get from flb
[2021/12/14 10:36:20] [ warn] [parser:iso8601-date-time] invalid time format %Y-%m-%dT%H:%M:%S for '1638665221.836'
This is not actually needed, as you can do all of this:
# fluent-bit.conf
# other config goes here
[FILTER]
name parser
alias message-format-parser
match *
key_name log
reserve_data on
parser json
parser java
# and probably more parsers
[FILTER]
name parser
alias time-format-parser
match *
key_name time
preserve_key on
reserve_data on
parser time-iso8601
parser time-java
# other config goes here too
# parsers.conf
# Message format parsers
[PARSER]
Name json
Format json
Time_Keep On
Time_Key time
# no Time_Format key
[PARSER]
Name java
Format regex
# Probably change the time regexp to something less strict
Regex /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\.\d+)\s+(?<level>[A-Z]+)\s+.*?\s+\[(?<thread>[^\s]+)\].*/
Time_Keep On
Time_Key time
# no Time_Format key
# Time format parsers
[PARSER]
Name time-iso8601
Format regex
Regex /^(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{1,2}:\d{1,2}:\d{1,2}\.\d+(Z|[\+\-]\d{2}(:\d{2})?))/
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
[PARSER]
Name time-java
Format regex
Regex /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\.\d{,9})/
Time_Key time
Time_Format %Y-%m-%d %H:%M:%S.%L
This way you won't get any error logs about the time format not being suitable, and you can add as many time formats as you'd like. That is how I've done it, and it works well.
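The idea behind the config above, sketched in Python for illustration: each time format is guarded by a regex, and the format is only applied once its regex has matched, so a non-matching value never reaches the time parser. The names TIME_PARSERS and parse_time are hypothetical, and the patterns are simplified assumptions (at most six fractional digits, offset with minutes) so that they line up with what Python's %f and %z accept.

```python
import re
from datetime import datetime

# (guard regex, strptime format) pairs, loosely mirroring the time-iso8601
# and time-java parsers above. Simplified for Python's strptime.
TIME_PARSERS = (
    (re.compile(r"^\d{4}-\d{1,2}-\d{1,2}T\d{1,2}:\d{1,2}:\d{1,2}"
                r"\.\d{1,6}(Z|[+-]\d{2}:?\d{2})$"),
     "%Y-%m-%dT%H:%M:%S.%f%z"),
    (re.compile(r"^\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\.\d{1,6}$"),
     "%Y-%m-%d %H:%M:%S.%f"),
)


def parse_time(value: str):
    """Parse `value` with the first guarded format; None if no guard matches."""
    for pattern, fmt in TIME_PARSERS:
        if pattern.match(value):
            # The guard matched, so this format is expected to fit.
            return datetime.strptime(value, fmt)
    # No guard matched: leave the value alone instead of logging a failure.
    return None


if __name__ == "__main__":
    for raw in ("2021-11-22T03:44:55.006789+00:00",
                "2021-03-23 23:48:04.989",
                "1638665221.836"):
        print(raw, "->", parse_time(raw))
```

Compared with trying formats blindly, the guard costs one extra regex match per candidate but avoids the warning for every format that does not fit.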
It would be worth supporting multiple formats. The reasoning is that on a complex system it can be quite hard to force all components on all systems to use one standardized format. Sure, that's the goal. But even if you assume configuration management, you will have to deal with multiple time formats, at minimum on the metadata side.
Additionally, in practice it would be handy to be able to strip some part of the time value. In the example below, I basically can't encode the '+00:00' in strftime format options (IMHO). This can be managed with a regexp, but usually you want the regexp only to find the field where the time is, not to strip it partially.
A common pattern in golang apps (the first two lines come from 3rd party libraries the app is using; the last line is the uber/ZAP log format the application uses natively):