Open byrnedo opened 4 years ago
You can add the env variable FLUENT_CONTAINER_TAIL_PARSER_TYPE with the value
/^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
and it will make this daemonset work with the containerd logs.
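As a sanity check, the suggested regex can be exercised against a sample CRI log line. This is a minimal Python sketch (Python spells named groups (?P<name>…) rather than Ruby's (?<name>…)); the sample line is made up for illustration:

```python
import re

# The CRI log format is: <timestamp> <stream> <logtag> <message>.
# [^ ]* matches the F/P logtag field without capturing it.
CRI_RE = re.compile(r'^(?P<time>.+) (?P<stream>stdout|stderr) [^ ]* (?P<log>.*)$')

line = '2020-08-11T18:07:23.187045754+02:00 stdout F Finished applying updates'
m = CRI_RE.match(line)
print(m.group('time'))    # 2020-08-11T18:07:23.187045754+02:00
print(m.group('stream'))  # stdout
print(m.group('log'))     # Finished applying updates
```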
The above regex worked for me, thanks!
Could we make it work for both containerd and docker without setting the type?
Hi,
It may look like it works, though having dealt with OpenShift a lot lately: you're missing something. Eventually, you'll see log messages being split into several records.
I've had to patch the /fluentd/etc/kubernetes.conf file.
We could indeed set FLUENT_CONTAINER_TAIL_PARSER_TYPE to
/^(?<time>.+) (?<stream>stdout|stderr) (?<logtag>[FP]) (?<log>.+)$/.
However, we also need to add the following:
<filter kubernetes.**>
@type concat
key log
partial_key logtag
partial_value P
separator ""
</filter>
Note that I'm setting a logtag field from the F and P values that @arthurdarcet drops with [^ ]*.
We actually need those to reconstruct multi-line messages (P means you have a partial log, while F marks the last part of a message).
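To illustrate what the concat filter does with that logtag field, here is a rough Python sketch of the buffering logic (the sample records and the concat_partials helper are invented for illustration; the real work is done by the concat filter above):

```python
# Buffer P (partial) chunks and flush the joined message on F (final),
# mirroring: partial_key logtag / partial_value P / separator "".
def concat_partials(records, separator=''):
    out, buf = [], []
    for r in records:
        buf.append(r['log'])
        if r['logtag'] == 'F':      # final chunk: emit the reassembled message
            out.append(separator.join(buf))
            buf = []
    return out

records = [
    {'logtag': 'P', 'log': 'a very long li'},
    {'logtag': 'P', 'log': 'ne that contain'},
    {'logtag': 'F', 'log': 'erd split up'},
]
print(concat_partials(records))  # ['a very long line that containerd split up']
```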
I have enabled Rancher logging with fluentd for containerd, but I'm still getting the issue. Below are the env variables I have pasted into the daemonset: https://rancher.com/docs/rancher/v2.x/en/cluster-admin/tools/logging/
env:
output:
2020-08-11 16:07:29 +0000 [warn]: #0 pattern not matched: "2020-08-11T18:07:28.606198265+02:00 stdout F 2020-08-11 16:07:28 +0000 [warn]: #0 pattern not matched: \"2020-08-11T18:07:27.620512318+02:00 stdout F 2020-08-11 16:07:27 +0000 [warn]: #0 pattern not matched: \\\"2020-08-11T18:07:26.541424158+02:00 stdout F 2020-08-11 16:07:26 +0000 [warn]: #0 pattern not matched: \\\\\\\"2020-08-11T18:07:25.531461018+02:00 stdout F 2020-08-11 16:07:25 +0000 [warn]: #0 pattern not matched: \\\\\\\\\\\\\\\"2020-08-11T18:07:24.528268248+02:00 stdout F 2020-08-11 16:07:24 +0000 [warn]: #0 pattern not matched: \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"2020-08-11T18:07:23.524149263+02:00 stdout F 2020-08-11 16:07:23 +0000 [warn]: #0 pattern not matched: \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"2020-08-11T18:07:23.187045754+02:00 stdout F 2020-08-11 16:07:23.186 [INFO][57] int_dataplane.go 976: Finished applying updates to dataplane. msecToApply=1.434144\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\"\\\\\\\"\\\"\""
@arthurdarcet @faust64 How is the regex string supposed to work in FLUENT_CONTAINER_TAIL_PARSER_TYPE if that variable is translated to the @type value in the parser configuration?
kubernetes.conf inside container contains:
…
<source>
@type tail
…
<parse>
@type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
time_format %Y-%m-%dT%H:%M:%S.%NZ
</parse>
</source>
…
Allowed @types: https://docs.fluentd.org/configuration/parse-section#type
Maybe this should just be addressed with a flag. The issue has been present for such a long time and impacts other vendors that choose to spin value-added products around this. Word to the wise: Docker is not the only front-end to containers, and container evolution continues. Addressing this properly now, instead of with sloppy workarounds using regex or manipulation, would be a good thing. Better to get in front of the issue than lag behind.
DB
We can put an additional plugin into the plugins directory, e.g. https://github.com/fluent/fluentd-kubernetes-daemonset/tree/master/docker-image/v1.11/debian-elasticsearch7/plugins
So if anyone provides a containerd log format parser, we can configure it via FLUENT_CONTAINER_TAIL_PARSER_TYPE.
That would work, as I am willing to write and contribute a custom parser for this to save others the same issues. Rephrased: that is perhaps the best option, an additional plugin for this specific use case.
@arren-ru: you are right, my mistake. FLUENT_CONTAINER_TAIL_PARSER_TYPE should be set to regexp, and then you'd set an expression with your actual regexp.
Either way, that's not something you can currently configure only using environment variables. You're looking for something like this: https://github.com/faust64/kube-magic/blob/master/custom/roles/logging/templates/fluentd.j2#L31-L51
@faust64 I solved this by overriding kubernetes.conf with a ConfigMap mounted in place of the original configuration with modified content; this gives a basic working solution:
<source>
@type tail
@id in_tail_container_logs
path /var/log/containers/*.log
pos_file /var/log/fluentd-containers.log.pos
tag "#{ENV['FLUENT_CONTAINER_TAIL_TAG'] || 'kubernetes.*'}"
exclude_path "#{ENV['FLUENT_CONTAINER_TAIL_EXCLUDE_PATH'] || use_default}"
read_from_head true
<parse>
@type regexp
expression /^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<flags>[^ ]+) (?<message>.*)$/
time_format %Y-%m-%dT%H:%M:%S.%N%:z
</parse>
</source>
<filter kubernetes.**>
@type kubernetes_metadata
@id filter_kube_metadata
kubernetes_url "#{'https://' + ENV.fetch('KUBERNETES_SERVICE_HOST') + ':' + ENV.fetch('KUBERNETES_SERVICE_PORT') + '/api'}"
</filter>
The kind of solutions presented here will cause a json log to be parsed as a string, and no fields defined in the json itself will be recognized as Elasticsearch fields, correct?
Not sure I understood you, but CRI logs are represented as a string line (these are not docker logs), so if you want to parse the json further you may want to add a pipelined parser or filter.
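As a sketch of such a pipelined setup: the CRI envelope is parsed first, then the embedded json payload. This Python example just illustrates the two stages (the sample line is invented):

```python
import json
import re

# Stage 1: the CRI envelope (<time> <stream> <logtag> <message>).
CRI_RE = re.compile(r'^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<log>.*)$')

line = '2020-08-11T18:07:23.187045754+02:00 stdout F {"level":"info","msg":"started"}'
envelope = CRI_RE.match(line).groupdict()

# Stage 2: the application's own json, carried inside the log field.
payload = json.loads(envelope['log'])
print(envelope['stream'], payload['level'])  # stdout info
```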
I had an issue where my logfile was filled with backslashes. I am using containerd instead of docker. I solved it by putting in the following configuration:
- name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
value: /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
I got an issue, that my logfile was filled with backslashes. I am using containerd instead of docker. I solved it by putting in the following configuration:
- name: FLUENT_CONTAINER_TAIL_PARSER_TYPE value: /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
Did not work for me on 1.20.1 hosted on VMs. Still the same errors, full of backslashes.
I am using containerd as the CRI for Kubernetes and used the FLUENT_CONTAINER_TAIL_PARSER_TYPE env var. The logs are now somewhat readable, but the time format is incorrect, so an error is shown for that.
Any solution to this problem, or can we change the time format via some env var?
Ok, got it on how to fix this one.
First, we know that we need to change the logging format, as containerd does not use the json format but a regular text format. So we add the below environment variable to the daemonset:
- name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
value: /^(?<time>.+) (?<stream>stdout|stderr) (?<logtag>[FP]) (?<log>.*)$/
Now when we do this, it still shows an error with the time format. To solve this, we extract the kubernetes.conf file from a running fluentd container, copy the contents to a ConfigMap, and mount that value at the kubernetes.conf location, i.e. /fluentd/etc/kubernetes.conf.
volumeMounts:
- name: fluentd-config
mountPath: /fluentd/etc/kubernetes.conf
subPath: kubernetes.conf
volumes:
- name: fluentd-config
configMap:
name: fluentd-config
items:
- key: kubernetes.conf
path: kubernetes.conf
So to fix the error, we update the following value inside the source.
<source>
@type tail
@id in_tail_container_logs
path /var/log/containers/*.log
pos_file /var/log/fluentd-containers.log.pos
tag "#{ENV['FLUENT_CONTAINER_TAIL_TAG'] || 'kubernetes.*'}"
exclude_path "#{ENV['FLUENT_CONTAINER_TAIL_EXCLUDE_PATH'] || use_default}"
read_from_head true
<parse>
@type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
time_format %Y-%m-%dT%H:%M:%S.%N%:z
</parse>
</source>
i.e. we change
time_format %Y-%m-%dT%H:%M:%S.%NZ
to
time_format %Y-%m-%dT%H:%M:%S.%N%:z
Now deploy the daemonset and it will work.
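The reason for the time_format change is the timestamp style: docker's json-file driver writes UTC times ending in a literal Z, while containerd/CRI-O write a numeric offset such as +02:00. A small Python illustration of the difference (Python has no %N, so nanoseconds are truncated to microseconds here just for the demonstration):

```python
from datetime import datetime

docker_ts = '2020-08-11T16:07:28.606198Z'     # literal Z suffix
cri_ts = '2020-08-11T18:07:28.606198+02:00'   # numeric UTC offset

# Docker-style: matches a format ending in a literal 'Z' (fluentd's %NZ).
d1 = datetime.strptime(docker_ts, '%Y-%m-%dT%H:%M:%S.%fZ')

# CRI-style: needs an offset directive (fluentd's %N%:z, Python's %z).
d2 = datetime.strptime(cri_ts, '%Y-%m-%dT%H:%M:%S.%f%z')
print(d2.utcoffset())  # 2:00:00
```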
I've published new images that include the parser: https://github.com/fluent/fluentd-kubernetes-daemonset/pull/521, https://github.com/fluent/fluentd-kubernetes-daemonset/commit/2736b688e405c9da4675eb4e23ade916d009ec53
With FLUENT_CONTAINER_TAIL_PARSER_TYPE, we can specify the cri type parser for parsing CRI format logs.
ref: https://github.com/fluent/fluentd-kubernetes-daemonset#use-cri-parser-for-containerdcri-o-logs
We are facing this issue with slashes ("\\"). We use the v1.12-debian-elasticsearch7-1 version of the daemonset and are currently testing the workarounds mentioned in this issue.
We would like to know whether there will be a newer version of the daemonset once the issue is fixed, or whether we need to use the workarounds permanently.
Thanks.
Is this issue fixed with BDRK-3386?
From what I can see, there's still no way to concatenate partial logs coming from containerd or cri-o, nor to pass a regular expression when FLUENT_CONTAINER_TAIL_PARSER_TYPE is set to regexp.
Containerd and cri-o require something like the following to reconstruct logs split into multiple lines (partials):
<filter kubernetes.**>
@type concat
key log
partial_key logtag
partial_value P
separator ""
</filter>
The filter above relies on a logtag field, defined as follows:
<parse>
@type regexp
expression /^(?<logtime>.+) (?<stream>stdout|stderr) (?<logtag>[FP]) (?<log>.*)$/i
time_key logtime
time_format %Y-%m-%dT%H:%M:%S.%N%Z
</parse>
I'm not sure how to make that filter block and regexp conditional. Nor am I sure we can come up with a configuration that would suit both containerd/cri-o and Docker. You would also need to change some input sources (systemd units, docker log file). I've given up and written my own ConfigMap, based on the configuration shipping in this image, fixing the few bits I need.
you can add the env variable FLUENT_CONTAINER_TAIL_PARSER_TYPE with value /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/ - it'll make this daemonset ok with the containerd logs

I was stuck on this question all day until I saw your answer! Love this answer and the author!
As per discussion and this change, make sure to turn off greedy parsing for the timestamp, e.g. change
^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$
to
^(?<time>.+?) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$
With greedy parsing, there's a chance of runaway logging (log errors caused by scraping log errors). Context:
https://github.com/fluent/fluent-bit/pull/5078 https://github.com/fluent/fluent-bit/commit/cf239c2194551bb31ebf42ae075eb847fc326ec6
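The runaway case can be reproduced in a few lines of Python: when fluentd ends up scraping its own warnings, the message part itself contains another CRI-formatted line, and the greedy (?<time>.+) backtracks to the last stdout instead of stopping at the first (the sample line is abbreviated from the output quoted earlier in this thread):

```python
import re

greedy = re.compile(r'^(?P<time>.+) (?P<stream>stdout|stderr) [^ ]* (?P<log>.*)$')
lazy = re.compile(r'^(?P<time>.+?) (?P<stream>stdout|stderr) [^ ]* (?P<log>.*)$')

# A scraped line whose message embeds another CRI-formatted line:
line = ('2020-08-11T18:07:28.606198265+02:00 stdout F '
        'pattern not matched: "2020-08-11T18:07:27.620512318+02:00 stdout F ..."')

# Greedy: time swallows everything up to the *embedded* "stdout".
print(greedy.match(line).group('time'))
# Lazy: time stops at the first "stdout", as intended.
print(lazy.match(line).group('time'))  # 2020-08-11T18:07:28.606198265+02:00
```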
Cool, @vipinjn24's fix above works well.
It works for me! You are so gorgeous, @vipinjn24!
I have a separate file outside kubernetes.conf, named tail_container_parse.conf, containing:
<parse>
@type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
time_format "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TIME_FORMAT'] || '%Y-%m-%dT%H:%M:%S.%NZ'}"
</parse>
Just using the env FLUENT_CONTAINER_TAIL_PARSER_TIME_FORMAT in the daemonset with the above time_format fixed the problem for me.
Hmm let me see this one.
After reading the whole thread and experimenting with the different settings posted here, I managed to get fluentd working with OKD4.
- name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
value: /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
- name: FLUENT_CONTAINER_TAIL_PARSER_TIME_FORMAT
value: '%Y-%m-%dT%H:%M:%S.%N%:z'
I set these two env vars and it works without overwriting any config files in the container.
For the record, as it's now the 7th answer suggesting this ... At some point, I gave the following sample: https://github.com/fluent/fluentd-kubernetes-daemonset/issues/412#issuecomment-912985090 This is still valid.
With something like /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/, your [^ ]* drops a character that may be a P (partial) or an F (final).
If you do not concat partial lines up until the next final one, you will eventually have some logs broken down into several records. At which point: good luck finding anything in Kibana/Elasticsearch.
Hi,
I don't know why the logs are not being parsed as json first; currently all my logs to Elasticsearch, for example, are under the "log" field. This is my containers.input.conf configuration:
<source>
@id fluentd-containers.log
@type tail
path /var/log/containers/*.log
pos_file /var/log/containers.log.pos
tag raw.kubernetes.*
read_from_head true
<parse>
@type multi_format
<pattern>
format json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
</pattern>
<pattern>
format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
time_format %Y-%m-%dT%H:%M:%S.%N%:z
</pattern>
</parse>
</source>
# Detect exceptions in the log output and forward them as one log entry.
<match raw.kubernetes.**>
@id raw.kubernetes
@type detect_exceptions
remove_tag_prefix raw
message log
stream stream
multiline_flush_interval 5
max_bytes 500000
max_lines 1000
</match>
# Concatenate multi-line logs
<filter **>
@id filter_concat
@type concat
key log
use_first_timestamp true
multiline_end_regexp /\n$/
separator ""
timeout_label @NORMAL
flush_interval 5
</filter>
# Enriches records with Kubernetes metadata
<filter kubernetes.**>
@id filter_kubernetes_metadata
@type kubernetes_metadata
skip_labels true
</filter>
# Fixes json fields in Elasticsearch
<filter kubernetes.**>
@id filter_parser
@type parser
key_name log
reserve_time true
reserve_data true
remove_key_name_field true
<parse>
@type multi_format
<pattern>
format json
</pattern>
<pattern>
format none
</pattern>
</parse>
</filter>
Never mind, I succeeded with this config:
<source>
@id fluentd-containers.log
@type tail
path /var/log/containers/*.log
pos_file /var/log/containers.log.pos
tag raw.kubernetes.*
#read_from_head true
<parse>
@type multi_format
<pattern>
format json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
</pattern>
<pattern>
format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
time_format %Y-%m-%dT%H:%M:%S.%N%:z
</pattern>
</parse>
</source>
# Detect exceptions in the log output and forward them as one log entry.
<match raw.kubernetes.**>
@id raw.kubernetes
@type detect_exceptions
remove_tag_prefix raw
message log
stream stream
multiline_flush_interval 5
max_bytes 500000
max_lines 1000
</match>
## Concatenate multi-line logs
#<filter **>
# @id filter_concat
# @type concat
# key log
# use_first_timestamp true
# multiline_end_regexp /\n$/
# separator ""
# timeout_label @NORMAL
# flush_interval 5
#</filter>
# Enriches records with Kubernetes metadata
<filter kubernetes.**>
@id filter_kubernetes_metadata
@type kubernetes_metadata
skip_labels true
</filter>
# Fixes json fields in Elasticsearch
<filter kubernetes.**>
@id filter_parser
@type parser
key_name log
reserve_time true
reserve_data true
remove_key_name_field true
<parse>
@type multi_format
<pattern>
format json
</pattern>
<pattern>
format none
</pattern>
</parse>
</filter>
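For reference, the multi_format behaviour in the working config above amounts to "try each pattern in order, take the first that matches". A rough Python equivalent (the parse_line helper and sample lines are invented for illustration):

```python
import json
import re

CRI_RE = re.compile(r'^(?P<time>.+) (?P<stream>stdout|stderr) [^ ]* (?P<log>.*)$')

def parse_line(line):
    # First pattern: docker json-file format.
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        pass
    # Second pattern: CRI plain-text format.
    m = CRI_RE.match(line)
    return m.groupdict() if m else None

docker = parse_line('{"log":"hi\\n","stream":"stdout","time":"2019-04-19T10:08:59Z"}')
cri = parse_line('2019-04-19T10:08:59.123+00:00 stdout F hi')
print(cri['log'])  # hi
```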
We are getting an issue in the cri parser of fluent-bit after an EKS upgrade to 1.24.
With the below parser, the log: prefix is missing when it forwards logs to Splunk:
[PARSER]
Name cri
Format regex
Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
Hi, I'm running k3s using containerd instead of docker. The log format is different from docker's. AFAIK it would just involve changing the @type json to a regex for the container logs, see https://github.com/rancher/k3s/issues/356#issuecomment-485080611 Would anyone be up for doing this? Maybe with some kind of env var to switch on the containerd support, e.g. CONTAINER_RUNTIME=docker as default, with containerd as an alternative.
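For comparison, here is the difference this issue is about, sketched in Python with made-up sample lines: docker's json-file driver writes one JSON object per line, while containerd's CRI format is plain text, so a json parser chokes on it:

```python
import json

docker_line = '{"log":"hello\\n","stream":"stdout","time":"2019-04-19T10:08:59.123456789Z"}'
containerd_line = '2019-04-19T10:08:59.123456789+00:00 stdout F hello'

# The docker line is valid JSON ...
rec = json.loads(docker_line)
print(rec['stream'])  # stdout

# ... the CRI line is not, which is why @type json fails on containerd logs:
try:
    json.loads(containerd_line)
except json.JSONDecodeError:
    print('not JSON: needs a regexp (or cri) parser instead')
```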