Closed MartinAltmayer closed 5 years ago
turning on fluentd log_level debug or trace should show additional log information
I would expect this to be the case for short-lived pods unless watch [1] is enabled. When watch is disabled, there is no way for the plugin to get the metadata, because there is nothing for the API server to return; the pod is already gone. You could also hit this condition when the cache rolls entries over due to size or age limits, e.g. under high message volume with many pods.
The metadata filter is applied like this:
<filter kubernetes.**>
@type kubernetes_metadata
merge_json_log true
preserve_json_log true
include_namespace_metadata false
</filter>
This would have watch enabled by default. So, it must be the cache then?
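For reference, a configuration that makes the relevant plugin options explicit might look like this (a sketch: `watch`, `cache_size`, and `cache_ttl` are documented options of this plugin, but the values shown here are illustrative, not a recommendation):

```
<filter kubernetes.**>
  @type kubernetes_metadata
  merge_json_log true
  preserve_json_log true
  watch true          # default; keeps metadata updated via an API watch
  cache_size 10000    # illustrative: raise this if many short-lived pods churn the cache
  cache_ttl 3600      # seconds before cached metadata expires
</filter>
```

If the cache were the culprit, raising `cache_size` would be the first thing to try.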
I have the same problem, but in my case it happens only on short-running pods, and I get this error:
backend/fluentd-cloudwatch-hbhdv[fluentd-cloudwatch]: 2019-03-19 15:10:42 +0000 [warn]: #0 dump an error event: error_class=RuntimeError error="failed to expand
%Q[k8s.domain.com.br.\#{record[\"kubernetes\"][\"labels\"][\"release\"]}.\#{record[\"kubernetes\"][\"container_name\"]}]
: error = undefined method `[]' for nil:NilClass" location="/fluentd/vendor/bundle/ruby/2.3.0/gems/fluentd-1.3.3/lib/fluent/plugin/filter_record_transformer.rb:310:in
`rescue in expand'" tag="kubernetes.var.log.containers.monitoring-backup-duplicity-1553008200-5g6kb_monitoring_duplicity-cleanup-a7e1f1be4164672dee0f1ccd13431d67aab7dc32373d883b017c5537f44ad726.log" time=2019-03-19 15:10:11.548567705 +0000 record={"log"=>"Which can't be deleted because newer sets depend on them.\n", "stream"=>"stdout", "docker"=>{"container_id"=>"a7e1f1be4164672dee0f1ccd13431d67aab7dc32373d883b017c5537f44ad726"}, "kubernetes"=>{"container_name"=>"duplicity-cleanup", "namespace_name"=>"monitoring", "pod_name"=>"monitoring-backup-duplicity-1553008200-5g6kb", "pod_id"=>"a7e1f1be4164672dee0f1ccd13431d67aab7dc32373d883b017c5537f44ad726", "namespace_id"=>"78a930d9-20cb-11e9-a77f-023ff1239c2a"}}
I already added the "K8S_NODE_NAME" environment variable but the problem persists; if I remove this record everything works fine:
kubernetes_app_name k8s.domain.com.br.${record["kubernetes"]["labels"]["release"]}.${record["kubernetes"]["container_name"]}
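A defensive workaround for that `nil:NilClass` error (a sketch, not from the thread: it assumes your record_transformer filter can run with `enable_ruby` and that a placeholder value is acceptable downstream) is to fall back when the label is missing instead of letting the expansion raise:

```
<filter kubernetes.**>
  @type record_transformer
  enable_ruby true
  <record>
    # record.dig returns nil instead of raising when "labels" is absent;
    # "unknown" is an illustrative placeholder, pick whatever suits your routing
    kubernetes_app_name k8s.domain.com.br.${record.dig("kubernetes","labels","release") || "unknown"}.${record.dig("kubernetes","container_name") || "unknown"}
  </record>
</filter>
```

This does not restore the missing labels, but it keeps the pipeline from dropping the event.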
Regards
Someone?
With short-lived pods, log aggregators may not get a chance to collect the metadata and the log data, either completely or partially, if the pod starts and is disposed of between two consecutive polling attempts by the aggregator. When this happens, either the entire log goes missing or just the labels. Introducing a small delay (sleep 65) at the end of the container CMD resolved the issue in my situation. I set the delay slightly higher than the aggregator's polling interval; for example, fluentd has a default refresh_interval of 60 seconds.
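Applied to a pod spec, that workaround might look like this (a sketch: the pod name, label, image, and job command are illustrative; 65 seconds matches the default 60-second refresh_interval plus a small margin):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: short-lived-job
  labels:
    release: my-release   # the label the metadata filter should pick up
spec:
  restartPolicy: Never
  containers:
  - name: job
    image: busybox
    # keep the pod alive slightly longer than the aggregator's polling interval
    command: ["sh", "-c", "echo doing work; sleep 65"]
```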
See also https://github.com/DataDog/datadog-agent/issues/2749
Closing this issue as we are only able to retrieve labels if:
You may need to tweak your fluentd configuration and the settings specific to this plugin to ensure labels can be applied.
For some of my pods no Kubernetes labels are reported in the logs (record.kubernetes.labels does not exist) although the pod is labelled. It seems to happen mostly to short-running pods.
Unfortunately, my fluentd config processes logs based on labels (I am using kube-fluentd-operator), which fails if the labels are absent.
The test-pod I use to (sometimes) reproduce the issue is the following:
Is there any debug/logging output I can provide to help you understand this issue?
I've also seen https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/issues/123, but since I use v2.1.2 that should no longer be a problem.