Closed kaiohenricunha closed 1 year ago
I'm also experiencing this issue! Any ideas? Thanks
Any ideas on what might be happening here?
Issue resolved by adding a custom dedot Fluentd ClusterFilter:
apiVersion: fluentd.fluent.io/v1alpha1
kind: ClusterFilter
metadata:
  labels:
    filter.fluentd.fluent.io/enabled: "true"
    filter.fluentd.fluent.io/name: "de-dot"
  name: de-dot
spec:
  filters:
    - customPlugin:
        config: |
          <filter **>
            @type dedot
            de_dot_separator _
            de_dot_nested ${FLUENTD_DEDOT_NESTED:=true}
          </filter>
---
apiVersion: fluentd.fluent.io/v1alpha1
kind: ClusterFluentdConfig
metadata:
  labels:
    config.fluentd.fluent.io/enabled: "true"
  name: cluster-fluentd-config
spec:
  clusterFilterSelector:
    matchLabels:
      filter.fluentd.fluent.io/enabled: "true"
      filter.fluentd.fluent.io/name: "de-dot"
  clusterOutputSelector:
    matchLabels:
      output.fluentd.fluent.io/enabled: "true"
      output.fluentd.fluent.io/tenant: "core"
  watchedNamespaces: [] # watches all namespaces when empty
---
This is because field names containing dots can create ambiguity in certain data structures. For example, logs with the label kubernetes.labels.statefulset.kubernetes.io/pod-name will collide with the label kubernetes.labels.statefulset.kubernetes.io/pod-name.keyword.
In this case, OpenSearch treats each dot as a level of nesting, so it expects the JSON of this log to have the following format and tries to apply it twice, once for each label:
{
  "kubernetes": {
    "labels": {
      "statefulset": {
        "kubernetes": {
          "io": {
            "pod-name": "some_value"
          }
        }
      }
    }
  }
}
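To make the collision concrete, here is a minimal Python sketch of the idea. It is a simplified model, not OpenSearch's actual mapping code: `expand_dotted`, `merge`, and `find_conflict` are hypothetical helper names, and the label values are made up. The label names come from the error reported in this issue (`kubernetes.labels.app` next to `app.kubernetes.io/foo`).

```python
def expand_dotted(path, value):
    """Expand 'a.b.c' into {'a': {'b': {'c': value}}}, reading each dot as nesting."""
    node = value
    for part in reversed(path.split(".")):
        node = {part: node}
    return node


def merge(target, source, prefix=""):
    """Merge nested dicts; fail when a key must be both a value and an object."""
    for key, value in source.items():
        full = f"{prefix}{key}"
        if key not in target:
            target[key] = value
        elif isinstance(target[key], dict) and isinstance(value, dict):
            merge(target[key], value, prefix=f"{full}.")
        else:
            raise ValueError(f"[{full}] is both a concrete value and an object")
    return target


def find_conflict(labels):
    """Return a description of the first conflict, or None if the labels fit together."""
    doc = {}
    try:
        for name, value in labels.items():
            merge(doc, expand_dotted(name, value))
    except ValueError as err:
        return str(err)
    return None


# Illustrative values only; the label names mirror the ones in this issue.
labels = {
    "app": "my-app",
    "app.kubernetes.io/foo": "bar",
}
print(find_conflict(labels))
```

Here `app` is first seen as a plain string, and then `app.kubernetes.io/foo` requires `app` to be an object, which is the same shape of conflict as the mapper_parsing_exception quoted in this issue.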
This misinterpretation can cause unexpected behavior during indexing and querying, and might result in data loss or errors. By replacing dots with underscores using the de_dot filter, we can avoid such ambiguity and ensure that the field name is correctly interpreted.
After applying the dedot filter, it becomes:
{
  "kubernetes": {
    "labels": {
      "statefulset_kubernetes_io/pod-name": "some_pod_name",
      "statefulset_kubernetes_io/pod-name.keyword": "some_pod_name_keyword"
    }
  }
}
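The renaming itself can be sketched in a few lines of Python. This only illustrates the idea of replacing dots in keys. It is not the plugin's code: the function name `dedot` and its `separator` and `nested` arguments are assumptions, loosely mirroring the `de_dot_separator` and `de_dot_nested` settings used in the filter above.

```python
def dedot(record, separator="_", nested=True):
    """Return a copy of record with every dot in its keys replaced by separator.

    With nested=False only the top-level keys are rewritten.
    """
    result = {}
    for key, value in record.items():
        if nested and isinstance(value, dict):
            value = dedot(value, separator, nested)
        result[key.replace(".", separator)] = value
    return result


# Illustrative record; the values are made up.
record = {
    "kubernetes": {
        "labels": {
            "app": "my-app",
            "app.kubernetes.io/foo": "bar",
        }
    }
}
print(dedot(record))
```

After the rewrite, `app` and `app_kubernetes_io/foo` are two independent flat keys, so neither one forces the other to become an object.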
Dedot filter parameters would be a nice feature to have in the fluent-operator.
@stephweiss did you get it solved?
Yes, thank you for asking. My config was wrong. It worked exactly the way it is described here.
Describe the issue
I deployed a fluentbit-fluentd solution using the fluent-operator's chart. Most of the logs are being properly indexed on OpenSearch and enriched with Kubernetes metadata.
But many logs are being rejected with the following warning:
What is worse, these logs are not even discarded. Fluentd keeps retrying to send them, which causes a flood of rejected logs.
I set log_os_400_reason to true on my ClusterOutput definition, and it gives me the exact reason for the error:
The warning now prints:
So the reason seems to be a mapping collision:
"[error type]: mapper_parsing_exception [reason]: 'object mapping for [kubernetes.labels.app] tried to parse field [app] as object, but found a concrete value"
My logs also have a kubernetes.labels.app.kubernetes.io/foo label that is conflicting. It has been discussed here.
We do have the suggested replaceDots feature on the fluentbit output CRD, but only if I'm sending logs from fluentbit directly to OpenSearch. I'm using forward as my ClusterOutput to send logs from Fluentbit to Fluentd, and then to OpenSearch, and the forward output doesn't have a replaceDots parameter.
I tried using dynamic index templates, as suggested by the fluent-plugin-opensearch documentation, to deal with this error: https://github.com/fluent/fluent-plugin-opensearch/blob/main/README.Troubleshooting.md#random-400---rejected-by-opensearch-is-occured-why .
It doesn't work either.
I then tried a different index template like this, mapping the troublesome label:
It solved the error for kubernetes.labels.app, but started throwing for other labels such as:
What would be the equivalent replaceDots configuration for a Fluentd ClusterOutput? Is there any other viable solution?
To Reproduce
kubernetes.annotations.enabled and kubernetes.labels.enabled set to true.
Expected behavior
Logs being shipped to OpenSearch regardless of their labels. No OpenSearchErrorHandler - 400.
Your Environment
How did you install fluent operator?
Helm chart.
Additional context
No response