SumoLogic / fluentd-kubernetes-sumologic

FluentD plugin to extract logs from Kubernetes clusters, enrich them, and ship them to Sumo Logic.
Apache License 2.0

Duplicate Logs because of Redundant SourceCategories #123

Open salilgupta1 opened 5 years ago

salilgupta1 commented 5 years ago

Metadata:

K8s Version: 1.10.11

Environment: AWS

Fluentd Image Version: 2.0.0

Config

---
apiVersion: v1
kind: Secret
metadata:
  name: sumologic
  namespace: kube-system
type: Opaque
data:
  collector-url: {{ SUMOLOGIC_COLLECTOR_URL }}
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: kube-system
  name: fluentd-sumologic
  labels:
    app: fluentd
    version: v3
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      name: fluentd-sumologic
  template:
    metadata:
      labels:
        name: fluentd-sumologic
        version: v4
    spec:
      dnsConfig:
        options:
          - name: ndots
            value: "2"
      volumes:
      - name: pos-files
        hostPath:
          path: /var/run/fluentd-pos
          type: ""
      - name: host-logs
        hostPath:
          path: /var/log/
      - name: docker-logs
        hostPath:
          path: /var/lib/docker
      priorityClassName: civis-system-important
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      containers:
      - image: sumologic/fluentd-kubernetes-sumologic:v2.0.0
        name: fluentd-sumologic
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            memory: "100Mi"
            cpu: "20m"
          limits:
            memory: "200Mi"
            cpu: "20m"
        volumeMounts:
        # logs are mounted into /var/log
        - name: host-logs
          mountPath: /var/log/
          readOnly: true
        - name: docker-logs
          mountPath: /var/lib/docker/
          readOnly: true
        - name: pos-files
          mountPath: /mnt/pos/
        env:
        - name: CONTAINER_LOGS_PATH
          value: "/var/log/containers/*.log"
        - name: EXCLUDE_PATH
          value: "[\"/var/log/containers/canal*\", \"/var/log/containers/user*\", \"/var/log/containers/kubernetes-dashboard*\", \"/var/log/containers/kube2iam*\", \"/var/log/containers/kube-apiserver*\"]"
        - name: K8S_METADATA_FILTER_WATCH
          value: "false"
        - name: COLLECTOR_URL
          valueFrom:
            secretKeyRef:
              name: sumologic
              key: collector-url

Problem: We are still seeing duplicated logs for control plane processes. #79 addressed this issue and significantly reduced our ingestion, but we still see duplicates that I think come from redundant source categories. Take this screenshot for example:

[Screenshot: Screen Shot 2019-03-18 at 1 59 28 PM]

Interesting thing to note: my coworker added the last exclusion path, which excludes container logs for the API server. That seems to have de-duplicated the logs from the API server containers, but it is unclear why it worked.
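To make the effect of the exclusion list easier to reason about, here is a minimal Python sketch that decodes the EXCLUDE_PATH value from the config above and checks which files it would skip. The candidate file names are hypothetical, and Python's fnmatchcase only approximates the glob matching Fluentd performs, so treat this as an illustration and not as the plugin's actual behavior.

```python
import json
from fnmatch import fnmatchcase

# EXCLUDE_PATH exactly as set in the DaemonSet env above, after YAML unescaping.
EXCLUDE_PATH = (
    '["/var/log/containers/canal*", "/var/log/containers/user*", '
    '"/var/log/containers/kubernetes-dashboard*", '
    '"/var/log/containers/kube2iam*", "/var/log/containers/kube-apiserver*"]'
)

# The value is a JSON array encoded as a string.
patterns = json.loads(EXCLUDE_PATH)


def is_excluded(path, patterns):
    """Return True if the path matches any exclusion glob (approximation)."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# Hypothetical file names, chosen only to illustrate the matching.
candidates = [
    "/var/log/containers/kube-apiserver-node1_kube-system_kube-apiserver-abc.log",
    "/var/log/containers/kube-scheduler-node1_kube-system_kube-scheduler-def.log",
    "/var/log/kube-apiserver.log",
]

for path in candidates:
    status = "excluded" if is_excluded(path, patterns) else "still tailed"
    print(f"{status}: {path}")
```

In this sketch only the first path is excluded. A file outside /var/log/containers/ (the third, hypothetical path) is not covered by any of the patterns, so if such a file is also collected it would remain a separate source of the same log lines.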

frankreno commented 5 years ago

@salilgupta1: what is the corresponding sourceName for these sourceCategories? You can run something like _sourceCategory=kubernetes* | count by _sourceCategory, _sourceName
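For readers unfamiliar with that kind of query, the grouping it performs can be sketched locally in Python. The records and their field values below are hypothetical placeholders, not data from this issue; the point is only that counting by the (sourceCategory, sourceName) pair shows whether one source name is reported under several categories.

```python
from collections import Counter

# Hypothetical records; real values would come from the Sumo Logic search.
records = [
    {"_sourceCategory": "kubernetes/example/a", "_sourceName": "example-source-1"},
    {"_sourceCategory": "kubernetes/example/a", "_sourceName": "example-source-1"},
    {"_sourceCategory": "kubernetes/example/b", "_sourceName": "example-source-1"},
    {"_sourceCategory": "kubernetes/example/b", "_sourceName": "example-source-2"},
]


def count_by(records, *fields):
    """Count records grouped by the given fields, like 'count by f1, f2'."""
    return Counter(tuple(record[field] for field in fields) for record in records)


counts = count_by(records, "_sourceCategory", "_sourceName")
for (category, name), total in sorted(counts.items()):
    print(f"{total}\t{category}\t{name}")

# Source names that appear under more than one category are duplication suspects.
categories_per_name = {}
for category, name in counts:
    categories_per_name.setdefault(name, set()).add(category)
suspects = sorted(n for n, cats in categories_per_name.items() if len(cats) > 1)
print("suspects:", suspects)
```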