SumoLogic / fluentd-kubernetes-sumologic

FluentD plugin to extract logs from Kubernetes clusters, enrich them, and ship them to Sumo Logic.
Apache License 2.0

Source categories in Sumo appending numbers #115

Open jeffwroblewski opened 5 years ago

jeffwroblewski commented 5 years ago

For 2.1 and beyond, a user is seeing an issue where source categories are being populated like this:

```
app_prod/some_app_name
app_prod/some_app_name_62
app_prod/some_app_name_63
...
```

Can provide more info offline as needed.

Thanks! Jeff W. TAM, Sumo

frankreno commented 5 years ago

cc @bendrucker: I'll look into this as soon as I can, but it seems likely related to the fix for #78.

bendrucker commented 5 years ago

What version of Kubernetes are you running?

andrews32 commented 5 years ago

We're running OpenShift v3.3 which includes Kubernetes v1.3.

bendrucker commented 5 years ago

Gotcha, seems like there's probably no test coverage for that pod name format anymore. I can look into it in a few.

andrews32 commented 5 years ago

We are in the middle of upgrading to OpenShift v3.9, which includes Kubernetes v1.9, but the symbolic links from which I believe the code retrieves the pod name are in the same format as in K8s v1.3.

For example, the first one is docker-registry-2-mqe0f, where docker-registry is the pod_name, 2 is the deployment config counter, and mqe0f is the hash.

The problem is that it's inconsistent in what it retrieves as the _sourceCategory: sometimes it'll be "docker-registry", sometimes "docker-registry-2".

```
[root@infra01-devtest-vxbyr ~]# ls /var/log/containers/
docker-registry-2-mqe0f_default_POD-9171d6915e911a532fb6048191e9713ed36a14ccd1a9057624ece298f08b350a.log
docker-registry-2-mqe0f_default_registry-6c077b90f9592770d1e63b0444551d331167551e035f3d25b5922d0b4ec05325.log
hawkular-cassandra-1-bd5m3_openshift-infra_hawkular-cassandra-1-86f7ee9fddbff12f935a88b0b54e7b46e82a73657000ff41b209076ed7fcc657.log
hawkular-cassandra-1-bd5m3_openshift-infra_POD-a6d27b85d71c4e0c56e600b8d3666e39da8d360515c75d20282318b56f50be47.log
hawkular-metrics-5ovsm_openshift-infra_hawkular-metrics-e70c33b6df41717ad12ccfc1d55b462603ac6a79adbae45c7cad0d363bfccd74.log
hawkular-metrics-5ovsm_openshift-infra_POD-02596538d07c076a7c2447bd364981296ed2584aea29e1158eca643d78359953.log
registry-console-1-6iu26_default_POD-7ce336ef1bb0442e3d93c642c7b63c523e23adf924e4ed1f4f26bd7db6e17c64.log
registry-console-1-6iu26_default_registry-console-a3b0ad6a98b33001ef205bd1bb83d027d19d67013f701d66ec05183314a04e3c.log
router-25-2gpi9_default_POD-ec02bb74c544cc201fea714ce52b3f7bef9adb7d6c47b03e7b63df4cb8df6819.log
router-25-2gpi9_default_router-6cea4aef80d33705c275d19392de0608accec18243e7ef2a7a773103735ee510.log
```
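For reference, the file names above follow the `<pod_name>_<namespace>_<container_name>-<container_id>.log` convention, and the pod name itself follows the OpenShift pattern described above. A rough Ruby sketch of the parsing involved (illustrative only, not the plugin's actual code):

```ruby
# Container log file names under /var/log/containers encode the pod,
# namespace, and container name plus a container ID:
#   <pod_name>_<namespace>_<container_name>-<container_id>.log
LOG_NAME = /\A(?<pod>[^_]+)_(?<namespace>[^_]+)_(?<container>.+)-(?<id>[0-9a-f]+)\.log\z/

file = "docker-registry-2-mqe0f_default_registry-6c077b90f9592770d1e63b0444551d331167551e035f3d25b5922d0b4ec05325.log"
m = file.match(LOG_NAME)
m[:pod]       # => "docker-registry-2-mqe0f"
m[:namespace] # => "default"
m[:container] # => "registry"

# The OpenShift pod name convention from the comment above:
#   <deploymentconfig>-<deploy version>-<random hash>
m[:pod].match(/\A(?<name>.+)-(?<version>\d+)-(?<hash>[a-z0-9]+)\z/)[:name]
# => "docker-registry"
```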

bendrucker commented 5 years ago

So the actual pod name is docker-registry-2-mqe0f, if I'm reading right? It would be a huge help if you could get an entire pod (`kubectl get pod <name> -o yaml`) for confirmation.

bendrucker commented 5 years ago

I suspect the regression that's affecting you from #78 has to do with the pod template hash. Rather than hardcode error-prone patterns based on string formatting (i.e. strip this part if it's numbers), we switched to actually detecting the pod template hash and deterministically stripping the dynamic parts. I'm trying to get a 1.3 cluster up on minikube, but in case that doesn't pan out, a full pod from your cluster would be helpful.

andrews32 commented 5 years ago

```
[svc-vxby-ose@master01-devtest-vxbyr ~]$ oc get po docker-registry-3-9n3bx -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"docker-registry-3","uid":"3bf3e47f-fad7-11e8-8df7-005056848c95","apiVersion":"v1","resourceVersion":"1205030190"}}
    openshift.io/deployment-config.latest-version: "3"
    openshift.io/deployment-config.name: docker-registry
    openshift.io/deployment.name: docker-registry-3
    openshift.io/scc: restricted
  creationTimestamp: 2018-12-08T10:52:07Z
  generateName: docker-registry-3-
  labels:
    deployment: docker-registry-3
    deploymentconfig: docker-registry
    docker-registry: default
  name: docker-registry-3-9n3bx
  namespace: default
  resourceVersion: "1205031022"
  selfLink: /api/v1/namespaces/default/pods/docker-registry-3-9n3bx
  uid: 4e5362ec-fad7-11e8-8df7-005056848c95
spec:
  containers:
```

bendrucker commented 5 years ago

From what you posted, the deployment name is docker-registry-3. This repo was meant to remove the random sections that Deployments/ReplicaSets append, not just any numeric ID. It seems like it was a bug that it matched and deleted part of your deployment name from the pod_name. You could consider using the OpenShift labels directly for your source categories.
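For illustration, a source category keyed to the stable `deploymentconfig` label (values taken from the pod YAML above) would stay constant across deployments. A hypothetical sketch, not a supported plugin option as written:

```ruby
# Hypothetical sketch: derive the source category from the stable
# OpenShift labels shown in the pod YAML above, instead of parsing
# the pod name.
labels = {
  "deployment"       => "docker-registry-3", # changes on every deployment
  "deploymentconfig" => "docker-registry",   # stable across deployments
}
namespace = "default"

source_category = "#{namespace}/#{labels['deploymentconfig']}"
# => "default/docker-registry", with no trailing deployment counter
```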

andrews32 commented 5 years ago

I'm guessing the deployment name wasn't always where the _sourceCategory got its values from. This is new behavior.

Also, as Frank mentioned above, #78 was only fixed/closed in December 2018, which matches the first reports of this new behavior.

What changed in #78 and why? How do we undo it without manually using an old version that will not be maintained?

bendrucker commented 5 years ago

> I'm guessing the deployment name wasn't always where the _sourceCategory got its values from.

I don't see any reason to assume that.

#78 was closed by #100. #78 identified bugs in the original naive implementation of replica pod sanitization: it would remove the second-to-last segment of the pod name if it was a number, which was unnecessarily naive.
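Roughly, the original behavior was equivalent to this (an illustrative sketch, not the exact code that shipped):

```ruby
# Naive sanitization from the original implementation, as described
# above: drop the second-to-last dash-separated segment of the pod
# name when it is purely numeric.
def naive_sanitize(pod_name)
  parts = pod_name.split("-")
  parts.delete_at(-2) if parts.length > 2 && parts[-2] =~ /\A\d+\z/
  parts.join("-")
end

naive_sanitize("docker-registry-2-mqe0f") # => "docker-registry-mqe0f"
```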

That numeric value was the pod template hash, which is included as a label on the pods. In later versions of k8s, the numeric value was mapped to an alphanumeric encoding, breaking the naive name sanitization. #100 takes the template hash, looks for the numeric or alphanumeric version in the pod name, and removes that segment by exact match. Anything else in your pod names, including numbers, is left intact.
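Sketched under the same caveat (the pod name and hash below are made up for illustration), the #100 approach only removes a segment that exactly matches the pod's template hash label:

```ruby
# #100-style sanitization, as described above: take the pod-template-hash
# label from the pod's metadata and strip exactly that segment from the
# pod name, instead of guessing from string formatting.
def strip_template_hash(pod_name, template_hash)
  return pod_name if template_hash.nil? || template_hash.empty?
  pod_name.sub("-#{template_hash}-", "-")
end

# A Deployment pod embeds the (here alphanumeric) template hash:
strip_template_hash("nginx-5c689d88bb-qh2jp", "5c689d88bb")
# => "nginx-qh2jp"

# A pod name without a template hash is left untouched:
strip_template_hash("docker-registry-2-mqe0f", nil)
# => "docker-registry-2-mqe0f"
```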

This feature was meant to target Kubernetes ReplicaSets, and the plugin was stripping bits of your pod name due to a bug. It sucks, but sometimes bug fixes are breaking changes if you were depending on the buggy behavior.

I made some suggestions above on how to provide a specific metadata template with labels; that would let you define conventions that match your stack. I don't think it would be a good idea to reintroduce behavior that parses pod name conventions beyond what's present in k8s core.

andrews32 commented 5 years ago

I was looking for something else and came across this ticket and noticed it was still open. I think we can close this now. Thanks for explaining it @bendrucker.