airwallex / k8s-pod-restart-info-collector

Automated troubleshooting of Kubernetes Pods issues. Collect K8s pod restart reasons, logs, and events automatically.
325 stars 47 forks

Sending a PR with a new feature. #9

npsoni88 closed 1 year ago

npsoni88 commented 1 year ago

Hey folks,

I've modified the code a bit for our use case (a fairly large gaming company). I was wondering how I should proceed with sending the changes back to you so that everything stays in sync.

Motivation: Some companies would prefer to send Slack alerts only for specific applications. For example, I may only be interested in failing pods that belong to critical applications, for which we send "on-call alerts". Everything else can be ignored. There is no option to do that right now.

What's done: In the Helm values.yaml, users can now supply the labels they want monitored. A new function, "NewControllerWithLabels", does everything "NewController" does, except that it only sends a message to Slack if the restarting pod has a supplied label key on it.

This bypasses the "ignoredNamespace" and "ignoredPod" functions and relies only on the label keys supplied in values.yaml.

This way, users can

  1. Either use the Helm chart as it is built right now, with the additional option to ignore namespaces / pods
  2. Or simply supply the label keys they want to alert on (which disables the ignored namespaces / pods features).
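
The two modes described above could be sketched roughly as follows. This is a minimal illustration, not the code from the proposed change: the `Pod` and `Config` types, their field names, and the `shouldNotify` function are all hypothetical, and the real collector works with Kubernetes API objects rather than plain structs.

```go
package main

// Pod is a hypothetical, simplified stand-in for a Kubernetes pod.
type Pod struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// Config is a hypothetical view of the settings read from values.yaml.
type Config struct {
	WatchedLabelKeys  []string // assumption: label keys to alert on
	IgnoredNamespaces []string // assumption: namespaces to skip
	IgnoredPods       []string // assumption: exact pod names to skip
}

// contains reports whether list holds s.
func contains(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// shouldNotify decides whether a restarting pod triggers a Slack message.
// If any label keys are configured, only the label check applies and the
// ignore lists are bypassed; otherwise the ignore lists are used.
func shouldNotify(pod Pod, cfg Config) bool {
	if len(cfg.WatchedLabelKeys) > 0 {
		for _, key := range cfg.WatchedLabelKeys {
			if _, ok := pod.Labels[key]; ok {
				return true
			}
		}
		return false
	}
	return !contains(cfg.IgnoredNamespaces, pod.Namespace) &&
		!contains(cfg.IgnoredPods, pod.Name)
}
```

Only the presence of the label key is checked here, not its value, which matches the description of alerting when the pod "has that label key on it".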

I am still testing it out in our environment. I am not sure how to proceed: should I send the code back as a PR, and can I review it with somebody?

able8 commented 1 year ago

Hi @npsoni88, you can try adding the following Pod annotation or label to the critical pods. It will then send the alerts to your "on-call alerts" Slack channel. You may not need to create a "NewController" for this.

https://github.com/airwallex/k8s-pod-restart-info-collector#faq

How to customize the Slack channel for each pod

Add alert-slack-channel: "your-slack-channel-name" to the Pod annotations or labels. For example, a label: alert-slack-channel: "restart-info-nonprod"
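
To make the per-pod routing concrete, here is a rough sketch of how a channel could be resolved from a pod's metadata. It is not taken from the collector's source: the function `slackChannelFor`, the default-channel parameter, and the choice to check annotations before labels are assumptions for illustration. Only the `alert-slack-channel` key comes from the FAQ.

```go
package main

// channelKey is the annotation/label key named in the FAQ.
const channelKey = "alert-slack-channel"

// slackChannelFor returns the Slack channel for a pod's alerts.
// Assumption: annotations take precedence over labels, and a
// hypothetical defaultChannel is used when neither sets the key.
func slackChannelFor(annotations, labels map[string]string, defaultChannel string) string {
	// Reading from a nil map is safe in Go and yields the empty string.
	if ch := annotations[channelKey]; ch != "" {
		return ch
	}
	if ch := labels[channelKey]; ch != "" {
		return ch
	}
	return defaultChannel
}
```

With this approach, critical pods carrying the key would be routed to the on-call channel, while all other pods fall back to the default channel.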