Open chrisbloe opened 3 years ago
Hi chrisbloe, AKS bot here :wave: Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.
I might be just a bot, but I'm told my suggestions are normally quite good, as such:
1) If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
2) Please abide by the AKS repo Guidelines and Code of Conduct.
3) If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
4) Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
5) Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
6) If you have a question, do take a look at our AKS FAQ. We place the most common ones there!
Triage required from @Azure/aks-pm
Action required from @Azure/aks-pm
I can only reemphasize what has been said already. The ability to increase the time for retaining events is indispensable for us. Anyone who has ever done a post-mortem analysis on AKS will probably agree.
Issue needing attention of @Azure/aks-leads
@justindavies is this on the roadmap (linked issues above)?
Hi, our company is looking into something similar. Any updates @justindavies? Shuaib
We often find that Kubernetes events put too much pressure on etcd and are one of the common culprits for overloaded and slow control planes. From a supportability and reliability point of view, we would have to be very careful about when and how we allow customers to configure these kinds of settings.
We are, however, considering creating a resource log stream for these events. You could then use a Log Analytics workspace to look up past events and do your analysis. Would that solve your event TTL problem? @617m4rc @chrisbloe
As @seguler mentioned above, we are in the process of ingesting and storing kube-system and node-based k8s Events and exposing them via diagnostics logs in the Azure portal.
@617m4rc @chrisbloe, if your clusters have Azure Monitor for Containers enabled, you can now see Kubernetes Events in the Azure portal from your managed cluster overview page, under the "Logs / Diagnostics / Kubernetes Events" menu.
Each of the above will require a Log Analytics workspace to be created.
@seguler update here pls?
Does anyone know if you can configure the event ttl yet?
+1 facing this one.
K8s offers the configuration of a variety of settings via kube-apiserver - https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/
I understand that AKS is a managed service, so it doesn't expose the option for a user to set these values directly.
I have found a couple of other tickets that relate to these options...
1150
1993
...but the underlying request of a standard approach to set each of these config values remains unanswered.
The one I'm particularly interested in right now is --event-ttl, which cannot be changed from its 1h default, so I think there needs to be some thought on a strategic approach for allowing these values to be modified, and how this should be documented. This ticket is for the wider consideration, not for --event-ttl specifically.
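Until the TTL can be configured, one possible stopgap is to copy events out of the cluster before the 1h window expires. The following is only a rough sketch, not an AKS feature: it assumes `kubectl` is on the PATH and authenticated against the cluster, and the function names and the `k8s-events.jsonl` archive file are made up for illustration. It relies on `kubectl get events --all-namespaces -o json` returning a list object with an `items` array, and deduplicates on each event's `metadata.uid` plus `metadata.resourceVersion`.

```python
import json
import subprocess
from pathlib import Path


def event_key(event: dict) -> tuple:
    """Identify one revision of an event by its uid and resourceVersion."""
    meta = event.get("metadata", {})
    return (meta.get("uid"), meta.get("resourceVersion"))


def load_seen(archive: Path) -> set:
    """Collect the keys of all events already written to the archive file."""
    seen = set()
    if archive.exists():
        with archive.open() as fh:
            for line in fh:
                if line.strip():
                    seen.add(event_key(json.loads(line)))
    return seen


def append_new_events(event_list: dict, archive: Path) -> int:
    """Append events not yet archived as JSON lines; return how many were added."""
    seen = load_seen(archive)
    written = 0
    with archive.open("a") as fh:
        for event in event_list.get("items", []):
            key = event_key(event)
            if key in seen:
                continue  # this revision of the event is already stored
            fh.write(json.dumps(event, sort_keys=True) + "\n")
            seen.add(key)
            written += 1
    return written


def fetch_events() -> dict:
    """Ask kubectl for the events currently retained by the API server."""
    result = subprocess.run(
        ["kubectl", "get", "events", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `append_new_events(fetch_events(), Path("k8s-events.jsonl"))` on a schedule shorter than one hour (for example from cron) should keep a local history for post-mortems. It re-reads the whole archive on every run, so it is only suitable for small clusters; a Log Analytics workspace is likely the better long-term option.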