Open dimakyriakov opened 1 year ago
Hi, can you share some example error messages so we can understand which helm-controller events are causing this? Notification-controller already rate-limits duplicate events, by default for a period of 5 minutes. After 5 minutes, you'll receive an alert again if the same event is received. I think that's what's happening in this case. The underlying issue may be in helm-controller, which keeps sending these error events and may need attention; some change in helm-controller or in the HelmRelease might suppress or fix the errors. If these errors aren't actionable, you can ignore them in notification-controller Alerts by defining an ExclusionList, see https://fluxcd.io/flux/components/notification/alert/#specification
Some of our HelmReleases have the error "reconciliation failed: install retries exhausted", and mostly we are OK with it. It would be nice to get this error only once, when it first appears.
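For reference, the exclusion list suggested above could be used to drop this specific message. The following is only a sketch: the apiVersion, the Alert name, the Provider name (slack), and the source namespace (apps) are assumptions and should be adjusted to match the installed Flux version and the existing Provider. Each exclusionList entry is a regular expression matched against the event message.

```yaml
# Hypothetical Alert; names and namespaces are placeholders.
apiVersion: notification.toolkit.fluxcd.io/v1beta2
kind: Alert
metadata:
  name: helmreleases
  namespace: flux-system
spec:
  providerRef:
    name: slack            # assumed name of an existing Provider in the same namespace
  eventSeverity: error
  eventSources:
    - kind: HelmRelease
      name: '*'            # all HelmReleases in the namespace below
      namespace: apps      # assumed namespace being monitored
  exclusionList:
    - "install retries exhausted"   # drop events whose message matches this regex
```

Note that this suppresses the message entirely rather than sending it once, so it only fits errors that are genuinely not actionable.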
@dimakyriakov by design, error alerts are sent every 5 minutes until they are resolved. You can increase the interval with --rate-limit-interval; the flags are documented here: https://fluxcd.io/flux/components/notification/options/
Thank you for the response, guys. I will close the ticket.
@stefanprodan, hey, I just want to ask where exactly I can set the --rate-limit-interval option.
I created the Provider and the Alert in a YAML file. To me it looks like this is an option for the CLI.
That's a controller flag. See here for how to change controller flags: https://fluxcd.io/flux/cheatsheets/bootstrap/
This is my kustomization.yaml file. You mean I have to increase --rate-limit-interval for name: notification-controller?
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patchesStrategicMerge: # these are tuned for demonstration and debugging
  - |-
    apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
    kind: Kustomization
    metadata:
      name: flux-system
      namespace: flux-system
    spec:
      patches:
        - target:
            version: v1
            group: apps
            kind: Deployment
            name: notification-controller
            namespace: flux-system
          patch: |-
            - op: add
              path: /spec/template/spec/containers/0/args/-
              value: --rate-limit-interval=10s # do not discard messages that are sent again after 10s+
        - target:
            version: v1
            group: apps
            kind: Deployment
            name: kustomize-controller
            namespace: flux-system
          patch: |-
            - op: add
              path: /spec/template/spec/containers/0/args/0
              value: --concurrent=5 # increase the number of Kustomizations processed at once
            - op: replace
              path: /spec/template/spec/containers/0/resources/limits/cpu
              value: "2" # allow KC access to more CPU
            - op: replace
              path: /spec/template/spec/containers/0/resources/limits/memory
              value: "2Gi" # allow KC access to more memory
        - target:
            version: v1
            group: apps
            kind: Deployment
            name: source-controller
            namespace: flux-system
          patch: |-
            - op: replace
              path: /spec/template/spec/containers/0/resources/limits/cpu
              value: "2" # allow SC access to more CPU
            - op: replace
              path: /spec/template/spec/containers/0/resources/limits/memory
              value: "2Gi" # allow SC access to more memory
        - target:
            version: v1
            group: apps
            kind: Deployment
            name: helm-controller
            namespace: flux-system
          patch: |-
            - op: add
              path: /spec/template/spec/containers/0/args/0
              value: --concurrent=12 # increase the number of HelmReleases processed at once
            - op: replace
              path: /spec/template/spec/containers/0/resources/limits/cpu
              value: "2" # allow HC access to more CPU
            - op: replace
              path: /spec/template/spec/containers/0/resources/limits/memory
              value: "2Gi" # allow HC access to more memory
--rate-limit-interval=10s # do not discard messages that are sent again after 10s+
No wonder you get alert spam. The default is 5m; you can increase it to a value that fits your needs.
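As an illustration of the advice above, the notification-controller patch could be rewritten with a longer interval. This is a minimal sketch that assumes the flux-system kustomization.yaml layout described in the bootstrap cheatsheet linked earlier, with the patch applied directly to the Deployment; the value 1h is an arbitrary example, not a recommendation, and any Go-style duration longer than the default 5m would work the same way.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - target:
      kind: Deployment
      name: notification-controller
    patch: |
      # Append the flag to the controller's container args.
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --rate-limit-interval=1h   # example value: repeat identical alerts at most once per hour
```

With the original 10s setting the controller forwards a repeated event after only ten seconds, which is shorter than the default and explains the flood of messages.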
Problem: I created an alert to monitor all HelmReleases in a specific namespace, and it generates a huge volume of error messages in a Slack channel. The same errors are repeated at regular intervals, and alerts fire even when nothing has changed in the HelmReleases.
Here is the alert file:
Question: Is it possible to trigger an alert only when we change a HelmRelease, rather than on the status of an existing one? Is it possible to avoid repeating an alert message that we have already received once a certain period of time has passed?