fluxcd / notification-controller

The GitOps Toolkit event forwarder and notification dispatcher
https://fluxcd.io
Apache License 2.0
150 stars 131 forks source link

Metrics has a potential cardinality issue #828

Open Kuzbekov opened 4 months ago

Kuzbekov commented 4 months ago

By nature, our webhooks are usually open for external connections, especially when the code is on cloud-based repositories. If somebody tries to scan the hostname for webhook directed to the webhook receiver for vulnerabilities and attempts to gain access to multiple paths, it will generate a lot of metrics records and lead to cardinality explosion on the Prometheus server because the label "handle" stores the path from each new request due to dynamic handler naming. It can be mitigated by making ingress with static path, but it's not mentioned in documentation. Relevant information is here

souleb commented 4 months ago

We would need a new flag for the controller and changes in https://github.com/fluxcd/notification-controller/blob/580497beeb8bee4cee99163bb63fba679cd2d735/internal/server/event_server.go#L86 and https://github.com/fluxcd/notification-controller/blob/580497beeb8bee4cee99163bb63fba679cd2d735/internal/server/receiver_server.go#L53

Are you willing to contribute the change?

Kuzbekov commented 4 months ago

I would like to contribute. Do you think it would be better to implement a flag with a default value that ensures safety? This change would necessitate notifying users due to its potential impact. Currently, someone could theoretically use precise webhook values in the 'handle' label within alerts or metrics. Or we should keep current behavior?

souleb commented 4 months ago

we have to keep the default behavior. Will should add recommendation in the documentation.