SumoLogic / sumologic-kubernetes-collection

Sumo Logic collection solution for Kubernetes
Apache License 2.0

Scaling events pods? #2380

Open knguyenst opened 2 years ago

knguyenst commented 2 years ago

There isn't an option to set replicaCount or autoscaling for the fluentd-events pods via the Helm chart.

Is this by design?

swiatekm commented 2 years ago

Yes, this is by design. Collecting events is essentially reading records from a database table - if you want to scale that, you need to shard the table somehow, which is possible, but involves significant additional complexity.
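To make the sharding idea concrete, here is a minimal sketch of what splitting the event stream across replicas could look like. It assumes the namespace is the shard key and that each replica knows its own index; `shard_for` and `should_collect` are hypothetical names for illustration, not anything the chart or the collector provides.

```python
import hashlib


def shard_for(namespace: str, replica_count: int) -> int:
    """Map a namespace to one replica index using a stable hash.

    Assumption: the namespace is the shard key. A stable hash (not Python's
    built-in hash(), which is randomized per process) keeps the mapping
    identical across replicas.
    """
    digest = hashlib.sha256(namespace.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % replica_count


def should_collect(event: dict, replica_index: int, replica_count: int) -> bool:
    """Return True if this replica owns the event's namespace."""
    return shard_for(event["namespace"], replica_count) == replica_index


# Hypothetical events; only the "namespace" field is used here.
events = [{"namespace": ns} for ns in ("default", "kube-system", "monitoring")]
for event in events:
    owners = [i for i in range(3) if should_collect(event, i, 3)]
    print(event["namespace"], "->", owners)
```

Even in this toy form, every replica has to agree on the replica count and on its own index, and changing the count reassigns namespaces. That is part of the additional complexity mentioned above.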

Do you need to scale the event collection? Even in very large clusters, a single FluentD Pod should be able to cope with the load, and if not, the equivalent we're introducing using the OpenTelemetry Collector (https://github.com/SumoLogic/sumologic-kubernetes-collection/pull/2379) should do the trick in the near future.

knguyenst commented 2 years ago

The issue is that only one pod is running, which makes it hard to monitor: our cluster runs on spot instances, so the node hosting the events pod can go down at any time. When it does, the events pod takes time to be scheduled on another node, and the timing is unpredictable, which triggers a non-actionable alert.

andrzej-stencel commented 7 months ago

How long does it take for a new events collector pod to be created on another node after the previous node goes down? Perhaps adjusting your alerting is enough?
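One way to read "adjusting your alerting" is to add a grace period, so that a short rescheduling gap does not fire an alert. The sketch below is only an illustration of that idea: the sample format, the function names, and the 300-second default are assumptions, not values taken from this project or from any particular monitoring tool.

```python
from typing import Iterable, Tuple


def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest continuous span, in seconds, with no ready pod.

    samples: (timestamp_seconds, is_ready) pairs in ascending time order.
    """
    longest = 0.0
    outage_start = None
    for timestamp, is_ready in samples:
        if is_ready:
            outage_start = None  # pod is back, close the outage window
            continue
        if outage_start is None:
            outage_start = timestamp  # first not-ready sample of a new outage
        longest = max(longest, timestamp - outage_start)
    return longest


def should_alert(samples: Iterable[Tuple[float, bool]],
                 grace_seconds: float = 300.0) -> bool:
    """Alert only if the pod stayed unavailable for at least grace_seconds.

    Assumption: 300 seconds is a placeholder; pick a value above the
    rescheduling time you actually observe.
    """
    return longest_outage(samples) >= grace_seconds


# A brief rescheduling gap stays below the grace period.
brief_gap = [(0, True), (30, False), (60, False), (90, True)]
print(should_alert(brief_gap))

# A prolonged outage crosses it.
long_gap = [(0, False), (300, False), (600, False)]
print(should_alert(long_gap))
```

Measuring how long rescheduling actually takes in your cluster, as asked above, is what would tell you a sensible grace period.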

We are reluctant to add a high-availability feature to the events collector (currently it is Otelcol, not Fluentd), as it increases complexity.