SigNoz / signoz

SigNoz is an open-source observability platform native to OpenTelemetry with logs, traces and metrics in a single application. An open-source alternative to DataDog, NewRelic, etc. 🔥 🖥. 👉 Open source Application Performance Monitoring (APM) & Observability tool
https://signoz.io

Monitoring Otel collectors when horizontally scaled #881 #1778

Open ankitnayan opened 1 year ago

ankitnayan commented 1 year ago

The OpenTelemetry Collector receiver starts dropping data if the exporter cannot write at the speed of ingestion. This does not create back pressure at otel-collector, and load balancing is affected. When working in k8s, unless multiple client otel-collectors create separate connection to the SigNoz's otel-collector instances, the load is not evenly distributed. The k8s Service does not know when one otel-collector is receiving more data than it can handle, because the otel-collector drops data silently.

This was a discussion topic in the end-user discussion group, but no recording was found: https://opentelemetry.io/community/end-user/discussion-group/

Related upstream issues:

https://github.com/open-telemetry/opentelemetry-collector/issues/6564

https://github.com/open-telemetry/opentelemetry-collector/issues/5456
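
To make the drop behaviour concrete, here is a minimal toy model of it. This is only a sketch under stated assumptions: a bounded in-memory queue sits between the receiver and the exporter, and anything arriving while the queue is full is discarded without the sender being told. The function name `simulate_queue` and its parameters are hypothetical and are not the collector's actual implementation.

```python
from collections import deque


def simulate_queue(ingest_per_tick, export_per_tick, queue_size, ticks):
    """Toy model: bounded queue that drops on overflow.

    Returns (exported, dropped). Nothing is ever reported back to the
    sender, which is the "no back pressure" situation described above.
    """
    queue = deque()
    exported = 0
    dropped = 0
    for _ in range(ticks):
        # Receiver side: accept items until the queue is full, then drop.
        for _ in range(ingest_per_tick):
            if len(queue) < queue_size:
                queue.append(1)
            else:
                dropped += 1
        # Exporter side: drain at most export_per_tick items per tick.
        for _ in range(min(export_per_tick, len(queue))):
            queue.popleft()
            exported += 1
    return exported, dropped


if __name__ == "__main__":
    # Ingest faster than export: the queue fills up and drops begin.
    print(simulate_queue(ingest_per_tick=10, export_per_tick=4,
                         queue_size=20, ticks=10))
```

From the client's point of view every send succeeds in this model, so neither the client nor anything in front of the collector has a signal to shift traffic elsewhere.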

srikanthccv commented 1 year ago

@ankitnayan, what's the scope of this issue?

> This does not create back pressure at otel-collector

This is not clear to me. What do you mean by it doesn't create back pressure? AFAIU dropping data is one of the strategies for back pressure.

> When working in k8s, unless multiple client otel-collectors create separate connection to the SigNoz's otel-collector instances, the load is not evenly distributed.

Should it be the load balancer's responsibility to use techniques such as least connections, or something similar, to spread out the agent collectors' load even before they establish a sticky connection with a single collector?
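
For discussion, a rough sketch of what least-connections placement would and would not give us. It assumes each agent collector opens one sticky connection and sends at a fixed rate; `assign_least_connections` and the example rates are hypothetical, not taken from any real load balancer.

```python
def assign_least_connections(agent_rates, n_collectors):
    """Pin each agent to the collector with the fewest open connections.

    agent_rates: per-agent send rate (arbitrary units), in arrival order.
    Returns (connections, load) as per-collector lists.
    """
    connections = [0] * n_collectors
    load = [0] * n_collectors
    for rate in agent_rates:
        # Ties go to the lowest-numbered collector.
        target = connections.index(min(connections))
        connections[target] += 1
        # Sticky connection: all of this agent's traffic lands here.
        load[target] += rate
    return connections, load


if __name__ == "__main__":
    # Two heavy agents and two light ones, arriving alternately.
    connections, load = assign_least_connections([100, 5, 100, 5], 2)
    print("connections per collector:", connections)
    print("load per collector:", load)
```

In this model the connection counts come out even while the data volume per collector can still be very uneven, because the balancer only sees connections, not how much each one carries.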