SumoLogic / sumologic-kubernetes-collection

Sumo Logic collection solution for Kubernetes

Why is `sending_queue.queue_size` set to 10 by default, which easily leads to the sending queue being full? #3474

Closed txjjjjj closed 9 months ago

txjjjjj commented 10 months ago

EKS 1.28, chart v4.3.1. Sumo is mainly used to store EKS logs.

Why is `sending_queue.queue_size` set to 10 by default, which is easily filled?

exporters:
  otlphttp:
    endpoint: http://${LOGS_METADATA_SVC}.${NAMESPACE}.svc.{{ .Values.sumologic.clusterDNSDomain }}.:4318
    sending_queue:
      queue_size: 10
    # this improves load balancing at the cost of more network traffic
    disable_keep_alives: true

2023-12-15T13:24:28.112Z        warn    batchprocessor@v0.89.0/batch_processor.go:258   Sender failed   {"kind": "processor", "name": "batch", "pipeline": "logs/containers", "error": "sending queue is full"}

I think we should set a reasonable default value to avoid losing logs.

How should I configure it to make sure there is absolutely no loss of logs? I know this can be overridden in the Helm values.

Even though I read the fine-tuning manual, I don't know how to adjust the parameters because there are so many of them.

Do you have any recommendations for out-of-the-box parameters? Thank you. A cluster may contain nodes with widely varying configurations: some nodes produce more logs, some less.

aboguszewski-sumo commented 10 months ago

This exporter is used to send data to another otelcol in the same cluster, so if sending fails, it usually means that the problem lies in the metadata layer and the data could be lost anyway. One such problem might be too much load: please make sure that you have either sumologic.autoscaling.enabled or metadata.logs.autoscaling.enabled set to true.
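
For reference, enabling either of these is a small values override (a sketch based only on the two keys named above; enable whichever fits your setup):

sumologic:
  autoscaling:
    enabled: true

# or, to autoscale only the logs metadata pods:
metadata:
  logs:
    autoscaling:
      enabled: true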

In general, the sending queue is used to avoid losing data in cases such as a temporary failure on the backend side. In the metadata layer, this is set to a higher value: https://github.com/SumoLogic/sumologic-kubernetes-collection/blob/f37afb640af7a22a925812c9c1f3da6df2744350/deploy/helm/sumologic/conf/logs/otelcol/config.yaml#L9-L12

Here you can find more info on how to adjust the parameters for the sending queue. In particular, please take a look at queue_size. The num_seconds variable there answers the question "in case of a backend outage, how many seconds of data do I want to buffer before starting to drop?".
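
As a back-of-the-envelope illustration (my own numbers, not a recommendation from the chart): the queue holds batches, so estimate roughly how many batches per second the collector sends and multiply by the outage window you want to survive. For example, at ~100 batches per second with a 5-minute buffer (num_seconds = 300), you would need queue_size ≈ 100 * 300 = 30000.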

To override the sending queue values for the logs collector, use the otellogs.config.merge option:

otellogs:
  config:
    merge:
      exporters:
        otlphttp:
          sending_queue:
            queue_size: <custom_size>
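
If you also want to raise the queue in the metadata layer, the analogous override should go under metadata.logs.config.merge (a sketch; double-check the exporter name against the config.yaml linked above, which I believe is sumologic):

metadata:
  logs:
    config:
      merge:
        exporters:
          sumologic:
            sending_queue:
              queue_size: <custom_size>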

However, given that we have not had problems with this particular option before, I'd also make sure that everything is fine in your cluster (the network works, the metadata layer pods are not crashlooping, etc.).