Closed: txjjjjj closed this issue 9 months ago
This exporter is used to send data to another otelcol in the same cluster, so if sending fails, it usually means the problem lies in the metadata layer and the data could be lost anyway. One such problem might be too much load: please make sure that you have either sumologic.autoscaling.enabled or metadata.logs.autoscaling.enabled set to true.
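For reference, a minimal values.yaml sketch that enables both of those options could look like the following (the key paths are taken directly from the option names above; please check the values layout of your chart version):

  sumologic:
    autoscaling:
      enabled: true

  metadata:
    logs:
      autoscaling:
        enabled: true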
In general, the sending queue is used to avoid losing data in cases such as a temporary failure on the backend side. In the metadata layer, it is set to a higher value: https://github.com/SumoLogic/sumologic-kubernetes-collection/blob/f37afb640af7a22a925812c9c1f3da6df2744350/deploy/helm/sumologic/conf/logs/otelcol/config.yaml#L9-L12
Here you can find more info on how to adjust the parameters for the sending queue. In particular, please take a look at queue_size. The num_seconds variable there answers the question: "in case of a backend outage, how many seconds of data do I want to buffer before starting to drop?"
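As a rough, hypothetical illustration of the sizing (all numbers below are made up; the fine-tuning docs describe the exact formula): if the exporter handles about 50 batches per second and you want num_seconds = 600, i.e. to survive a 10-minute backend outage, the queue needs to hold roughly 50 * 600 = 30000 entries:

  exporters:
    otlphttp:
      sending_queue:
        enabled: true
        # hypothetical sizing: ~50 batches/s * 600 s of outage buffering
        queue_size: 30000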
To override the sending queue values for the logs collector, use the otellogs.config.merge option:
otellogs:
  config:
    merge:
      exporters:
        otlphttp:
          sending_queue:
            queue_size: <custom_size>
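The same kind of override can be applied to the metadata layer if you also want to change the value linked above. Assuming your chart version exposes metadata.logs.config.merge and that the sumologic exporter is the one configured there (as in the linked config file), it would look roughly like this:

  metadata:
    logs:
      config:
        merge:
          exporters:
            sumologic:
              sending_queue:
                queue_size: <custom_size>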
However, given that we have not had problems with this particular option before, I'd also make sure that everything is fine in your cluster (the network works fine, the metadata layer pods don't crash-loop, etc.).
EKS 1.28, chart v4.3.1. Sumo is mainly used to store EKS logs.
Why is sending_queue.queue_size set to 10 by default, when it is so easily filled? I think we should set a reasonable default value to avoid losing logs.
How should I set it up to make sure no logs are lost at all? I know this can be overridden in the Helm values.
Even though I read the fine-tuning manual, I don't know how to adjust the parameters because there are so many of them.
Do you have any recommendations for out-of-the-box parameters? There may be nodes in a cluster with widely varying configurations; some nodes have more logs, some have fewer. Thank you.