Closed mhratson closed 2 years ago
Hi @mhratson,
The client packs multiple messages into a buffer that is then sent to the Agent. That way we send multiple DSD metrics/events/service_checks to the Agent at once. This error occurs when a buffer is full. The error is then caught, which triggers a flush of the current buffer to the sender. The worker then pulls a new buffer from the internal buffer pool and adds the message that didn't fit in the previous buffer to the new, empty one.
Therefore this error should never surface to the user unless your service check is larger than the maximum buffer size.
The maximum size of a buffer is equal to the WithMaxBytesPerPayload value, which defaults to 1432 bytes for UDP and named pipes and 8192 bytes for UDS (see this documentation). If you increase this value, you need to mirror the change on the Agent side by setting dogstatsd_buffer_size in datadog.yaml to an equal or higher value (see this documentation, and be careful about packet fragmentation).
In your use case I would first double-check why your service check is so large (which I think is the main issue). Also, if you're using UDP, try moving to UDS (which also offers better performance).
And lastly, for your main question: there is no reason to wait. With your current configuration the service check will never fit, and the error doesn't mean the DogStatsD client is full.
I agree that the error message can be misleading for users; I'll update it in the next version. Thanks for bringing this up!
> Therefore this error should never surface to the user unless your service check is larger than a maximum buffer size.
Yeah, in which case the user has to handle it, and keeping the error unexported doesn't help, as there's no way to compare against it.
While it's not a big problem and the caller can still compare error strings, that's a fragile approach which I think is still worth noting/improving.
The same goes for documenting the limits in ServiceCheck:
https://github.com/DataDog/datadog-go/blob/496987906cfdbe16b66fdf01bdacc618958e6ab4/statsd/service_check.go#L22-L37
Thanks!
I opened a PR regarding the error wording: https://github.com/DataDog/datadog-go/pull/252
How should callers handle the "statsd buffer is full" error?
ATM a retry loop does the job, but I wonder if I'm missing anything, since the error is private and presumably not supposed to bubble up all the way to the caller.
Thanks!