go-kit / kit

A standard library for microservices.
https://gokit.io
MIT License

metrics/dogstatsd: always reset all metrics before writing them when calling WriteTo #1231

Open skwair opened 2 years ago

skwair commented 2 years ago

Description

This PR fixes an issue that occurs when the Dogstatsd client cannot reach the Datadog agent to send metrics. WriteTo would fail to send counters and return early, never resetting timings and histograms. As a result, memory consumption kept growing for as long as the client could not reach the Datadog agent.
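To illustrate the failure mode, here is a minimal sketch in Go. It is not the go-kit code: `registry`, `flushEarlyReturn`, `flushResetFirst`, `unreachableAgent`, and `retained` are hypothetical names, and the model only assumes that counters are written first and that a write error returns before timings are reset.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// registry is a hypothetical, simplified stand-in for the client's local state.
type registry struct {
	counters map[string]float64
	timings  map[string][]float64
}

func newRegistry() *registry {
	return &registry{counters: map[string]float64{}, timings: map[string][]float64{}}
}

// flushEarlyReturn models the problematic shape: a failed counter write
// returns before the timings are ever reset.
func (r *registry) flushEarlyReturn(w io.Writer) error {
	counters := r.counters
	r.counters = map[string]float64{}
	for name, v := range counters {
		if _, err := fmt.Fprintf(w, "%s:%f|c\n", name, v); err != nil {
			return err // timings are left in place and keep accumulating
		}
	}
	timings := r.timings
	r.timings = map[string][]float64{}
	return writeTimings(w, timings)
}

// flushResetFirst models the fix: swap out all local state up front, then write.
func (r *registry) flushResetFirst(w io.Writer) error {
	counters, timings := r.counters, r.timings
	r.counters = map[string]float64{}
	r.timings = map[string][]float64{}
	for name, v := range counters {
		if _, err := fmt.Fprintf(w, "%s:%f|c\n", name, v); err != nil {
			return err // best effort: the swapped-out observations are dropped
		}
	}
	return writeTimings(w, timings)
}

func writeTimings(w io.Writer, timings map[string][]float64) error {
	for name, vals := range timings {
		for _, v := range vals {
			if _, err := fmt.Fprintf(w, "%s:%f|ms\n", name, v); err != nil {
				return err
			}
		}
	}
	return nil
}

// unreachableAgent is a writer that always fails, like a dead connection.
type unreachableAgent struct{}

func (unreachableAgent) Write([]byte) (int, error) {
	return 0, errors.New("agent unreachable")
}

// retained records one counter increment and one timing per round, attempts a
// flush against an unreachable agent each time, and returns how many timing
// samples are still held in memory afterwards.
func retained(rounds int, flush func(*registry, io.Writer) error) int {
	r := newRegistry()
	for i := 0; i < rounds; i++ {
		r.counters["requests"]++
		r.timings["latency"] = append(r.timings["latency"], float64(i))
		_ = flush(r, unreachableAgent{}) // the error is expected here
	}
	return len(r.timings["latency"])
}

func main() {
	fmt.Println("early return:", retained(1000, (*registry).flushEarlyReturn))
	fmt.Println("reset first: ", retained(1000, (*registry).flushResetFirst))
}
```

In this model the early-return variant retains every timing sample recorded while the agent is down, while the reset-first variant retains none, at the cost of dropping those observations.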

peterbourgon commented 2 years ago

If local state fails to get sent to Datadog and you reset it anyway, doesn't that lose information and invalidate your metrics?

skwair commented 2 years ago

It does, but I suppose that is consistent with the method's comment: "WriteTo abides best-effort semantics, so observations are lost if there is a problem with the write." The current implementation already loses some information if, for example, you have multiple counters and fail to send the first one, no?

We could choose to buffer these metrics while the connection is down instead (i.e., only reset when all writes are successful), but then we expose ourselves to the ever-growing memory issue if the outage lasts too long.

peterbourgon commented 2 years ago

That's fair, I overlooked that caveat in the docs.

skwair commented 2 years ago

Is it ok for you or should I update the PR?

skwair commented 2 years ago

Hello, sorry for bumping this again, but we need this fix on our side. Is there any chance this gets merged? Or do you see another way to fix this issue?

MaxymVlasov commented 1 year ago

@peterbourgon can you please take a look at this? As a dependency, it has been affecting an app in our production environment for two months, and a downgrade is not an option.

peterbourgon commented 1 year ago

I'm happy to merge with a test that fails on current master and passes on the branch.

skwair commented 1 year ago

Hello, thanks for your reply.

I added a test that fails without the fix as you asked. Let me know if it works for you.
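For readers following along, the general shape of such a regression check can be sketched as below. This is not the test added in the PR: `timingStore`, its `resetFirst` switch, `failingWriter`, and `staleAfterFailure` are hypothetical names on a simplified model, where `resetFirst: false` stands in for the old behavior and `resetFirst: true` for the branch.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// timingStore is a hypothetical stand-in for the client's buffered timings.
type timingStore struct {
	values     []float64
	resetFirst bool // true models the branch, false models the old behavior
}

func (s *timingStore) Observe(v float64) { s.values = append(s.values, v) }

// WriteTo flushes buffered values as "t:<value>|ms" lines.
func (s *timingStore) WriteTo(w io.Writer) error {
	pending := s.values
	if s.resetFirst {
		s.values = nil // drop local state before any write can fail
	}
	for _, v := range pending {
		if _, err := fmt.Fprintf(w, "t:%g|ms\n", v); err != nil {
			return err // without resetFirst, the state survives the failure
		}
	}
	s.values = nil
	return nil
}

// failingWriter always returns an error, like an unreachable agent.
type failingWriter struct{}

func (failingWriter) Write([]byte) (int, error) {
	return 0, errors.New("agent unreachable")
}

// staleAfterFailure reports whether an observation made before a failed
// flush still shows up in the next successful flush.
func staleAfterFailure(s *timingStore) bool {
	s.Observe(42)
	_ = s.WriteTo(failingWriter{}) // the error is expected here
	var buf bytes.Buffer
	_ = s.WriteTo(&buf)
	return strings.Contains(buf.String(), "t:42|ms")
}

func main() {
	fmt.Println("old behavior keeps stale state:", staleAfterFailure(&timingStore{}))
	fmt.Println("reset-first keeps stale state: ", staleAfterFailure(&timingStore{resetFirst: true}))
}
```

A check built this way fails against the old behavior, because the stale observation is still buffered, and passes once state is reset before writing.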

MaxymVlasov commented 1 year ago

If I understand correctly, we need to mention @peterbourgon for him to be notified about new comments.

skwair commented 1 year ago

Hello @peterbourgon, did you have time to check if the tests added match what you were asking for?

ldez commented 1 year ago

@peterbourgon @ChrisHines @basvanbeek friendly ping