Closed by casperisfine 3 years ago
I am not a big fan of adding a runtime dependency, but if it saves us the overhead of a mutex lock for every StatsD call, that may be OK.
> What I'm not too sure about is the flushing. I'm worried some datagrams could stay in memory for quite a while. The finalizer should ensure they are sent at some point, but they could end up delayed a lot if a process stays idle for a long time with some datagrams buffered.
This also would cause issues when metrics are aggregated, because they will be aggregated in the timeframe when the batch is flushed, rather than when they were originally emitted. Not sure how big of an issue this will be.
> So maybe it could make sense to combine this with the dispatcher thread PR, to have a background thread empty the queue every second or something.
Related to my comment above: as long as we flush at least every 10 seconds, the timeframe bucket in which a metric gets aggregated is at most a single bucket off (at least on DataDog), so flushing once per second should be more than frequent enough.
Thanks for the feedback Willem. I think I'll refactor all this to include a dispatcher thread.
> I think I'll refactor all this to include a dispatcher thread.
Oh, one more question for you about that. Copy pasting from https://github.com/Shopify/statsd-instrument/pull/257#issuecomment-810572574
When forking, all threads but the main one are dead in the child process. So if we use a dispatcher thread, it needs to be respawned after fork: either automatically (e.g. by checking `Process.pid` all the time, or by decorating `Process.fork`), or by requiring users to call some method. I went for the latter and introduced an `.after_fork` method for that purpose. But I'm open to implementing "fork tracking" too. It would be easier to use, but libraries monkey patching core methods are not generally well received.

Any opinions on this? IMO checking `Process.pid` is a bad idea as it's rather slow on Linux (it always does a syscall).
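The fork behavior in question can be demonstrated with a tiny standalone script (just for illustration, not library code): a thread spawned before `Process.fork` is dead in the child.

```ruby
# Demonstration: a background thread does not survive Process.fork.
thread = Thread.new { sleep }

pid = Process.fork do
  # In the child process, every thread except the forking one is dead.
  exit!(thread.alive? ? 1 : 0)
end
Process.wait(pid)
puts $?.exitstatus # prints 0: the thread is not alive in the child
```

This is why a dispatcher thread needs some respawn mechanism after fork.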
> Any opinions on this? IMO checking `Process.pid` is a bad idea as it's rather slow on Linux (it always does a syscall).
The `after_fork` approach seems reasonable to me.
Not a fully formed idea, but can we lazily start a dispatcher thread? Start the thread on the first packet that we emit. That could transparently deal with the fork issue.
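A minimal sketch of that idea (hypothetical class and method names, not the actual implementation): check the thread on every emit and respawn it when needed, which transparently covers the thread-is-dead-after-fork case.

```ruby
# Sketch: lazily (re)spawn the dispatcher thread on each emitted datagram.
class Dispatcher
  def initialize
    @buffer = Queue.new # thread-safe FIFO from the stdlib
    @thread = nil
  end

  def <<(datagram)
    @buffer << datagram
    ensure_thread_alive
    self
  end

  private

  def ensure_thread_alive
    # After a fork, the copied thread is dead, so alive? is false
    # and a fresh thread is spawned in the child process.
    return if @thread&.alive?

    @thread = Thread.new do
      while (datagram = @buffer.pop)
        # forward the datagram to the socket here
      end
    end
  end
end
```

The check runs on every emit, so it only works if `Thread#alive?` is cheap enough, which is measured further down the thread.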
I think this should be good to go.
I deployed it on a minor service to make sure it works alright.
Also, I chose to make this batching entirely opt-in, because given the need to call `after_fork`, I'm afraid many users would upgrade without noticing it. That could easily cause a situation where the dispatcher thread is dead post-fork, so datagrams would accumulate forever in the buffer: metrics wouldn't be sent, and the app's memory would grow forever.
Some quick benchmarks:

```
$ benchmark/send-metrics-to-local-udp-receiver
Warming up --------------------------------------
StatsD metrics to local UDP receiver (branch: udp-batching, sha: 2575938)
                         1.351k i/100ms
Calculating -------------------------------------
StatsD metrics to local UDP receiver (branch: udp-batching, sha: 2575938)
                        13.476k (± 0.9%) i/s -     67.550k in 5.012945s

$ STATSD_FLUSH_INTERVAL=0.1 benchmark/send-metrics-to-local-udp-receiver
Warming up --------------------------------------
StatsD metrics to local UDP receiver (branch: udp-batching, sha: 2575938)
                         6.303k i/100ms
Calculating -------------------------------------
StatsD metrics to local UDP receiver (branch: udp-batching, sha: 2575938)
                        60.924k (±27.3%) i/s -    277.332k in 5.052112s
```
Sorry, I missed your comment from yesterday.
> Not a fully formed idea, but can we lazily start a dispatcher thread? Start the thread on the first packet that we emit. That could transparently deal with the fork issue.
Yes we can. From quick benchmarks, `Thread#alive?` isn't particularly expensive, barely more than a regular attribute access. Let me check the performance impact.
Ok, so `alive?` is almost invisible in terms of overhead, and it allows us to forgo the whole `after_fork` setup, which I'm sure would cause tons of problems. So I'll go with that.
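For reference, the comparison can be reproduced with a rough micro-benchmark along these lines (a sketch, not the benchmark actually used; absolute numbers will vary by machine and Ruby version):

```ruby
require "benchmark"

# Compare the two fork-detection candidates discussed above:
# checking a thread's liveness vs. reading the process id.
thread = Thread.new { sleep }
n = 1_000_000

Benchmark.bm(15) do |x|
  x.report("Thread#alive?") { n.times { thread.alive? } }
  x.report("Process.pid")   { n.times { Process.pid } }
end
```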
> - Should we add a test for a datagram that goes over the max?
Sure.
> - Should we add a test that makes sure we start a dispatcher after forking?
There's already a test that forks and then sends a signal to the child for it to send a datagram from the signal handler.
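That test pattern looks roughly like this (a simplified standalone sketch, not the actual test; it uses a pipe instead of a UDP socket):

```ruby
# The child installs a USR1 handler that emits a "datagram" (here: a pipe
# write); the parent signals it and checks that the payload arrived.
reader, writer = IO.pipe

pid = Process.fork do
  reader.close
  Signal.trap("USR1") do
    writer.write("counter:1|c") # emitting from the signal handler
    writer.close
    exit!(0)
  end
  sleep # wait for the signal
end

writer.close
sleep 0.2 # give the child time to install the trap
Process.kill("USR1", pid)
Process.wait(pid)
received = reader.read
puts received # prints counter:1|c
```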
> - Should we add a test for a datagram that goes over the max?
Done.
> For now, let's keep the old unbuffered sink the default, and roll this out to an app to see how well it works.
Yep. Definitely the plan.
Ship!
I'll merge tomorrow morning unless @dylanahsmith has any comments.
Ok, merging now. There are a few good ideas left, but I'd rather get this out the door and get some data in production. We can always improve it more later.
This is kind of an alternative to https://github.com/Shopify/statsd-instrument/pull/257 (actually we could combine both, but...)
With a batch size of one, this is a bit slower than master; however, with an arbitrary batch size of 50, it's about twice as fast:
Thread-safety wise, I believe `Concurrent::Array` should have us covered (it's a regular Array on MRI, so little perf overhead).

What I'm not too sure about is the flushing. I'm worried some datagrams could stay in memory for quite a while. The finalizer should ensure they are sent at some point, but they could end up delayed a lot if a process stays idle for a long time with some datagrams buffered.

So maybe it could make sense to combine this with the dispatcher thread PR, to have a background thread empty the queue every second or something.
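The batching approach can be sketched roughly as follows (hypothetical class and constant names; the sketch uses a stdlib `Queue` where the PR uses `Concurrent::Array`). Batched datagrams are joined with newlines, as the StatsD datagram protocol allows:

```ruby
# Sketch: buffer datagrams and send them to the socket in batches.
class BatchedSink
  BATCH_SIZE = 50 # arbitrary, matching the benchmark above

  def initialize(socket)
    @socket = socket
    @buffer = Queue.new # thread-safe, like Concurrent::Array on MRI
  end

  def <<(datagram)
    @buffer << datagram
    flush if @buffer.size >= BATCH_SIZE
    self
  end

  def flush
    batch = []
    batch << @buffer.pop until @buffer.empty?
    # One packet can carry multiple newline-separated datagrams.
    @socket.send(batch.join("\n"), 0) unless batch.empty?
  end
end
```

Without a background flusher, anything below `BATCH_SIZE` sits in the buffer until the next `flush`, which is exactly the idle-process concern raised above.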
@wvanbergen @dylanahsmith any thoughts?