Shopify / statsd-instrument

A StatsD client for Ruby apps. Provides metaprogramming methods to inject StatsD instrumentation into your code.
http://shopify.github.io/statsd-instrument
MIT License

Implement batching of datagrams in UDPSink #280

Closed: casperisfine closed this pull request 3 years ago

casperisfine commented 3 years ago

This is kind of an alternative to https://github.com/Shopify/statsd-instrument/pull/257 (actually we could combine both, but...)

With a batch size of one, this is a bit slower than master; with an arbitrary batch size of 50, it's about twice as fast:

```
StatsD metrics to local UDP receiver (branch: master, sha: a9fae17)
                          5.142k (± 4.2%) i/s -     26.000k in   5.067016s

Comparison:
StatsD metrics to local UDP receiver (branch: udp-batching, sha: 0f436b2):    10458.0 i/s
StatsD metrics to local UDP receiver (branch: master, sha: a9fae17):     5141.5 i/s - 2.03x  (± 0.00) slower
```

Thread-safety-wise, I believe Concurrent::Array has us covered (it's a regular Array on MRI, so there's little performance overhead).
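
For context, here is a minimal sketch of the buffering idea under discussion. The class name, method names, and batch size are illustrative assumptions, not the PR's actual implementation:

```ruby
require "socket"
require "concurrent"

# Illustrative sketch: datagrams are appended to a shared buffer and only
# written to the UDP socket once the batch size is reached.
class BufferedUDPSinkSketch
  MAX_BATCH_SIZE = 50 # arbitrary, mirrors the batch size used in the benchmark

  def initialize(host, port)
    @host = host
    @port = port
    @buffer = Concurrent::Array.new # a plain Array on MRI, synchronized elsewhere
  end

  def <<(datagram)
    @buffer << datagram
    flush if @buffer.size >= MAX_BATCH_SIZE
    self
  end

  def flush
    return if @buffer.empty?
    batch = @buffer.shift(@buffer.size)
    # StatsD servers accept several newline-separated datagrams per packet.
    socket.send(batch.join("\n"), 0)
  end

  private

  def socket
    @socket ||= UDPSocket.new.tap { |s| s.connect(@host, @port) }
  end
end
```

A real implementation also has to respect the maximum UDP packet size when joining datagrams, which is what the later review comment about "a datagram that goes over the max" refers to.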

What I'm not too sure about is the flushing. I'm worried some datagrams could stay in memory for quite a long time. The finalizer should ensure they are sent at some point, but they could be delayed a lot if a process stays idle for a while with datagrams still in the buffer.

So maybe it would make sense to combine this with the dispatcher thread PR, and have a background thread empty the queue every second or so.

@wvanbergen @dylanahsmith any thoughts?

wvanbergen commented 3 years ago

I am not a big fan of adding a runtime dependency, but if it spares us the overhead of a mutex lock on every StatsD call, that may be OK.

wvanbergen commented 3 years ago

What I'm not too sure about is the flushing. I'm worried some datagrams could stay in memory for quite a long time. The finalizer should ensure they are sent at some point, but they could be delayed a lot if a process stays idle for a while with datagrams still in the buffer.

This would also cause issues when metrics are aggregated, because they would be aggregated into the timeframe in which the batch is flushed rather than the one in which they were originally emitted. Not sure how big of an issue this will be.

So maybe it would make sense to combine this with the dispatcher thread PR, and have a background thread empty the queue every second or so.

Related to my comment above: as long as we flush at least every 10 seconds, the timeframe bucket in which a metric gets aggregated is at most a single bucket off (at least on DataDog), so flushing once per second should be more than frequent enough.
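
Sketching what such a dispatcher thread could look like on top of a buffered sink (the flush interval and names are assumptions for illustration, not the PR's code):

```ruby
# Illustrative sketch: a background thread drains the buffered sink on a fixed
# interval, so datagrams never linger in memory for more than about a second.
class DispatcherSketch
  def initialize(sink, flush_interval: 1.0)
    @sink = sink
    @flush_interval = flush_interval
    @thread = Thread.new { run }
  end

  private

  def run
    loop do
      sleep(@flush_interval)
      @sink.flush
    end
  rescue => error
    warn("StatsD dispatcher thread died: #{error.message}")
  end
end
```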

casperisfine commented 3 years ago

Thanks for the feedback Willem. I think I'll refactor all this to include a dispatcher thread.

casperisfine commented 3 years ago

I think I'll refactor all this to include a dispatcher thread.

Oh, one more question for you about that. Copy pasting from https://github.com/Shopify/statsd-instrument/pull/257#issuecomment-810572574

When forking, all threads but the main one are dead in the child process. So if we use a dispatcher thread, it needs to be respawned after fork, either automatically (e.g. by checking Process.pid all the time, or by decorating Process.fork) or by requiring users to call some method.

I went for the latter and introduced an .after_fork method for that purpose. But I'm open to implementing "fork tracking" too. It would be easier to use, but libraries that monkey-patch core methods are generally not well received.

Any opinions on this? IMO checking Process.pid is a bad idea, as it's rather slow on Linux (it always does a syscall).
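
The explicit .after_fork approach could look roughly like this; only the method name comes from the comment above, the rest is an illustrative continuation of the earlier sketch:

```ruby
# Illustrative sketch of the explicit approach: after a fork, the thread object
# inherited from the parent is dead in the child, so the forking code is
# responsible for asking the dispatcher to start a fresh one.
class DispatcherSketch
  def after_fork
    @thread = Thread.new { run } unless @thread&.alive?
  end
end

# Hypothetical usage from an application's worker-boot hook (e.g. Puma/Unicorn):
#   on_worker_boot { statsd_dispatcher.after_fork }
```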

wvanbergen commented 3 years ago

Any opinions on this? IMO checking Process.pid is a bad idea, as it's rather slow on Linux (it always does a syscall).

The after_fork approach seems reasonable to me.

wvanbergen commented 3 years ago

Not a fully formed idea, but can we lazily start a dispatcher thread? Start the thread on the first packet that we emit. That could transparently deal with the fork issue.

casperisfine commented 3 years ago

I think this should be good to go.

I deployed it on a minor service to make sure it works alright.

Also, I chose to make this batching entirely opt-in: because after_fork has to be called, I'm afraid many users would upgrade without noticing that requirement. That could easily lead to a situation where the dispatcher thread is dead after a fork, so datagrams would accumulate in the buffer forever: metrics wouldn't be sent, and the app's memory would grow without bound.

Some quick benchmark:

```
$ benchmark/send-metrics-to-local-udp-receiver
Warming up --------------------------------------
StatsD metrics to local UDP receiver (branch: udp-batching, sha: 2575938)
                         1.351k i/100ms
Calculating -------------------------------------
StatsD metrics to local UDP receiver (branch: udp-batching, sha: 2575938)
                         13.476k (± 0.9%) i/s -     67.550k in   5.012945s

$ STATSD_FLUSH_INTERVAL=0.1 benchmark/send-metrics-to-local-udp-receiver
Warming up --------------------------------------
StatsD metrics to local UDP receiver (branch: udp-batching, sha: 2575938)
                         6.303k i/100ms
Calculating -------------------------------------
StatsD metrics to local UDP receiver (branch: udp-batching, sha: 2575938)
                         60.924k (±27.3%) i/s -    277.332k in   5.052112s
```

casperisfine commented 3 years ago

Sorry, I missed your comment from yesterday.

Not a fully formed idea, but can we lazily start a dispatcher thread? Start the thread on the first packet that we emit. That could transparently deal with the fork issue.

Yes we can. From quick benchmarks, Thread#alive? isn't particularly expensive, barely more than a regular attribute access. Let me check the performance impact.
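
A quick benchmark-ips sketch of the kind of check involved (not the PR's benchmark script), comparing Thread#alive? against a regular attribute read:

```ruby
require "benchmark/ips"

# Compares Thread#alive? on a live thread against a regular attribute read,
# to check whether a lazy-start guard is cheap enough for the hot path.
thread = Thread.new { sleep }

reader = Class.new do
  attr_reader :flag
  def initialize
    @flag = true
  end
end.new

Benchmark.ips do |x|
  x.report("Thread#alive?") { thread.alive? }
  x.report("attr_reader")   { reader.flag }
  x.compare!
end
```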

casperisfine commented 3 years ago

Let me check the performance impact.

Ok, so alive? is almost invisible in terms of overhead, and it allows us to forgo the whole after_fork setup, which I'm sure would cause tons of problems. So I'll go with that.
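
Roughly, the lazy-start guard looks like this (an illustrative continuation of the earlier sketch, not the PR's code):

```ruby
# Illustrative sketch: instead of an explicit after_fork call, the sink checks
# whether its dispatcher thread is alive on every write and respawns it when
# needed. After a fork the inherited thread is dead, so the first datagram
# emitted in the child transparently restarts the dispatcher.
class BufferedUDPSinkSketch
  def <<(datagram)
    ensure_dispatcher_running
    @buffer << datagram
    self
  end

  private

  def ensure_dispatcher_running
    return if @dispatcher_thread&.alive?
    # Benign race: two threads could briefly both spawn a dispatcher here;
    # a mutex around the respawn would avoid that at a small cost.
    @dispatcher_thread = Thread.new { dispatch_loop }
  end

  def dispatch_loop
    loop do
      sleep(@flush_interval || 1.0)
      flush
    end
  end
end
```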

casperisfine commented 3 years ago
  • Should we add a test for a datagram that goes over the max?

Sure.

  • Should we add a test that makes sure we start a dispatcher after forking?

There's already a test that forks and then sends a signal to the child so that it sends a datagram from the signal handler.
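
For reference, a standalone sketch of that fork scenario, reusing the illustrative sink class from above rather than the repo's actual test helpers:

```ruby
require "socket"

# Sketch of the fork scenario: the child only emits a datagram from a signal
# handler, so nothing reaches the receiver unless the sink notices its
# dispatcher thread is dead and respawns it lazily.
receiver = UDPSocket.new
receiver.bind("127.0.0.1", 0)
port = receiver.addr[1]

sink = BufferedUDPSinkSketch.new("127.0.0.1", port)
sink << "parent:1|c" # starts the dispatcher thread in the parent

child = fork do
  Signal.trap("USR1") { sink << "child:1|c" }
  sleep # keep the child alive until the parent signals it
end

sleep 0.1 # give the child time to install its signal handler
Process.kill("USR1", child)

datagram = nil
loop do
  datagram, = receiver.recvfrom(4096)
  break if datagram.include?("child:1|c")
end
puts datagram

Process.kill("TERM", child)
Process.wait(child)
```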

casperisfine commented 3 years ago
  • Should we add a test for a datagram that goes over the max?

Done.

For now, let's keep the old unbuffered sink the default, and roll this out to an app to see how well it works.

Yep. Definitely the plan.

wvanbergen commented 3 years ago

Ship!

casperisfine commented 3 years ago

I'll merge tomorrow morning unless @dylanahsmith has any comments.

casperisfine commented 3 years ago

Ok, merging now. There are a few good ideas left, but I'd rather get this out the door and get some data in production. We can always improve it further later.