aerospike / aerospike-client-csharp

Aerospike C# Client Library
70 stars 48 forks source link

Option to call Tend and Asynctimeout in threadpool #56

Closed devIonutApostol closed 4 years ago

devIonutApostol commented 4 years ago

The sdk creates 2 manual threads (tend and asynctimeout). I would prefer to move that logic to the threadpool, for example in some background tasks because in some cases even if we have a timeout of 300ms, I see that the asynctimeout thread gets CPU time every 3-4 seconds. Using cancellation tokens works ok, but still we want those actions on the threadpool.

Context: asp.net core, .net 5 rc1, high traffic (~12k IO per sec)

Should I make a PR ?

BrianNichols commented 4 years ago

I need to understand what problem that the threadpool would solve. How did you determine that the asynctimeout thread gets cpu time once every 3-4 seconds when it should be sleeping for only 300ms? Why does using the threadpool cause a thread to get more cpu time?

devIonutApostol commented 4 years ago

I will describe what happens. Usually happens 5-10 times a day , only for one random pod.

  1. We get a lot of timeouts error from aerospike client and the circuit breaker kicks in (1 minute open) causing the aerospike requests to drop to 0 but seems that there are still outgoing requests.
  2. After ~10 seconds other circuit breakers start to kick in. The healthcheck fails, we see some kubernetes cpu throttling .And some aerospike metrics like this aerospike issue connections_surge
  3. After ~10-15 seconds, the system recovers ( aerospike circuit breaker is still open, but other CBs are not).

Healthchecks on aerospike server side doesn't found anything wrong, we have 3 nodes.

Async Client Configuration AsyncClientPolicy aerospikePolicy = new AsyncClientPolicy { asyncMaxCommandAction = MaxCommandAction.REJECT, asyncMaxCommands = 1000, maxSocketIdle = 0, writePolicyDefault = { commitLevel = CommitLevel.COMMIT_MASTER, totalTimeout = 300, maxRetries = 1 }, readPolicyDefault = { totalTimeout = 300, maxRetries = 1 } };

As regarding of moving those threads on the threadpool, is more actually a good practice. We just want some kind of option to call those tend and asynctimeout actions.

devIonutApostol commented 4 years ago

We modified the AsyncTimeoutQueue Run method to log the frequency of running under heavy load.

modified

The result is something like this

AsyncTimeout run: elapsed milliseconds since last run: 317 sleep interval: 301 AsyncTimeout run: elapsed milliseconds since last run: 328 sleep interval: 301 AsyncTimeout run: elapsed milliseconds since last run: 2031 sleep interval: -1 AsyncTimeout run: elapsed milliseconds since last run: 321 sleep interval: 301 AsyncTimeout run: elapsed milliseconds since last run: 318 sleep interval: 301 AsyncTimeout run: elapsed milliseconds since last run: 1880 sleep interval: -1 AsyncTimeout run: elapsed milliseconds since last run: 300 sleep interval: 301 AsyncTimeout run: elapsed milliseconds since last run: 301 sleep interval: 301 AsyncTimeout run: elapsed milliseconds since last run: 322 sleep interval: 301 AsyncTimeout run: elapsed milliseconds since last run: 2887 sleep interval: -1 AsyncTimeout run: elapsed milliseconds since last run: 321 sleep interval: 301 AsyncTimeout run: elapsed milliseconds since last run: 330 sleep interval: 301 AsyncTimeout run: elapsed milliseconds since last run: 2694 sleep interval: -1 AsyncTimeout run: elapsed milliseconds since last run: 320 sleep interval: 301 AsyncTimeout run: elapsed milliseconds since last run: 327 sleep interval: 301 AsyncTimeout run: elapsed milliseconds since last run: 2116 sleep interval: -1 AsyncTimeout run: elapsed milliseconds since last run: 324 sleep interval: 301 AsyncTimeout run: elapsed milliseconds since last run: 308 sleep interval: 301 AsyncTimeout run: elapsed milliseconds since last run: 1872 sleep interval: -1 AsyncTimeout run: elapsed milliseconds since last run: 301 sleep interval: 301 AsyncTimeout run: elapsed milliseconds since last run: 331 sleep interval: 301 AsyncTimeout run: elapsed milliseconds since last run: 2049 sleep interval: -1 AsyncTimeout run: elapsed milliseconds since last run: 302 sleep interval: 301 AsyncTimeout run: elapsed milliseconds since last run: 323 sleep interval: 301

After hacking something and moving this action on the threadpool, disabled that thread, same load

calling it like this _ = RunAerospikeTimeout();

method2 method3

yncTimeout run: elapsed milliseconds since last run 337 AsyncTimeout run: elapsed milliseconds since last run 325 AsyncTimeout run: elapsed milliseconds since last run 344 AsyncTimeout run: elapsed milliseconds since last run 343 AsyncTimeout run: elapsed milliseconds since last run 317 AsyncTimeout run: elapsed milliseconds since last run 317 AsyncTimeout run: elapsed milliseconds since last run 334 AsyncTimeout run: elapsed milliseconds since last run 304 AsyncTimeout run: elapsed milliseconds since last run 324 AsyncTimeout run: elapsed milliseconds since last run 332 AsyncTimeout run: elapsed milliseconds since last run 323 AsyncTimeout run: elapsed milliseconds since last run 323 AsyncTimeout run: elapsed milliseconds since last run 345 AsyncTimeout run: elapsed milliseconds since last run 357 AsyncTimeout run: elapsed milliseconds since last run 331 AsyncTimeout run: elapsed milliseconds since last run 305 AsyncTimeout run: elapsed milliseconds since last run 314 AsyncTimeout run: elapsed milliseconds since last run 311 AsyncTimeout run: elapsed milliseconds since last run 324 AsyncTimeout run: elapsed milliseconds since last run 347 AsyncTimeout run: elapsed milliseconds since last run 359 AsyncTimeout run: elapsed milliseconds since last run 346 AsyncTimeout run: elapsed milliseconds since last run 347 AsyncTimeout run: elapsed milliseconds since last run 340 AsyncTimeout run: elapsed milliseconds since last run 334 AsyncTimeout run: elapsed milliseconds since last run 312

BrianNichols commented 4 years ago

AsyncTimeoutQueue is performing as designed. If there are no pending timer requests, the cancelToken wait is set to -1 (infinite). When a new timer request is added, the cancelToken is woken up and the next cancelToken wait is set to "timeout + 1". The long sleeps occur because there are no timer requests during that time.

The threadpool alternative does not improve on the original design for AsyncTimeoutQueue or tend. I prefer the control of deadicated threads over the shared thread support in threadpool.

devIonutApostol commented 4 years ago

Thanks for clarifications, after further investigations it seems that I was wrong about this.