API proposal : pooled / leased connections as a secondary API

mgravell commented 6 years ago

Motivations:

provide safe access to blocking operations (BRPOPLPUSH, etc)
provide "pure" access to WATCH/MULTI/EXEC
we already expose the user to the risk of these operations via Execute; this would hopefully stop people doing dangerous things on the main connections

Counter-motivations:

Lua scripts are better solutions for most (but not all) scenarios that previously would have been WATCH/MULTI/EXEC

Note:

If implemented (needs consideration), the multiplexer would take care of the socket IO, but they would be stored separately and would not participate in the multiplexed command stream. This would be in addition to the regular API and would just work inside the existing context.

Questions:

what does this mean for cluster? is it assumed that all operations on a lease will participate on a single shard and will therefore be single-server?
how is primary/replica selected
what does the lease API look like - presumably using?
what happens when the pool is exhausted? is it an "await (async) an item from the pool, or timeout after x"?
where would it attach; IServer ? and if so, do we need a new GetServer mechanism that makes this painless?
how complex does the pool management need to be? just a max? is it enabled at all if not configured?
do timeouts apply on blocking operations? if so, how?

Timescales:

It won't be 2.0

Likelihood:

Thinking about it; it is tempting.

mgravell commented 6 years ago

Usage idea:

// args for sharding and primary/replica selection (key is just
// representative - and optional?)
IServer server = conn.GetServer(sampleKey, flags); 

// other args here not shown: db number, async state, etc
using(ILeasedDatabase db = await server.LeaseAsync(timeout)) 
{
  // stuff
  await db.SomeBlockingOperationAsync(..);
  // more stuff
}

Or maybe just one-step that?

using(ILeasedDatabase db = await conn.LeaseAsync(sampleKey, flags, timeout))
{
   // ...
}

with:

interface ILeasedDatabase : IDatabase, IDisposable
{
    // TODO: add blocking APIs

    // TODO: add "pure" WATCH/EXEC APIs
}

NickCraver commented 6 years ago

For clarity: is the concern mainly client (pipeline) blocking operations due to large load, or things that are meant to be server-blocking (atomic), the latter of which are going to block due to the nature of redis anyway? I'm assuming all the wins are in the former, in which case: ...maybe? But I see less utility for it.

That being said, most of our direct concerns in this area (around connections) at Stack fall info:

Lots of data in an op (e.g. a large string, like with leagues)
Lots of pipe contention for massive op counts (e.g. API throttles)

I'm not sure how it'd help the second case, but a similar approach as our "bulky" could be natively handled by the former. But, counter to that use case, is it worth automatic control and generation of bulk connections over our explicit usage today? I'm leaning towards "it's not", due to increased complexity and uncertain expectations vs. today. A single "bulky" on-demand (in addition to infrastructure and subscription) connections could satisfy that, but it's such a narrow use case...again, worth it?

I think the biggest issue I have is this changing from a very predictable 2 connections to n and breaking all expectations. That worries me for our use case...but I totally admit we're a crazy outlier and that's likely not an issue in general. To that end, I'd say: we'd want to be able to disable such behavior...but then we don't be dogfooding it either.

...I'm not sure I helped here, but that's current thoughts after a first pass.

mgravell commented 6 years ago

@NickCraver the motivation isn't data volume / load related; it is semantic related. There are a good few operations that are designed to be connection-blocking, so we don't expose them today. Similarly, there are some WATCH scenarios that we can't implement, because they require ad-hoc commands inside a WATCH region (but before a MULTI), which we disallow.

aensidhe commented 6 years ago

IDatabase is compatible with IDatabaseAsync. I can shoot myself into leg passing ILeasedDatabase into usual async code.
Why do we need separate API for creating additional connection? Why creating a separate multiplexer and caching it is worse? Chances are that I'll need it several times during lifetime of my web code. If code is in some tool, I can create as many multiplexers as I want.

My opinion that this API is unnecessary.

PS: yes, I can still shoot myself into leg by passing IDatabaseAsync from "special multiplexer" into usual code, but that is ok for me.

NickCraver commented 6 years ago

@mgravell I'm not sure I understand the intent of where you're picturing the leasing/blocking. It's a close in concept to a transaction (IMO), or (the way I read it) we'd be blocking every command on the pipe behind that using, effectively making it lock() semantics w.r.t. the pipeline. Can you elaborate on the examples a bit more on where you expect the actual leasing/blocking to start and end? If it's not on a separate connection (pooled, dedicated, whatever), which most of my concerns are around, we're still back to blocking the existing primary connection (the primary reason, AFAIK, we disallow these today). Am I way off here?

mgravell commented 6 years ago

@aensidhe

doesn't shoot you in the leg - it just means it runs on a separate connection
because establishing connections and doing all the handshaking is more expensive than you might think; and also: we don't currently expose those APIs for you to use

mgravell commented 6 years ago

@NickCraver the leased connection here is a dedicated socket - it is a leased redis connection, essentially; "interactive" only (no subscription functionality); we would hand it back to some notional pool when disposed, etc (so from a redis perspective: it stays alive)

NickCraver commented 6 years ago

Okay so if I do this:

db.StringSet("mykey", "value");
using(ILeasedDatabase db = await server.LeaseAsync(timeout)) 
{
  var val = await db.StringGet("mykey");
}

...then val can be null, because we're got 2 connections and physical pipelines - so if the first is backed up a bit, it won't have issued the write to redis before the GET on the second, right? This is less likely to happen if using the shared SocketManager writer pools, but still. I think there's a lot of non-intuitive behaviors that could arise here: races, exhaustion of the pool (in loops/concurrency) or unexpected timeouts (assuming that's a concept that applies here), and other related bits when using these with any decent frequency.

Thoughts on those kinds of things?

aensidhe commented 6 years ago

@mgravell

Yeah and that can be unexpected. Can't think about anything bad right now, so ok, let it be ok.
I know about cost of establishing of connection, but why is pooling connection is better than thinking ahead and putting an additional multiplexer ahead of time into DI container or in separate global variable, etc? This is very edge use case. Will you recycle pooled connection? How and when will you do it?

It's not a "user way of think", I'm writing a similar package for another database. And I think that if user want a separate connection from time to time he can cache that separate connection somewhere near main connection.

And other concern: IDatabase and IDatabaseAsync are lightweight wrappers to get and throw away here and there, if I'm remember it correctly. Making a compatible interface that can create a connection per allocation of that interface can be a problem.

mgravell commented 6 years ago

Okay so if I do this:

No, there is no conflict there; StringSet blocks until the response comes back from the server; likewise await StringSetAsync. There is a thread-race if you do:

var x = dbA.StringSetAsync();
var y = dbB.StringGetAsync();
await x;
var s = await y;

but.... at that point you've gone out of your way to introduce a problem (@NickCraver )

NickCraver commented 6 years ago

Ah sorry I should amend the example (was on mobile), I don't think the FireAndForget is that uncommon/going out of the way - I think that's fairly standard fare on a set operations.

I'm just not (yet) convinced we've gotten enough feedback wanting this to justify spending much time on it, but I don't feel strongly against either. FWIW, some of the feedback like WAIT was around wanting things on that pipeline to finish (which is a separate issue from this anyway). I can recall a few issues related here, but IIRC they were all isolated.

My main concern is still around any sort of automatic socket pooling which makes things less predictable. That would feel (at this particular moment) like a step backwards. Right now failures there are failures on that multiplexer and it's not too hard to reason about what went wrong. When 1 (or many) in a pool of several start having issues, things get messy to figure out...that's the part that scares me, given we just got sockets decently stable and are still working on hangs (which to be fair may be unrelated).

To be fair, it's entirely possible after we're stable on v2 I'd feel a lot more confident in diving into this with a different perspective on the complexity/debug risk vs. payoff.

mgravell commented 6 years ago

Fair enough; totally agree that it isn't a priority today - happy to shelve indefinitely, I just want to open the dialogue.

Plasma commented 6 years ago

Just wanted to chime in as to why having a separate connection, even for non-blocking operation use, can make sense some times.

At the moment the recommendation is to use a single, shared ConnectionMultiplexer across all threads.

This usually works well, but I've observed these side effects in production:

1) Because all commands from all threads go to a single C# queue and TCP socket, you get "head of line blocking" behind a buffer that may be filled with more commands from Thread A (perhaps megabytes in terms of bandwidth to process) in front of the one you're wanting to transmit that is more time sensitive, and smaller in size, from Thread B.

Imagine for example, Thread A has written several megabytes worth of SET commands due to a background task, and Thread B has a simple GET command pending at the end of the shared muxer queue. Thread B must wait for Thread A's workload to be emptied first.

The delays I am referring to here are milliseconds, and in one case 100s of milliseconds, due to CPU heavy redis commands used (eg sorted set intersects), but I'm mostly pointing it out in terms of "jitter" observed for random requests.

2) I acknowledge that Redis is single threaded, and processes requests sequentially, but it is worth noting that Redis attempts to treat each client/TCP connection fairly [1].

3) Imagine we utilized two connections ("clients") to Redis instead, and Thread A is doing its bandwidth-heavy, batch based processing on one connection, and Thread B uses another connection that is intended to be for more real-time request/response patterns.

If both are sent at the same moment, and my assumption is the local app server's TCP framework will also do interleaving, then Redis will switch/interleave processing requests from the TCP sockets between both connections, so while partially receiving/processing Thread A's workload (the multi-megabyte SET commands), Redis will process Thread B's simple workload quickly, then switch back to continuing to process Thread A workload.

The end result here is that Thread B gets a faster response time as its connection's workload was interleaved, on Redis's side, because it was a separate TCP connection, and it wasn't behind the workload of Thread A by sharing the same TCP socket or local C# command queue.

[1] https://redis.io/topics/clients

mgravell commented 6 years ago

Indeed. This is a scenario where we currently just spin up a second parallel multiplexer - one for large slow ops, one for fast traffic. Just mentioning that because it is a simple pragmatic low effort solution to the same problems.

On Thu, 26 Jul 2018, 01:23 Andrew Armstrong, notifications@github.com wrote:

Just wanted to chime in as to why having a separate connection, even for non-blocking operation use, can make sense some times.

At the moment the recommendation is to use a single, shared ConnectionMultiplexer across all threads.

This usually works well, but I've observed these side effects in production:

Because all commands from all threads go to a single C# queue and TCP socket, you get "head of line blocking" behind a buffer that may be filled with more commands from Thread A (perhaps megabytes in terms of bandwidth to process) in front of the one you're wanting to transmit that is more time sensitive, and smaller in size, from Thread B.

Imagine for example, Thread A has written several megabytes worth of SET commands due to a background task, and Thread B has a simple GET command pending at the end of the shared muxer queue. Thread B must wait for Thread A's workload to be emptied first.

The delays I am referring to here are milliseconds, and in one case 100s of milliseconds, due to CPU heavy redis commands used (eg sorted set intersects), but I'm mostly pointing it out in terms of "jitter" observed for random requests.

1.

I acknowledge that Redis is single threaded, and processes requests sequentially, but it is worth noting that Redis attempts to treat each client/TCP connection fairly [1]. 2.

Imagine we utilized two connections ("clients") to Redis instead, and Thread A is doing its bandwidth-heavy, batch based processing on one connection, and Thread B uses another connection that is intended to be for more real-time request/response patterns.

If both are sent at the same moment, and my assumption is the local app server's TCP framework will also do interleaving, then Redis will switch/interleave processing requests from the TCP sockets between both connections, so while partially receiving/processing Thread A's workload (the multi-megabyte SET commands), Redis will process Thread B's simple workload quickly, then switch back to continuing to process Thread A workload.

The end result here is that Thread B gets a faster response time as its connection's workload was interleaved, on Redis's side, because it was a separate TCP connection, and it wasn't behind the workload of Thread A by sharing the same TCP socket or local C# command queue.

[1] https://redis.io/topics/clients

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/StackExchange/StackExchange.Redis/issues/886#issuecomment-407937299, or mute the thread https://github.com/notifications/unsubscribe-auth/AABDsP40_AnNeiNYdk0NNb30SfS6m5uYks5uKQwdgaJpZM4VKHPu .

prat0088 commented 4 years ago

Is this still on the table? Is there anything I can help out with?

mgravell commented 4 years ago

"yes" and "probably not yet, later: definitely", in that order.

On Sat, 15 Feb 2020, 20:52 Tristan Pratt, notifications@github.com wrote:

Is this still on the table? Is there anything I can help out with?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/StackExchange/StackExchange.Redis/issues/886?email_source=notifications&email_token=AAAEHMFK6J56UOHHX2BGYHLRDBIZVA5CNFSM4FJIOPXKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEL3WWQA#issuecomment-586640192, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAEHMAUUJDRGO73Z43KI6DRDBIZVANCNFSM4FJIOPXA .

DeepWorksStudios commented 11 months ago

Any updates?

StackExchange / StackExchange.Redis

API proposal : pooled / leased connections as a secondary API #886