Add Streams on top of relational database

veikkoeeva commented 9 years ago

Implementing an Orleans Stream Provider on top of a relational backend has come up as a subject a few times in Gitter, in https://github.com/dotnet/orleans/issues/623 and is connected to https://github.com/dotnet/orleans/issues/255. We could open an official issue about it as queuing, and streams, is a common scenario and especially on-premises development happens largely on top of relational backends. The following ideas are applicable to all relational databases, but to keep this simple, let’s assume SQL Server for the time being.

As mentioned in https://github.com/dotnet/orleans/issues/623 already, there’s an interesting article Using tables as Queues by Remus Rusanu. The ideas presented about FIFO queues look like sound plan to implement Orleans Streams of interest are the ordering guarantees. However, it might be prudent to explore the use of ANSI SQL 2003 SEQUENCE. This feature is widely available in various relational databases and also on SQL Server 2012 and later.

As of myself, I don't have a more concrete plan or schedule on starting this (though I've expressed interest), but a rough idea for the SEQUENCE part could be to mimick a ring buffer on a table. The table could something like following (might be inaccurate with regard to the needs of Orleans Streams, but the gist of it)

CREATE TABLE StreamRingBuffer
(
    [SequenceToken] BIGINT NOT NULL,
    [Slot] BIGINT NOT NULL,
    [StreamId] NVARCHAR(256),
    [EnqueuedOn] DATETIME2(7) NOT NULL,
    [ReferenceCount] TINYINT NOT NULL,
    [Payload] VARBINARY(MAX)
);

Here sequence token would be generated like

CREATE SEQUENCE EnqueueSequence AS BIGINT START WITH 1 INCREMENT BY 1 CACHE 500000;
SET @sequenceToken = NEXT VALUE FOR EnqueueSequence;
SET @slot = @sequenceToken % 500000;

This table could, or better perhaps should, be prefilled to avoid constant CPU churn on statistics, index updating, page allocation churn etc. If a variable size is needed, perhaps a feasible strategy would be to add a trigger that allocates a bigger table (as a note, perhaps doubling always the size, cf. amortization).

If I were to implement this, I would continue on the path of adding a few new queries to the database and use the same idea in code as with other tables.

veikkoeeva commented 8 years ago

I want to point Super Scaling Queues Using the LMax Disruptor Pattern which discussed apparently the same idea, puts out some improvements and most of all, has some very detailed analysis.

<edit: Also Super Scaling Queues Using the LMax Disruptor Pattern And The In-Memory OLTP Engine.

veikkoeeva commented 6 years ago

Further updates in the saga: Ring Buffer Shock Absorb-er Pattern and SQL Server.

normanhh3 commented 6 years ago

Has anyone considered SQL Server Service Broker functionality as a somewhat drop-in implementation? I'm sure there is a lot that I'm missing. Just curious at this point if the idea has been floated around before.

https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/sql-server-service-broker?view=sql-server-2017

galvesribeiro commented 6 years ago

@normanhh3 I think so... The problem was the support for Cloud offerings like Azure SQL who doesn't have it...

veikkoeeva commented 6 years ago

Plus the performance is -- or has been at least -- really, really abysmal. @normanhh3 You'd like to have this feature? :)

normanhh3 commented 6 years ago

If performance is an issue, then maybe not something I am interested in. I would be interested to hear of anyone trying it out though.

galvesribeiro commented 6 years ago

We discussed this a while back. I think the solution for that would be to use In-memory tables in OLtP engine with a very thin native-compiled procedure to insert and pool messages.

In orleans side, the provider would need logic for partition messages and pool them using an exponential backoff mechanism just like in EventHubs provider.

The problem with that approach is that you would need to foresee the partition number so you can create one pooling agent per partition. For some use cases it would be more than enough, but for other very hight throughput ones, it may be lead to contentions as the “lock” on the partition for a single reader(pooling agent) would not allow multiple readers to consume them in parallel.

veikkoeeva commented 6 years ago

@normanhh3 The performance isn't necessarily an issue, see the initial sketch of an idea using a ring-buffer like layout and the two linked articles which explore a similar one and take it even further.

What @galvesribeiro has mentions has been discussed also. You may not know this, @normanhh3, so a bit of a background, the partition number isn't the only issue, others are basically that the features aren't widely available for erveryone. But the system has been designed so that one can change the layout and queries used as long as the interface invariants hold, i.e. the queries are keyed, have an id, and use certain types of parameters with certain names. This would allow one to use more performant features when available and then also not. This is a bit done already in Storage (table) in SQL Server and even more when there was still support fro SQL Server 2000 (it required more changes).

The way I see forwards would be to build the Orleans side code and produce a performant general database layout with queries and possibly a special case one using in-memory tables. Or, say, someone doing a in-memory layout with compiled procedures and then adding a general one. One just needs to take care the interface between Orleans and the DB isn't unnecessarily constrained to some specific features.

There isn't specifically anything why one couldn't also modify the structures when siloes are working, like through the management system or through the injected storage drivers. I think this should be possible now with the new configuration system. One particular place I see this useful is that if I want to partition (as in have a new filegroup, maybe schemas, to split Storage into Storage1, Storage2 etc. by some parameters, which can be types too) dynamically (and maybe grant the Orleans process only temporarily the rights to do such things on the DB). In fact, this sort of splitting and using a different schema and filegroup, maybe also encrypt data, would be a useful measure from the GDPR perspective to split sensitive data. Albeit it could be done otherwise too.

It just happens that when I had a bit more time, the code-side was in the flux so I waited a bit, I might have time this summer, I'm not sure, it depends how much there's on the code-side to work (which files to look at, @jason-bragg ?). Maybe what I'm trying to tell too is that this isn't too difficult, but I've got myself mixed to some busy day jobs and then some startup stuff. :)

normanhh3 commented 6 years ago

Thank you for the detailed conversation, the relevant links and generally enlightening discussion.

This has proved useful already.

Tangential question, what is the philosophy for creation of DB-specific optimizations? For instance, indexes on the Storage table.

Our schema version appears to have not included any indexing. Is that just considered a local optimization?

On Sun, Jun 17, 2018, 03:33 Veikko Eeva notifications@github.com wrote:

@normanhh3 https://github.com/normanhh3 The performance isn't necessarily an issue, see the initial sketch of an idea using a ring-buffer like layout and the two linked articles which explore a similar one and take it even further.

What @galvesribeiro https://github.com/galvesribeiro has mentions has been discussed also. You may not know this, @normanhh3 https://github.com/normanhh3, so a bit of a background, the partition number isn't the only issue, others are basically that the features aren't widely available for erveryone. But the system has been designed so that one can change the layout and queries used as long as the interface invariants hold, i.e. the queries are keyed, have an id, and use certain types of parameters with certain names. This would allow one to use more performant features when available and then also not. This is a bit done already in Storage (table) in SQL Server and even more when there was still support fro SQL Server 2000 (it required more changes).

The way I see forwards would be to build the Orleans side code and produce a performant general database layout with queries and possibly a special case one using in-memory tables. Or, say, someone doing a in-memory layout with compiled procedures and then adding a general one. One just needs to take care the interface between Orleans and the DB isn't unnecessarily constrained to some specific features.

There isn't specifically anything why one couldn't also modify the structures when siloes are working, like through the management system or through the injected storage drivers. I think this should be possible now with the new configuration system. One particular place I see this useful is that if I want to partition (as in have a new filegroup, maybe schemas, to split Storage into Storage1, Storage2 etc. by some parameters, which can be types too) dynamically (and maybe grant the Orleans process only temporarily the rights to do such things on the DB). In fact, this sort of splitting and using a different schema and filegroup, maybe also encrypt data, would be a useful measure from the GDPR perspective to split sensitive data. Albeit it could be done otherwise too.

It just happens that when I had a bit more time, the code-side was in the flux so I waited a bit, I might have time this summer, I'm not sure, it depends how much there's on the code-side to work (which files to look at, @jason-bragg https://github.com/jason-bragg ?). Maybe what I'm trying to tell too is that this isn't too difficult, but I've got myself mixed to some busy day jobs and then some startup stuff. :)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dotnet/orleans/issues/634#issuecomment-397860507, or mute the thread https://github.com/notifications/unsubscribe-auth/ABj7xLMTdxG_5B-inRrvbQietMNWslyAks5t9gZWgaJpZM4FdmiM .

veikkoeeva commented 6 years ago

@normanhh3 It's written in the scripts: everything goes as long as the names and types in queries remain the same. You can add indices, for example, at your leisure. Do you have specific index in mind for Storage?

veikkoeeva commented 4 years ago

Cross-referencing due to expected simplification in Orleans side code: https://github.com/dotnet/orleans/pull/6711.

dotnet / orleans

Add Streams on top of relational database #634