[FEATURE] Backplane for SQL Server distributed cache

RemarkLima commented 1 year ago

Currently the Backplane feature is only available for Redis cache.

Would it be much work to get the same setup available when using SQL Server as the distributed cache?

jodydonetti commented 1 year ago

Hi @RemarkLima and thanks for using FusionCache!

Let me think about what that would mean from a design & impl perspective, mainly because SQL Server does not have native primitives for things like a bus, messages, etc., so it would mean creating such a system from scratch on top of plain SQL Server itself.

While we're on the subject: is it that you don't want to use Redis because it would be an additional service to create (and pay for), because you don't like Redis, or are there other reasons? I'm asking because if you are open to other Azure services, maybe we can think about an impl on top of Azure Service Bus or similar: this would mean a real message bus and not something "simulated" on top of plain SQL Server.

Let me know, thanks!

Nisden commented 1 year ago

@jodydonetti if @RemarkLima is talking about on-premises SQL Server it actually has a native message broker https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/sql-server-service-broker

jodydonetti commented 1 year ago

Thanks @Nisden for pointing that out! In this case, though, they told me (via Twitter DM) that it was about Azure SQL Server, so I think it wouldn't apply in this specific case.

Anyway, instead of building a message bus or similar on top of SQL Server from scratch, I'm thinking more about using an existing OSS project that does that, and creating a package that integrates the two, like I already did with Redis pub/sub.

Does anyone have an existing project to use for that? I'm open to ideas!

Nisden commented 1 year ago

@jodydonetti Sadly no, but implementing something that could act as a backplane on SQLServer for FusionCache seems fairly simple as your requirements are pretty light.

A single table with (Id, TimeStamp, Action, CacheKey) and then a loop looking for new messages should be able to solve it. It would of course be pulling instead of pushing, but that seems like a small sacrifice for the simplicity.

(Have done a little research because I am considering building a Backplane for Azure Storage)
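The following is a minimal sketch of that polling approach. It assumes the single table described above (Id, TimeStamp, Action, CacheKey). To keep the example self-contained, an in-memory list stands in for the SQL table, and the equivalent T-SQL appears in comments. `BackplaneRow`, `InMemoryBackplaneTable` and `PollingBackplaneNode` are hypothetical names, not FusionCache APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One row of the hypothetical invalidation table: (Id, TimeStamp, Action, CacheKey).
public sealed record BackplaneRow(long Id, DateTimeOffset TimeStamp, string Action, string CacheKey);

// In-memory stand-in for the SQL table; Id plays the role of an IDENTITY column.
public sealed class InMemoryBackplaneTable
{
    private readonly List<BackplaneRow> _rows = new();
    private readonly object _lock = new();
    private long _nextId = 1;

    // Real version: INSERT INTO Backplane (TimeStamp, Action, CacheKey) VALUES (...)
    public void Insert(string action, string cacheKey)
    {
        lock (_lock)
            _rows.Add(new BackplaneRow(_nextId++, DateTimeOffset.UtcNow, action, cacheKey));
    }

    // Real version: SELECT Id, TimeStamp, Action, CacheKey FROM Backplane WHERE Id > @lastId ORDER BY Id
    public IReadOnlyList<BackplaneRow> FetchSince(long lastId)
    {
        lock (_lock)
            return _rows.Where(r => r.Id > lastId).OrderBy(r => r.Id).ToList();
    }
}

// A node remembers the highest Id it has applied and evicts matching keys from its local cache.
public sealed class PollingBackplaneNode
{
    private readonly InMemoryBackplaneTable _table;
    private long _lastSeenId;

    public PollingBackplaneNode(InMemoryBackplaneTable table, long startAfterId = 0)
    {
        _table = table;
        _lastSeenId = startAfterId;
    }

    public Dictionary<string, string> LocalCache { get; } = new();

    // One polling pass: read the rows newer than the last one seen and apply them in order.
    public int PollOnce()
    {
        var rows = _table.FetchSince(_lastSeenId);
        foreach (var row in rows)
        {
            if (row.Action == "evict")
                LocalCache.Remove(row.CacheKey);
            _lastSeenId = row.Id;
        }
        return rows.Count;
    }
}
```

Each node only has to remember the last Id it applied. One caveat: under concurrent inserts, IDENTITY values can become visible out of order, so a plain `Id > @lastId` query can skip a row. A real implementation would need to tolerate that (for example, by re-reading a small trailing window). It would also need to skip the messages it published itself and periodically delete old rows.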

RemarkLima commented 1 year ago

Hi @jodydonetti @Nisden

Thanks for picking this up. As I said via DM, our case is for Azure SQL. However, from a distributed cache point of view, .NET seems to make no distinction between Azure SQL and on-prem SQL, hence my thinking that the implementation would be the same... But maybe not.

The reason the project is shying from Redis is that it's pretty small, and the cost of Redis on Azure is too painful ;)

The idea from @Nisden about the lookup would work as a simple solution. It wouldn't be that different from our internal setup, which checks the live database to make sure the cached data is up to date for critical actions. I guess the downside would be a database hit on every get or getorset request...

A basic v1 with that setup would be great for now.

jodydonetti commented 1 year ago

While discussing this with some friends in the field and looking for existing solutions for a generic message bus with support for different storage backends, Rebus came up.

I'd already heard of it, and it seems to be a very nice package. I've never worked with it, so I'm taking some time to look at it and see if it can be a good fit for us.

Basically, by creating an implementation of IFusionCacheBackplane that works with Rebus we'll be automatically open to all of its available transport implementations like SQLServer, MySql, RabbitMQ, AzureServiceBus, and so on.

I think it can be a very nice solution.

Opinions?

RemarkLima commented 1 year ago

It looks like a great find, and a working, supported integration, and looks like it's been around for a while... It had me sold on "the best error messages in the world"!

As you'd said, leveraging an existing library makes the most sense.

marafiq commented 1 year ago

Some observations: unsolicited, but they might be helpful.

Not sure whether your SQL Server edition or Azure SQL supports In-Memory OLTP tables, which are a great use case for a cache. I know they're supported in Enterprise edition 2017+. SQL Server has a high-performance in-memory table feature, also in a durable version.

Polling the table to detect invalidation worked great in the ASP.NET 2.0 era, so it should work in your use case. If you can use in-memory tables, it might even beat Redis in speed.

I feel like including full transport-level messaging in this package might complicate it. Configurable background polling might be the answer.

https://adnanrafiq.com/blog/Introduction-to-MSSQL-Server-In-Memory-OLTP-with-NET6-EF-Core/

How it was done in ASP.NET 2.0:

http://books.gigatux.nl/mirror/asp.net/8877final/LiB0082.html
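Here is one way to implement "configurable background polling": the poll interval lives in an options object, and a `PeriodicTimer` (.NET 6+) drives the loop. `PollingOptions` and `BackgroundPoller` are hypothetical names. `pollOnce` stands for whatever query reads the invalidation table. This is a sketch, not an existing package.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical options object: the interval comes from configuration rather than code.
public sealed record PollingOptions(TimeSpan Interval);

public static class BackgroundPoller
{
    // Calls pollOnce every options.Interval until cancellationToken is cancelled.
    // A failing pass is reported through onError and retried on the next tick,
    // so a transient database error does not stop invalidations for good.
    public static async Task RunAsync(
        PollingOptions options,
        Func<CancellationToken, Task> pollOnce,
        Action<Exception>? onError = null,
        CancellationToken cancellationToken = default)
    {
        using var timer = new PeriodicTimer(options.Interval);
        try
        {
            while (await timer.WaitForNextTickAsync(cancellationToken))
            {
                try
                {
                    await pollOnce(cancellationToken);
                }
                catch (Exception ex) when (!cancellationToken.IsCancellationRequested)
                {
                    onError?.Invoke(ex);
                }
            }
        }
        catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
        {
            // Normal shutdown: the host cancelled the token.
        }
    }
}
```

In an ASP.NET Core app, this loop would typically live in a `BackgroundService`, with the interval bound from configuration. The interval is the knob that trades invalidation latency against database load.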

moggoly commented 1 year ago

Just wondering if this is still being considered, or has it been dropped completely?

jodydonetti commented 1 year ago

Hi @moggoly , and thanks for considering FusionCache!

So, theoretically it's still in the backlog, but honestly I'm not that sure about it, and here's why.

I've been able to talk with the creator of Rebus, and he was kind enough to spend some time with me on this: he thinks this is not a good use case for Rebus, and that I'd better not go down this path.

Then I talked with some people who are quite expert in NServiceBus and similar solutions, and they all told me it's not really a good use case either.

The last option remaining is to code it myself from scratch, but implementing a pub/sub system on top of an RDBMS in a reasonable way is definitely non-trivial, and the hypothetical pool of users is probably very small (I'd be interested to know otherwise!).

So here we are: I haven't abandoned the idea completely, but also there have been features definitely more important to go after.

Out of curiosity, would you mind sharing your use case? I'd like to know in what scenario you would use it (how many nodes? what amount of data? etc.), what version of SQL Server you are using, why you can't use something like Redis, and what you would expect from this implementation (performance-wise, etc.). If you're willing to help me understand better, it would be useful.

Thanks!

moggoly commented 1 year ago

Hi @jodydonetti , thanks for the quick response.

So we're a small software provider working wholly on the .NET stack, utilising Azure for all hosting services. We're already using a small amount of in-memory caching in our web app. With Azure pricing, though, we have a lot of small apps running on top of a relatively small server instance, so anything we can do to bring memory usage down would be a bonus. We're heavily data-based, so our Azure SQL database (NOT managed instance) is in continual use. As for why we can't use something like Redis: it's purely down to cost. It's far too expensive on Azure for not a lot of return overall. That's why using SQL as a backplane for even small amounts of caching would help us improve performance.

Hope that's of some help. I think you'll be surprised at how many others won't use Redis because of the pricing too.

martinothamar commented 1 year ago

Hi! I'm in a similar but slightly different situation to @moggoly. I'm on a team of developers only, so we try to limit the operational burden as much as possible. This is a fairly large org, though, so there is a department set up to manage SQL Servers on Azure on behalf of application dev teams, which we use for our application DB needs.

On the topic of messaging/pub-sub on top of SQL Server: we actually use the outbox pattern in our application, so we have some familiarity with the awkwardness of implementing messaging on top of SQL Server (we poll an "Outbox" table in its message relay component).

In our case, we use FusionCache to decouple ourselves from a third-party service which is highly unreliable and slow. It's very little data (KBs) and the rate of change is also small, so we would not be worried about performance at all.

In conclusion, I would find this useful, but I definitely understand your hesitation given the implementation cost.

jodydonetti commented 1 year ago

Hi @moggoly and @martinothamar , thanks for sharing! It's very interesting to read these real-world experiences, practical needs and limitations: I'll try to deepen the subject more and see if I can come up with an implementation on SqlServer, at this point from scratch I think.

For anyone else reading this (or if you want to share this with someone): if you have similar experiences or needs, please share them here and I'll read them carefully.

Will update, thanks again.

rickdotnet commented 1 year ago

Hey all, I've been following along and may take a stab at something similar. While I'm tinkering, I have an incentive to consider SQL Server as well for some of my use cases.

Although that bit is boring, I'd like to offer up two bits of technology that I think are relevant to this discussion.

CAP - I've just started exploring this library; it does support Sql Server, may be worth investigating
NATS - Consider adding this technology to your stack. This is the reason I'm here, but I may stay for SQL Server.

jodydonetti commented 1 year ago

Hi @rickdotnet, and thanks for the suggestions: I remember taking a look at both of them some time ago.

Regarding NATS, I thought it was something I'd like to study more one day and add support for as a backplane implementation.

Regarding CAP instead, I remember taking a look at it quickly, but kinda paused when I saw the attribute-based nature of the subscription registration, which is something I can't do declaratively at compile-time but only programmatically at runtime.

What I mean is this:

Do you know if there's a way to do that programmatically?

Thanks!

rickdotnet commented 1 year ago

That is exactly my reservation with the library as well. I have it on my list to look at how easy it would be to extend the library and potentially wrap it with a nicer-feeling API surface. And who knows, maybe the functionality is there and it's just not documented well. I'll be looking soon, though.

I'm diving pretty hard into NATS as a potential foundation for our modern approach to distributed messaging. It makes sense for me to also look into the key/value capability of it too.

Hopefully a hodgepodge of my tinkering can help out here. And with that, I'm back on topic. 😂

Edit:

After looking into CAP more it doesn't look like a good out of the box fit.

RemarkLima commented 1 year ago

Hi @jodydonetti

Sorry I missed this. Just to add to the reasons for wanting to use SQL Server, it's similar to the above for me: we use Azure, and with it Azure SQL Server.

Adding Redis adds another point of contact, more complexity and another development skill to maintain. The costs can also ramp up pretty quickly in Azure, which could be a showstopper for some smaller clients.

Generally, we're not working with massive volumes of site visitors, so outright performance isn't a concern per se, but a balance of cost and performance is.

Hope that makes sense.

jasenf commented 1 year ago

Speaking from a little bit of experience here: SQL Server is terrible at pub/sub events. It's not what the database server was built for, and performance will start suffering. Azure SQL is even worse once you factor in connection reliability. On top of that, most SQL client libraries aren't built with long-lived connections in mind, and pub/sub relationships need those.

I'd highly recommend you bite the bullet and purchase the lowest Redis server on Azure, I think it's less than $20/month. You won't ever look back, I promise.

rickdotnet commented 1 year ago

Agreed. The NATS backplane was on my list simply as an alternative to Redis, and because I'm obsessed with that server right now. It's a container instance away, very Azure-friendly and super easy to work with.

I've opted to use NATS as my distributed cache as well. But you could technically use a NATS backplane with your SQL Server distributed cache. If I'm reading the docs right, @jodydonetti just 'evicts' the entry once NATS tells it to. Then, when you request a copy of it, it grabs a fresh one. When I get a burst of energy I'll get something posted.

rickdotnet commented 1 year ago

I took a stab at the NATS backplane and tested it with a SqlServer distributed cache. The preliminary test went well, but I'm fairly new to both of these code bases, so, expect some oddities. I plan to shoot the link to some of the NATS guys. Maybe they can take a glance to make sure I'm not doing anything silly. Hopefully, Jody or someone more familiar with FusionCache can also help on that side.

I hope this gets you closer to a non-redis option. I provide a crude example application in my fork. I plan to add a small readme as well to aid in testing it, but if anyone in this conversation up to this point wants access to my hosted dev NATS machine, I'll shoot you the details. I did these tests with a local SqlServer and a docker container.

jodydonetti commented 1 year ago

Hi @rickdotnet , wow thanks for the effort 😲

I'll take a look at the code as soon as I get out of the rabbit hole I'm currently very deep in: a rewrite of auto-recovery + backplane, to get a massive perf boost + more robustness + more edge cases handled + extra stuff. Distributed computing is quite challenging, eheh 😅

My only "concern" regarding this thread's original request is this: yes, it would be an alternative to Redis, but it still wouldn't be based on an existing SQL Server instance. An extra server would be needed nonetheless, whereas the OP's request here was to just use the existing SQL Server instance. I'm saying this because I can already see the OP and the others saying that if an extra server is needed anyway, then they may as well spin one up and run Redis on it, without the extra cost of a managed Redis service.

Having said that I still wanted a NATS based implementation of the backplane, and this may very well be it, or at least a good starting point we can tweak here and there, which is awesome!

Thanks again, will let you know!

rickdotnet commented 1 year ago

I agree on all points.

Distributed communication is one of those areas I used to know about but never had to think much about. Now, it seems it's all I think about... unfortunately 😂.

Good luck with this. I'll keep watching the progress. Also, well done putting FusionCache together. It was pretty quick to get going in your codebase. I've been a user for a while, so maybe that helped, but I was impressed by how quickly this came together. Kudos.

rickdotnet commented 1 year ago

I was bored and thinking a bit about this. Instead of using a backplane as it's implemented in FusionCache, I think I'd wrap the SQL Server distributed cache and handle evictions at the get/set point, but on a background thread.

When Key1 is grabbed from the cache, a background task goes out and asks for other keys to evict.

The issue then becomes, what happens if the key you're grabbing is one of those that needed to be evicted? Trade-offs come into play. Maybe an in-process messaging system can shout to the roof-tops. Maybe you can afford to poll if you aren't making frequent use of the cache.
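Here is a rough sketch of that idea. It assumes a hypothetical `IEvictionFeed` that returns the keys invalidated since the last check (for example, read from a SQL table). Each `Get` answers from the local cache immediately and starts at most one background check. This makes the trade-off above visible: the read that triggers the check can still return a value that is about to be evicted. All names are hypothetical.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical source of pending invalidations, e.g. rows read from a SQL table since the last check.
public interface IEvictionFeed
{
    Task<IReadOnlyList<string>> GetKeysToEvictAsync(CancellationToken token);
}

// In-memory stand-in for the feed, handy for trying the wrapper out.
public sealed class QueueEvictionFeed : IEvictionFeed
{
    private readonly ConcurrentQueue<string> _pending = new();

    public void Publish(string key) => _pending.Enqueue(key);

    public Task<IReadOnlyList<string>> GetKeysToEvictAsync(CancellationToken token)
    {
        var keys = new List<string>();
        while (_pending.TryDequeue(out var key))
            keys.Add(key);
        return Task.FromResult<IReadOnlyList<string>>(keys);
    }
}

// Wraps a local cache: every Get serves the current value and starts (at most one)
// background check for keys that should be evicted.
public sealed class LazyEvictingCache
{
    private readonly ConcurrentDictionary<string, byte[]> _local = new();
    private readonly IEvictionFeed _feed;
    private int _checkRunning; // 0 = idle, 1 = a background check is in flight

    public LazyEvictingCache(IEvictionFeed feed) => _feed = feed;

    // The most recently started background check (useful for tests and shutdown).
    public Task LastCheck { get; private set; } = Task.CompletedTask;

    public void Set(string key, byte[] value) => _local[key] = value;

    public byte[]? Get(string key)
    {
        // This read may return a value that the background check is about to evict.
        _local.TryGetValue(key, out var value);

        if (Interlocked.CompareExchange(ref _checkRunning, 1, 0) == 0)
            LastCheck = Task.Run(() => CheckForEvictionsAsync());

        return value;
    }

    private async Task CheckForEvictionsAsync()
    {
        try
        {
            foreach (var key in await _feed.GetKeysToEvictAsync(CancellationToken.None))
                _local.TryRemove(key, out _);
        }
        finally
        {
            Volatile.Write(ref _checkRunning, 0);
        }
    }
}
```

Whether that window is acceptable depends on how often the cache is read. With infrequent reads, evictions are applied late, which is where a periodic poll or an in-process notification would come back in. A production version would also need to log a failed check instead of leaving it unobserved.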

jodydonetti commented 1 year ago

Hi @rickdotnet , thanks for your thoughts!

The last couple of months have been quite intense since I've been overwhelmed by an almost complete rewrite on the distributed part of FusionCache: I'm finishing it right now and the results seem too good to be true, even if I say so myself 😅

Anyway, all of this to say sorry for the delay in answering you: I'll try to take a look at this soon and will let you know.

Thanks!

rickdotnet commented 1 year ago

Hey @jodydonetti, no rush for me. I'm just adding bits of thought randomly. I don't intend to push that POC much further than it is. If you do end up landing on a direction for this I could revisit it. I'm still doing a bit with NATS too, so if you end up spending any effort there, that might pique my interest.

Wraith2 commented 8 months ago

Hi @jodydonetti, can you expand on what you said above about Rebus and NServiceBus?

I've been able to talk with the creator of Rebus, and he was kind enough to spend some time with me on this: he thinks this is not a good use case for Rebus, and that I'd better not go down this path.

Then I talked with some people who are quite expert in NServiceBus and similar solutions, and they all told me it's not really a good use case either.

What are the things that make such busses unsuitable for a backplane? The backplane interface itself is deceptively simple, so I'm wondering what non-code requirements are needed.

jodydonetti commented 8 months ago

Hi @Wraith2 , sure thing!

What are the things that make such busses unsuitable for a backplane?

If I remember correctly: in general they are somewhat overkill for this use, and they are typically designed so that setup is done not programmatically but via attributes and reflection. In FusionCache, by contrast, the name of the channel is known only at runtime, via configuration.

The backplane interface itself is deceptively simple

Yes, I tried to design it that way exactly to ease implementation with other systems!

so I'm wondering what non-code requirements are needed.

If you think you've found a way to make this work reasonably I'm all ears 😬
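Part of the mismatch comes from a C# language rule: attribute arguments must be compile-time constants. A declarative subscription attribute therefore cannot take a channel name computed from configuration, while a programmatic API can. The bus and the channel naming rule below are hypothetical and for illustration only. They are not the FusionCache, Rebus or CAP APIs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical in-process bus: subscriptions take a channel name that is only known at runtime.
public sealed class InProcessBus
{
    private readonly ConcurrentDictionary<string, List<Action<string>>> _handlers = new();

    public IDisposable Subscribe(string channel, Action<string> handler)
    {
        var list = _handlers.GetOrAdd(channel, _ => new List<Action<string>>());
        lock (list) list.Add(handler);
        return new Unsubscriber(() => { lock (list) list.Remove(handler); });
    }

    public void Publish(string channel, string message)
    {
        if (!_handlers.TryGetValue(channel, out var list)) return;
        Action<string>[] snapshot;
        lock (list) snapshot = list.ToArray();
        foreach (var handler in snapshot) handler(message);
    }

    private sealed class Unsubscriber : IDisposable
    {
        private Action? _dispose;
        public Unsubscriber(Action dispose) => _dispose = dispose;
        public void Dispose() { _dispose?.Invoke(); _dispose = null; }
    }
}

// Hypothetical naming rule: one channel per cache instance, derived from configuration values.
public static class ChannelNames
{
    public static string ForCache(string cacheName, string? prefix = null) =>
        string.IsNullOrEmpty(prefix) ? $"{cacheName}.Backplane" : $"{prefix}.{cacheName}.Backplane";
}
```

With an attribute, the equivalent of `ChannelNames.ForCache(...)` would have to be a literal baked into the assembly. Here it can come from whatever the cache was configured with when the app starts.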

Wraith2 commented 8 months ago

I'm just investigating. I'm very interested overall, but the Redis requirement is tricky: not impossible, but tricky. I would like to be able to use Azure Service Bus, or possibly write my own simple message-passing system using ZeroMQ XPUB/XSUB. I read through this thread and saw the warning about Rebus, which would seem to cover Azure Service Bus as well. I thought it'd be better to get more information than to dive into something you've already noted isn't likely to work or scale well.

rickdotnet commented 8 months ago

Hey @Wraith2, if NATS would suit your use case, I wrote a POC backplane when I was very new to NATS. I used the older V1 client, and now that I'm more familiar with NATS, I've considered redoing it with the V2 client instead.

That said, what I've been doing personally is using NATS KV as my distributed cache and using FusionCache on top of it instead. No backplane required in that case since the KV is already distributed.

jodydonetti commented 8 months ago

I wrote a POC backplane when I was very new to NATS.

Uuuh, that seems really interesting... tell me more 😬

That said, what I've been doing personally is using NATS KV as my distributed cache and using FusionCache on top of it instead.

So you created an implementation of IDistributedCache based on NATS? That is also super interesting, but... how? From what I'm reading on Twitter the team is only now working on adding per-key TTL support to NATS:

https://x.com/jodydonetti/status/1768054095020187843

Can you help me understand more?

Btw, I think you should talk with them about your experience: I guess they would be interested!

No backplane required in that case since the KV is already distributed.

I don't understand this: the backplane is needed to avoid sync issues with each node's local memory cache (L1), and is useful even when using a distributed cache (L1+L2).
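The following small model shows why L1 needs the backplane even when every node shares the same L2. Once a node has an L1 copy, it never asks L2 again until that copy is evicted. `Node` is a hypothetical simplification (no expiration, no fail-safe), with a delegate standing in for the backplane.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical model of one app node: a private L1 in front of an L2 shared by all nodes.
public sealed class Node
{
    private readonly Dictionary<string, string> _l1 = new();
    private readonly ConcurrentDictionary<string, string> _l2;
    private readonly Action<string>? _publishEviction;

    public Node(ConcurrentDictionary<string, string> sharedL2, Action<string>? publishEviction = null)
    {
        _l2 = sharedL2;
        _publishEviction = publishEviction;
    }

    public string? Get(string key)
    {
        if (_l1.TryGetValue(key, out var hit)) return hit;          // L1 hit: L2 is not consulted at all
        if (_l2.TryGetValue(key, out var value)) _l1[key] = value;  // L1 miss: fill from L2
        return value;
    }

    public void Set(string key, string value)
    {
        _l1[key] = value;
        _l2[key] = value;              // L2 is now correct for everyone...
        _publishEviction?.Invoke(key); // ...but other nodes' L1 copies are only fixed by a backplane message
    }

    // What a backplane notification triggers: drop the local copy; the next Get reloads from L2.
    public void OnEvictionMessage(string key) => _l1.Remove(key);
}
```

This is also all the backplane message does on the receiving side: it removes the local copy, and the next read reloads the fresh value from L2.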

rickdotnet commented 8 months ago

I don't understand this: the backplane is needed to avoid sync issues

My caching ignorance got ahead of me... I wasn't even thinking L1, you're right. I was more focused on the L2 side of that coin where NATS is handling the consistency.

the team is only now working on adding per-key TTL support to NATS

Yeah, I've been having fun with the NATS team 😄. I actually stole a bit of an idea from FusionCache and store the TTL with the serialized cached item. Upon retrieval, I deserialize the cache item and check the current time against the stored expiration. If the item has expired, it is deleted from the store and a null value is returned. Otherwise, the value is returned to the caller.

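```csharp
// Read path of the NATS KV-backed IDistributedCache POC.
// GetStoreAsync, CacheItem and DeserializeFromBytes are helpers from the POC that are not shown here.
public async Task<byte[]?> GetAsync(string key, CancellationToken token = default)
{
    var store = await GetStoreAsync(token);

    // Depending on the client version, a missing or deleted key may surface as an exception
    // rather than as an entry with a null Value; both cases should be treated as a cache miss.
    var entry = await store.GetEntryAsync<byte[]>(key, cancellationToken: token);

    if (entry.Value is null)
        return null;

    // The stored bytes are an envelope: the cached payload plus its absolute expiration.
    var cacheItem = DeserializeFromBytes<CacheItem>(entry.Value);

    // Still fresh: hand the payload back to the caller.
    if (DateTimeOffset.UtcNow <= cacheItem.Expiration)
        return cacheItem.Value;

    // Expired: delete it lazily and report a cache miss.
    await store.DeleteAsync(key, cancellationToken: token);
    return null;
}
```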

Uuuh, that seems really interesting... tell me more 😬

My first stab at it is up here. This was using their V1 client, but the V2 client is much better to work with, imo.
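The `GetAsync` snippet above covers the read side. A write side for the same "TTL travels with the item" idea could look like the sketch below. To stay self-contained, a `ConcurrentDictionary` stands in for the NATS KV bucket, and `System.Text.Json` does the serialization. The `CacheItem` shape and the injectable clock are assumptions, not the POC's actual code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.Json;

// Assumed envelope shape: the payload plus its absolute expiration.
public sealed record CacheItem(byte[] Value, DateTimeOffset Expiration);

// Read/write pair against an in-memory stand-in for the KV bucket.
public sealed class EnvelopeCache
{
    private readonly ConcurrentDictionary<string, byte[]> _store = new();
    private readonly Func<DateTimeOffset> _now;

    // The clock is injectable so expiry can be tested without waiting.
    public EnvelopeCache(Func<DateTimeOffset>? clock = null) =>
        _now = clock ?? (() => DateTimeOffset.UtcNow);

    // Write side: turn the relative TTL into an absolute expiration stored next to the value.
    public void Set(string key, byte[] value, TimeSpan ttl)
    {
        var item = new CacheItem(value, _now() + ttl);
        _store[key] = JsonSerializer.SerializeToUtf8Bytes(item);
    }

    // Read side: same logic as GetAsync above, minus the network calls.
    public byte[]? Get(string key)
    {
        if (!_store.TryGetValue(key, out var raw))
            return null;

        var item = JsonSerializer.Deserialize<CacheItem>(raw);
        if (item is not null && _now() <= item.Expiration)
            return item.Value;

        // Expired (or unreadable): delete lazily and report a miss.
        _store.TryRemove(key, out _);
        return null;
    }

    public bool Contains(string key) => _store.ContainsKey(key);
}
```

Because the expiration is stored as an absolute timestamp, nodes with skewed clocks can disagree slightly about when an entry expires. An expired entry also lingers in the bucket until someone reads it.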

rickdotnet commented 8 months ago

Also, since we're here talking about non-redis options, Github threw this on my Dashboard: https://github.com/microsoft/garnet

jodydonetti commented 8 months ago

Also, since we're here talking about non-redis options, Github threw this on my Dashboard: https://github.com/microsoft/garnet

Eheh, I know 😬

https://x.com/jodydonetti/status/1770188264202215853

... but also, not today it seems:

https://x.com/jodydonetti/status/1770903586748170415

jodydonetti commented 8 months ago

I wasn't even thinking L1, you're right. I was more focused on the L2 side of that coin where NATS is handling the consistency.

Got it

Yeah, I've been having fun with the NATS team 😄.

Ahah, well it seems you were right after all 😬

I actually stole a bit of an idea from FusionCache and store the TTL with the serialized cached item. Upon retrieval, I deserialize the cache item and check the current time against the stored expiration. If the item has expired, it is deleted from the store and a null value is returned. Otherwise, the value is returned to the caller.

Good call!

My first stab at it is up here. This was using their V1 client, but the V2 client is much better to work with, imo.

Once everything is in place, meaning they've released official support for per-key TTL for NATS v2 and the NATS implementation of IDistributedCache is stable, I'll gladly add it to the list of main implementations on the docs page and make some tweets about it!

Glad to help with it or the backplane if needed.

rickdotnet commented 8 months ago

Awesome, this is great. The NATS team is fantastic and very quick to turn around new features. Their .NET folks are great to work with too. I'll keep watching for updates.

ZiggyCreatures / FusionCache

[FEATURE] Backplane for SQL Server distributed cache #111