dapr / dapr

Dapr is a portable, event-driven, runtime for building distributed applications across cloud and edge.
https://dapr.io
Apache License 2.0

[Proposal] New Building Block: Centralized Cache Store #7886

Open WhitWaldo opened 4 months ago

WhitWaldo commented 4 months ago

In what area(s)?

/area runtime

Describe the proposal

This is a working proposal that came out of a recent Discord conversation about my proposal for a new and specialized key/value store, and about how a number of providers commonly thought of as key/value implementations wouldn't clear that proposal's bar for inclusion because they lack support for core functionality. This proposal instead offers a lower bar for key/value store-like functionality but, as another of my specialized store proposals, features an API specific to a distinct type of key/value store: the centralized cache.

This proposal rethinks the existing state store, imagining a store API free of ill-fated design decisions and starting fresh with a concise, simple API that provides a cache capability and nothing more. Should a developer want something more from the API, such as a mechanism for prefix key queries or for querying values against some constraint, they're encouraged to turn to specialized stores for that functionality, such as the proposed Key/Value Store and Document Store, respectively.

What's the purpose of a cache?

A cache should temporarily store some data (typically something not too large) alongside a key for some specified amount of time. While there are variations of the concept that distribute the cache across multiple nodes (introducing the CAP theorem trade-off of limiting consistency in favor of availability) or that simply implement a local in-memory cache accessible to perhaps even just a single class, this proposal seeks to implement a cache offering backed by a centralized provider, rather than trying to turn the sidecar into some sort of distributed cache provider (though that could just as easily be the focus of another specialized state store proposal).

What does this proposal involve and how does it differ from the key/value store proposal?

This proposal targets an ever-so-slightly different piece of the key/value store pie: developers who expect to have small chunks of data that expire over time, who know the keys they'll be working with, and who need increment/decrement support for in-place updates.

As caches are often designed for rapid retrieval, they're often hosted in-memory, which means developers aren't expected to store enormous values alongside their keys. As such, this proposal will not support streaming operations; it will instead set and return whole values. Perhaps the API should enforce a maximum size per operation so as not to overwhelm the sidecar?

The providers for this proposal will differ from those in the Key/Value Store proposal. Memcached, for example, isn't a fit there because it doesn't support key prefix queries, but it does support TTL and increment/decrement, so it'd be a better fit in this space.

This proposal might support ETag (or at least a consistency value) pending further research into just how many providers support it natively. TBD.

What will this proposal not cover?

We're looking to build a very specialized state building block here, one that emphasizes temporary reads and occasional writes and updates of small chunks of data that can be manipulated in minute ways; it will not cover a number of capabilities supported by today's state management building block.

Interface

The Centralized Cache components should be able to implement all of the following capabilities using asynchronous methods that minimize back-and-forth operations with the sidecar. Both the key and the value should be represented as strings. How a value is encoded as a string is an implementation detail left to the SDK, but it should be done uniformly so that values set by a C# client can be read back out by a JavaScript client.

No additional metadata should be collected regarding the various types of data stored, nor should operations behave differently based on the type being operated on - this store is simply an abstraction for storing string keys and values to and from the provider.
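To make the cross-SDK point concrete, here's a minimal sketch of how an SDK might normalize values to strings, assuming JSON were chosen as the shared convention (this proposal doesn't mandate any particular encoding):

```csharp
using System.Text.Json;

// Illustration only: one possible way an SDK could encode arbitrary values as
// strings so that a value set by a C# client can be read back by, say, a
// JavaScript client using JSON.parse. The choice of JSON is an assumption here.
public static class CacheValueEncoding
{
    // Serialize any value to its JSON string form before calling Set/TryAdd.
    public static string Encode<T>(T value) => JsonSerializer.Serialize(value);

    // Parse a stored string back into a typed value after TryGet.
    public static T? Decode<T>(string stored) => JsonSerializer.Deserialize<T>(stored);
}
```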

If the user desires more elaborate provider-specific functionality, they should be advised to use the SDK for that provider rather than this abstraction.

The naming in the following table matches the naming used in the Key/Value store proposal for consistency.

| Name | Description | Notes |
| --- | --- | --- |
| TryAdd | Adds the specified key and value to the store along with a floating (3 days from now) timespan; returns true if successfully added or false if the key already exists. | In other words, this doesn't seek to overwrite an existing key. |
| Set | Adds the specified key/value to the store and updates it with the given value even if the key already exists. | This one will always overwrite, even if the key is already in use. |
| TryGet | Attempts to get the value for a given key. Returns true and the value if the result is retrievable and returns false if not. | |
| TryRemove | Tries to remove the value with the specified key from the store. | |
| TryIncrement | Tries to increment a value stored as the string representation of an unsigned int64 by the provided value. | Will return true if the operation is successful or false if not (e.g. the key does not exist). |
| TryDecrement | Tries to decrement a value stored as the string representation of an unsigned int64 by the provided value. | Will return true if the operation is successful or false if not (e.g. the key does not exist). |
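For illustration, the table above could surface in a .NET SDK along these lines; the interface name, TTL representation, and signatures below are assumptions made for the sake of the sketch, not a committed API:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shape of the proposed client; method names mirror the table above.
public interface IDaprCentralizedCacheClient
{
    // Adds the key/value with the given TTL; returns false if the key already exists.
    Task<bool> TryAddAsync(string key, string value, TimeSpan ttl, CancellationToken ct = default);

    // Sets the key/value with the given TTL, overwriting any existing value.
    Task SetAsync(string key, string value, TimeSpan ttl, CancellationToken ct = default);

    // Returns (true, value) if the key exists, (false, null) otherwise.
    Task<(bool Found, string? Value)> TryGetAsync(string key, CancellationToken ct = default);

    // Removes the key; returns false if it did not exist.
    Task<bool> TryRemoveAsync(string key, CancellationToken ct = default);

    // Increments a value stored as the string form of an unsigned 64-bit integer.
    Task<bool> TryIncrementAsync(string key, ulong by, CancellationToken ct = default);

    // Decrements a value stored as the string form of an unsigned 64-bit integer.
    Task<bool> TryDecrementAsync(string key, ulong by, CancellationToken ct = default);
}
```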

Thoughts about the SDKs

I'm generally going to speak about the .NET SDK here as it's the only one I'm personally familiar with, but I'd urge the same principles to be applied to the other language SDKs where it makes sense.

I propose that the SDKs take advantage of this clean split to similarly break the functionality out of a single unified DaprClient with an increasingly dense method surface and instead implement a specific DaprCentralizedCacheClient suited to this purpose.

Today, there are Dapr.Client and Dapr.Client.AspNetCore NuGet packages, wherein the latter primarily just adds dependency injection registration support via Microsoft.Extensions.DependencyInjection. I would bundle these into a single Dapr.State.CentralizedCache NuGet package that includes the DI extension methods for anyone who wants to use them, while keeping the DaprCentralizedCacheClient public and accessible for those who don't.

In other words, after installing the Dapr.State.CentralizedCache package in their project, developers can call builder.Services.AddDaprClient().AddCentralizedCacheStateStore() or builder.Services.AddDaprCentralizedCacheStateStore() to register the DaprCentralizedCacheClient, a narrow SDK implementation of only those functionalities necessary to interact with this new cache state store via Dapr.
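A hypothetical end-to-end sketch of that registration and usage in a minimal ASP.NET Core app follows; the package, extension method, and client are the ones proposed above and don't exist today:

```csharp
// Program.cs (top-level statements, implicit usings enabled).
// Assumes the proposed Dapr.State.CentralizedCache package exists.
var builder = WebApplication.CreateBuilder(args);

// One of the two proposed registration styles.
builder.Services.AddDaprCentralizedCacheStateStore();

var app = builder.Build();

// Resolve the (proposed) narrow client and use it like any other injected service.
app.MapGet("/cart/{id}", async (string id, IDaprCentralizedCacheClient cache) =>
{
    var (found, value) = await cache.TryGetAsync($"cart:{id}");
    return found ? Results.Ok(value) : Results.NotFound();
});

app.Run();
```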

This approach:

1. Makes it easier for developers to call the right methods for what they're doing (e.g. IntelliSense doesn't overwhelm them with every possible action, since only the narrow few available to their specific SDK are surfaced).
2. Makes ongoing maintenance and testing easier, since all the code not directly shared between pieces is sectioned off into wholly separate directories (e.g. you don't have a single file with umpteen tests in it because all the methods it calls live in one file).
3. Speaks to developer concern about bloated installs containing 90% more code than the limited functionality they were seeking to use (I don't know to what degree the Dapr .NET or other SDKs are trimmable/tree-shakable/whatever other languages do).
4. Matches what we're already doing with Workflow, one of the newest building blocks: in the .NET SDK, it is implemented via its own dependency injection extension and a workflow-specific client from the same package.

Final Thoughts

This proposal seeks to provide a narrow, purpose-driven API for developers looking to augment their applications with cache support, abstracting the standard cache features across a collection of providers.

Thank you for the consideration!

philliphoff commented 4 months ago

@WhitWaldo I wonder about the notion of not having some sort of "list keys" operation, even if perhaps optional (for those backing stores that truly cannot support it) or well understood to be potentially inefficient and therefore not recommended for use in production. One of the significant limitations of providing tooling for Dapr is the lack of observability of its stores. While I wouldn't want to try to create a "one size fits all" query API, providing no ability to understand what's in a store feels like repeating a past mistake.

WhitWaldo commented 4 months ago

@philliphoff, I had a conversation with @berndverst this last week in Discord regarding my other proposal for a key/value store and my larger approach to more specialized state stores, and they were quite adamant about not supporting optional interfaces any longer. Given the great many specialization opportunities, I'm inclined to agree. And precisely because not all stores support some sort of key-list operation, I'm inclined not to make it part of this proposal.

If developers need some key observability, they can turn to my proposal at #7338, where I suggest key-prefix filtering (which beats a full key dump) and precious little else. Failing that, there's no reason why one couldn't build a pluggable component that combines the cache store and the key/value store and maintains its own key index, complete with any sort of non-native filtering they want.

Here though, I'm targeting a use case where the key is derived from something else (e.g. a shopping cart ID or some time-dependent hash with an identifier) and a lookup isn't really necessary, since the values aren't long-lived enough to accumulate much. There's TTL support and tentatively ETag, as well as single-trip increment/decrement operations, but that's about it.
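A brief sketch of that pattern, using the illustrative client shape from the proposal above (names and signatures remain hypothetical):

```csharp
using System;
using System.Threading.Tasks;

// Example: a per-cart view counter. The key is derived from the cart ID, so no
// key listing is needed; the entry expires via TTL and is updated in place.
public sealed class CartViewCounter
{
    private readonly IDaprCentralizedCacheClient _cache;

    public CartViewCounter(IDaprCentralizedCacheClient cache) => _cache = cache;

    public async Task RecordViewAsync(string cartId)
    {
        var key = $"cart-views:{cartId}";

        // Seed the counter with a short TTL if it doesn't exist yet, then bump
        // it with a single-trip increment.
        await _cache.TryAddAsync(key, "0", TimeSpan.FromMinutes(30));
        await _cache.TryIncrementAsync(key, 1);
    }
}
```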

dapr-bot commented 2 months ago

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

WhitWaldo commented 2 months ago

/assign

dapr-bot commented 1 week ago

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

yaron2 commented 1 week ago

/nostale