dapr / proposals

Proposals for new features in Dapr
Apache License 2.0

List API for the state store building block #61

Open elena-kolevska opened 3 months ago

elena-kolevska commented 3 months ago

This proposal introduces a List API in Dapr's state store building block. The List API will enable the retrieval of keys in a state store based on certain criteria, giving users the visibility they need into stored keys.

WhitWaldo commented 3 months ago

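This proposal is quite similar to a discussion I've been having for the last several months. I can tell you already that I don't expect the proposal as-is to gain much traction, for the following reasons:

  • It proposes functionality quite similar to the Query API which proposes filtering, sorting and paging on key or value. This API has been discontinued because of the difficulty of maintaining it against the great many providers the state management building block supports.

  • Too few of the providers supported in the state building block support all that you propose. Coincidentally, I did a quick check earlier tonight against the list just to see which ones support key prefix filtering alone and it's a clear minority of the currently-supported stores, especially when you exclude relational databases.

Rather, I've been working to propose and solicit traction for a next generation of specialty state stores in Dapr that each support a core feature set specific to their unique purposes. I just finished some updates to my proposal for a refreshed Key/Value Store that would support streaming and key retrieval based on prefix filtering, @berndverst has an outstanding proposal for a Document Store that would specifically allow document-focused queries, and I just typed up one for a dedicated Centralized Cache Store.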

While querying keys is out of scope for the cache proposal and I'm not initially inclined to do more than a prefix key search on the key/values store, I'd love to hear your thoughts on why I'm scoping either too narrowly. Or perhaps your approach is better suited to the document store proposal as it would be interacting with query-based APIs that are more amenable to programmatic constraints, but I'd urge you to follow up on any of those linked items.

berndverst commented 3 months ago

While a list operation would be very useful, and I also recognize the desire for list with prefix, I currently do not support this proposal. Here is why:

Memcached is the first example that comes to mind (even before I saw that you mention it in your proposal) that highlights the limitations of this proposal.

I think this proposal can be reduced to a List operation for all keys and easily approved that way. However, I cannot support augmenting our building block interface with yet more methods or parameters that only a subset of components can implement. And, as I mentioned, the in-memory handling workaround would be problematic.

I agree with @WhitWaldo that having specialty state stores - separate building blocks for these in fact - makes a lot more sense!

ItalyPaleAle commented 3 months ago

I agree with @berndverst and I'm a 👎 on this proposal

I had done some research about listing before, and it's a really hard problem.

Even for backends that do support listing, pagination is implemented very differently, and even then, pagination is not consistent if the underlying state changes between requests.

artursouza commented 4 weeks ago

> I agree with @berndverst and I'm a 👎 on this proposal
>
> I had done some research about listing before, and it's a really hard problem.
>
> Even for backends that do support listing, pagination is implemented very differently, and even then, pagination is not consistent if the underlying state changes between requests.

@elena-kolevska did great research on this problem as well. I agree it is not an easy problem.

Regarding pagination specifically, we can offer a simple API for the "next" page only, which covers a scenario common across state stores. The fact that it is not in a session (items listed can change between page requests) is acceptable behavior in many applications, though not all.
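
To make the "next page only" idea concrete, here is a minimal sketch over an in-memory dict. It is not taken from the proposal: `list_keys`, `page_size`, and the token format (simply the last key already returned) are hypothetical choices made for illustration. Because every call re-reads the live key set, writes and deletes that happen between page requests show up in later pages, which is the sessionless behavior described above.

```python
from bisect import bisect_right
from typing import Optional


def list_keys(store: dict, page_size: int, token: Optional[str] = None):
    """Return one page of keys plus a token for the next page (None when done).

    Hypothetical helper: the token is just the last key already returned.
    """
    keys = sorted(store)  # re-read the live key set on every call (no session)
    start = bisect_right(keys, token) if token is not None else 0
    page = keys[start:start + page_size]
    more = start + page_size < len(keys)
    return page, (page[-1] if more else None)


store = {"a": 1, "b": 2, "c": 3, "d": 4}
first, token = list_keys(store, page_size=2)

store["bb"] = 5  # written between page requests
del store["d"]   # deleted between page requests

second, token2 = list_keys(store, page_size=2, token=token)
print(first, second, token2)  # ['a', 'b'] ['bb', 'c'] None
```

The second page picks up `bb` and never sees `d`. A caller that needs a stable snapshot would have to get it from the underlying store rather than from an API shaped like this.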

Dapr was never intended to cover 100% of application use cases; it has been about covering what most apps need. So, for any new API proposed, not covering all scenarios has never been a blocker, and it will not be different for the List API.

artursouza commented 4 weeks ago

> This proposal is quite similar to a discussion I've been having for the last several months. I can tell you already that I don't expect the proposal as-is to gain much traction, for the following reasons:

>   • It proposes functionality quite similar to the Query API which proposes filtering, sorting and paging on key or value. This API has been discontinued because of the difficulty of maintaining it against the great many providers the state management building block supports.

This is different from the Query API in one fundamental way: it does not offer a query language that acts on the state values. This proposal includes filtering by key prefix, which is not even close to the Query API. Filtering by key prefix can be valuable for applications that want to show only the items that belong to a user and that compose their keys in a smart way to process those items. It is a very common scenario.
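
As a rough illustration of that scenario (not code from the proposal), the sketch below composes keys from a user ID and filters them by prefix. The key layout `user:<id>:item:<id>` and the helpers `user_key` and `keys_with_prefix` are assumptions made for the example.

```python
def user_key(user_id: str, item_id: str) -> str:
    # Hypothetical key layout: the owner comes first so it can act as a prefix.
    return f"user:{user_id}:item:{item_id}"


def keys_with_prefix(store: dict, prefix: str) -> list:
    # Only keys are inspected; values are never parsed or queried.
    return sorted(k for k in store if k.startswith(prefix))


store = {
    user_key("alice", "1"): "book",
    user_key("alice", "2"): "pen",
    user_key("alice2", "1"): "mug",
    user_key("bob", "1"): "lamp",
}

# The trailing delimiter keeps "alice" from also matching "alice2".
alice_keys = keys_with_prefix(store, "user:alice:")
print(alice_keys)  # ['user:alice:item:1', 'user:alice:item:2']
```

Note that the filter touches keys only, which is the difference from a query language that acts on values.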

>   • Too few of the providers supported in the state building block support all that you propose. Coincidentally, I did a quick check earlier tonight against the list just to see which ones support key prefix filtering alone and it's a clear minority of the currently-supported stores, especially when you exclude relational databases.

The current state store API does not expect every state store to implement all of the API; see the "transaction" API, for example. The List API has been in demand for a while, and the Query API failed to satisfy that demand (I can be blamed for that, since I was part of that design). The problem with the Query API is that it is difficult for components to implement and it is an all-or-nothing type of deal. With this List API, we can allow components to implement it partially (no prefix matching, for example).
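
One way to picture partial implementation is for each component to advertise which optional capabilities it supports, with unsupported options rejected explicitly. The sketch below is only an illustration under that assumption: `Feature`, `UnsupportedFeature`, `InMemoryStore`, and `ListOnlyStore` are hypothetical names, not Dapr types.

```python
from enum import Enum, auto


class Feature(Enum):
    LIST = auto()
    LIST_PREFIX = auto()


class UnsupportedFeature(Exception):
    """Raised when a caller uses an option the component does not implement."""


class InMemoryStore:
    # Hypothetical component that supports listing with and without a prefix.
    features = {Feature.LIST, Feature.LIST_PREFIX}

    def __init__(self, data: dict):
        self.data = dict(data)

    def list_keys(self, prefix: str = "") -> list:
        return sorted(k for k in self.data if k.startswith(prefix))


class ListOnlyStore(InMemoryStore):
    # Hypothetical component that can list all keys but cannot filter by prefix.
    features = {Feature.LIST}

    def list_keys(self, prefix: str = "") -> list:
        if prefix:
            raise UnsupportedFeature("prefix matching is not supported")
        return sorted(self.data)


data = {"a:1": 1, "b:1": 2}
full, limited = InMemoryStore(data), ListOnlyStore(data)

print(full.list_keys("a:"))                     # ['a:1']
print(limited.list_keys())                      # ['a:1', 'b:1']
print(Feature.LIST_PREFIX in limited.features)  # False
```

A caller can check `features` up front or handle `UnsupportedFeature`, so a store without prefix matching still offers a plain list.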

> Rather, I've been working to propose and solicit traction for a next generation of specialty state stores in Dapr that each support a core feature set specific to their unique purposes. I just finished some updates to my proposal for a refreshed Key/Value Store that would support streaming and key retrieval based on prefix filtering, @berndverst has an outstanding proposal for a Document Store that would specifically allow document-focused queries, and I just typed up one for a dedicated Centralized Cache Store.

I agree that this is the way to go. The current state store building block is trying to do too much and does not do any abstraction well enough (the lack of a List API means it is not even a KV store). On the other hand, I would suggest viewing this proposal as a complement: it makes the state store a KV store, and components that are non-compliant will be moved to a specialized building block. Given the number of full-time contributors, waiting for all the new building blocks to be implemented, plus components and SDKs, is not realistic. This proposal is a small and realistic step in the right direction. If there are enough contributors who can commit to delivering the new building blocks in a timely manner, we can still do the List API and rebrand the state store as a KV store.

> While querying keys is out of scope for the cache proposal and I'm not initially inclined to do more than a prefix key search on the key/values store, I'd love to hear your thoughts on why I'm scoping either too narrowly. Or perhaps your approach is better suited to the document store proposal as it would be interacting with query-based APIs that are more amenable to programmatic constraints, but I'd urge you to follow up on any of those linked items.

I agree that cache should not be in the KV store. Also, a KV store should not do queries. Document stores and relational stores are the correct abstractions for queries. That way, applications can pick the best abstraction for their problem. Again, this proposal turns the current state store into a KV store.

Lastly, if we all decide that we should create a new building block for the KV store and deprecate the current state store (also a valid path), we can do that, but it will be a bigger commitment. To repeat myself: I agree with specialized building blocks, and I also think the List API in the existing state store is a realistic step given the dev cycles currently available to the Dapr project. Time-to-market must be considered.

artursouza commented 4 weeks ago

My summary (time-to-market matters):

  1. Dapr's state store does too much and it should be split into specialized state stores instead: cache, kv store, document store, relational store.
  2. Time-to-market to have those specialized stores is unknown (meaning too far in the future) and the list API is satisfying an immediate need that the Query API failed to do (reasons discussed above).
  3. (1) and (2) are not mutually exclusive.
  4. I am happy to work with contributors that want to deliver any building block from (1).

filintod commented 4 weeks ago

One of the first questions I had when trying Dapr for the first time was "how do I list items?", and any new user of Dapr will probably have the same question. That is kind of expected for CRUD operations. Yes, CRUD does not contain "list", but it is just expected: how many REST CRUD APIs do you see without it?

I concur with Artur's assessment. This is an achievable step forward that does not invalidate or exclude the current concerns or future implementations of the loftier goal of separating specialized state stores. Moving toward our goal almost never means doing it in a straight line.

WhitWaldo commented 3 weeks ago

> My summary (time-to-market matters):
>
> 1. Dapr's state store does too much and it should be split into specialized state stores instead: cache, kv store, document store, relational store.
>
> 2. Time-to-market to have those specialized stores is unknown (meaning too far in the future) and the list API is satisfying an immediate need that the Query API failed to do (reasons discussed above).
>
> 3. (1) and (2) are not mutually exclusive.
>
> 4. I am happy to work with contributors that want to deliver any building block from (1).

@artursouza I'm eager to move forward on (1) myself and have put some thought into it. What's the best path forward to get the ball rolling on it? I intend to type up more formal proposals for each in the coming days. Should I be doing anything else as well?

artursouza commented 3 weeks ago

> > My summary (time-to-market matters):
> >
> > 1. Dapr's state store does too much and it should be split into specialized state stores instead: cache, kv store, document store, relational store.
> >
> > 2. Time-to-market to have those specialized stores is unknown (meaning too far in the future) and the list API is satisfying an immediate need that the Query API failed to do (reasons discussed above).
> >
> > 3. (1) and (2) are not mutually exclusive.
> >
> > 4. I am happy to work with contributors that want to deliver any building block from (1).
>
> @artursouza I'm eager to move forward on (1) myself and have put some thought into it. What's the best path forward to get the ball rolling on it? I intend to type up more formal proposals for each in the coming days. Should I be doing anything else as well?

That is a great start! You may present them (or one at a time) in our Tuesday calls at 9am PST: https://zoom.us/j/91940016938?pwd=bGNRVmlPK094a0tQZWRlTTJIZUl6UT09

Also, feel free to ping me directly on Discord to remind me to review them :) I can set up a separate recurring call for us to work together on those proposals as well, for a faster feedback loop.

elena-kolevska commented 1 week ago

As agreed in our contributors meeting yesterday, I removed the sorting capability from the proposal.