dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License

New query functionality #4232

Open philbe opened 6 years ago

philbe commented 6 years ago

@wentaowu and I are planning to add more query functionality beyond what's in http://github.com/OrleansContrib/Orleans.Indexing . We'd welcome recommendations for high-priority additions, e.g., ones you had to implement in your app due to a lack of built-in Orleans support. We have two questions. First, what types of query expressions are most important? For example:

  1. Select grains in a grain class satisfying a {<, ≤, >, ≥} predicate on a property. E.g., Employee.Salary < 100K. This requires adding B-trees to Orleans.Indexing, which currently supports only hash indexes.

  2. Top-K query over a group of grains in a grain class. E.g., top-10 players in a game, grouped by game type, which is a property of the game grain-type.

  3. Aggregate queries, i.e., count, sum, max, min, average over a collection. E.g., number of devices of each device type, where device type is a property of the device grain-type.

  4. Path queries. E.g., a list of teams that have a member M with M.Languages containing "F#", where Team is a grain-type that has a collection-valued property called Members, and Member is a grain type with a collection-valued property called Languages.
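To make the four query shapes above concrete, here is a minimal Python sketch over plain in-memory records. It is not Orleans or Orleans.Indexing code: the names `SortedIndex`, `top_k_by_group`, `count_by`, and `teams_with_language`, and the record layouts, are hypothetical, and a sorted list with binary search stands in for the B-tree mentioned in item 1.

```python
import bisect
import heapq
from collections import Counter, defaultdict


class SortedIndex:
    """Ordered index on one property; a sorted list stands in for a B-tree."""

    def __init__(self):
        self._keys = []  # property values, kept sorted
        self._ids = []   # grain ids, parallel to _keys

    def insert(self, key, grain_id):
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._ids.insert(pos, grain_id)

    def less_than(self, bound):
        # Ids whose key is strictly below the bound, in key order.
        return self._ids[:bisect.bisect_left(self._keys, bound)]

    def at_least(self, bound):
        # Ids whose key is greater than or equal to the bound, in key order.
        return self._ids[bisect.bisect_left(self._keys, bound):]


def top_k_by_group(records, group_key, score_key, k):
    """Top-k records per group, highest score first (query shape 2)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)
    return {
        group: heapq.nlargest(k, members, key=lambda r: r[score_key])
        for group, members in groups.items()
    }


def count_by(records, key):
    """Count of records per distinct value of `key` (query shape 3)."""
    return Counter(record[key] for record in records)


def teams_with_language(teams, members, language):
    """Teams having at least one member whose languages include `language` (query shape 4)."""
    return [
        name
        for name, team in teams.items()
        if any(language in members[m]["languages"] for m in team["members"])
    ]


# Hypothetical sample data, for illustration only.
salary_index = SortedIndex()
for grain_id, salary in [("e1", 90_000), ("e2", 120_000), ("e3", 99_999)]:
    salary_index.insert(salary, grain_id)

players = [
    {"id": "p1", "game_type": "chess", "score": 10},
    {"id": "p2", "game_type": "chess", "score": 30},
    {"id": "p3", "game_type": "chess", "score": 20},
    {"id": "p4", "game_type": "go", "score": 5},
]
devices = [
    {"id": "d1", "device_type": "sensor"},
    {"id": "d2", "device_type": "sensor"},
    {"id": "d3", "device_type": "camera"},
]
teams = {"core": {"members": ["m1", "m2"]}, "tools": {"members": ["m3"]}}
members = {
    "m1": {"languages": ["C#"]},
    "m2": {"languages": ["F#", "C#"]},
    "m3": {"languages": ["Go"]},
}

under_100k = salary_index.less_than(100_000)
top_two = top_k_by_group(players, "game_type", "score", 2)
per_type = count_by(devices, "device_type")
fsharp_teams = teams_with_language(teams, members, "F#")
```

A hash index can answer only equality lookups, which is why the range predicate in item 1 needs an ordered structure; the other three shapes are written here as full scans and would need their own index or incremental maintenance to scale.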

Second, over which set of grains can a query be executed? For example:

  5. Queries over all initialized grains of one or more grain-types, where grains are mapped to persistent storage.

  6. Same as (5), but one or more of the grain types is not persisted.

  7. Queries over active grains, only.

  8. Queries over streams, with a window function. E.g., all devices that reported a failure in the last 5 minutes.

  9. Materialized views. For a given query, maintain its result in the face of updates to the underlying grains. E.g., a count of the number of faulty devices of each device type (which is a grain class).
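The last two items in this list, windowed stream queries and materialized views, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an Orleans API: `FailureWindow` and `FaultyCountView` are hypothetical names, timestamps are plain seconds, and events are assumed to arrive in non-decreasing timestamp order.

```python
from collections import Counter, deque


class FailureWindow:
    """Devices that reported a failure within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, device_id), in arrival order

    def report(self, timestamp, device_id):
        # Assumes timestamps never go backwards.
        self._events.append((timestamp, device_id))

    def failed_devices(self, now):
        # Evict events older than the window, then return the distinct devices left.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        return {device_id for _, device_id in self._events}


class FaultyCountView:
    """Count of faulty devices per device type, maintained on every update."""

    def __init__(self):
        self.counts = Counter()
        self._state = {}  # device_id -> (device_type, is_faulty)

    def on_update(self, device_id, device_type, is_faulty):
        # Undo the device's previous contribution, then apply the new one.
        previous = self._state.get(device_id)
        if previous is not None and previous[1]:
            self.counts[previous[0]] -= 1
        if is_faulty:
            self.counts[device_type] += 1
        self._state[device_id] = (device_type, is_faulty)


# Hypothetical usage: a 5-minute (300-second) window and a small view.
window = FailureWindow(window_seconds=300)
window.report(0, "a")
window.report(200, "b")
window.report(400, "c")
recent = window.failed_devices(now=450)

view = FaultyCountView()
view.on_update("d1", "sensor", True)
view.on_update("d2", "sensor", True)
view.on_update("d1", "sensor", False)  # d1 recovers
```

The view never rescans the devices: each update adjusts the stored count by at most one decrement and one increment, which is the property that distinguishes a maintained view from re-running the aggregate query.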

eisendle commented 6 years ago

Based on my requirements, for the first question my priorities would be:

Regarding 5.: Just to clarify, these grains have been activated at least once and have been persisted, but are not necessarily active? This would be an important feature for me.

Other priority:

These new features sound really exciting and will make Orleans even more powerful.

philbe commented 6 years ago

@eisendle, thank you for the feedback. Your interpretation of (5) is what we intended. It's automatically supported by storage systems that offer a query language. So (5) amounts to adding a query processor on top of a key-value store.

jsteinich commented 6 years ago

Being able to easily implement leaderboards would be a big plus for our uses, which mostly requires 2, but also 1 and 3 for some portions. Compound lookups, i.e. a=x and b=y would also be helpful.
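A compound lookup such as a=x and b=y can be served by a hash index keyed on the tuple of both property values. The following is a minimal Python sketch of that idea, not something Orleans.Indexing is confirmed to provide; `CompositeHashIndex` and the property names are hypothetical.

```python
from collections import defaultdict


class CompositeHashIndex:
    """Hash index keyed on a fixed tuple of properties."""

    def __init__(self, *properties):
        self.properties = properties
        self._buckets = defaultdict(set)  # (value, value, ...) -> grain ids

    def add(self, grain_id, state):
        key = tuple(state[p] for p in self.properties)
        self._buckets[key].add(grain_id)

    def lookup(self, **equals):
        # Every indexed property must be given; partial keys are not supported.
        key = tuple(equals[p] for p in self.properties)
        return set(self._buckets.get(key, ()))


# Hypothetical usage.
index = CompositeHashIndex("region", "tier")
index.add("p1", {"region": "eu", "tier": "gold"})
index.add("p2", {"region": "eu", "tier": "silver"})
index.add("p3", {"region": "eu", "tier": "gold"})
matches = index.lookup(region="eu", tier="gold")
```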

Being able to query the storage layer is essential for some of our use cases. This came down to more than just having a query processor though. We needed to implement a storage provider that didn't store all the data in a single column and made use of indexes in the storage system.

We do use some simple forms of materialized views, but they pretty much boil down to maintaining a list of grains that meet a condition.

philbe commented 6 years ago

Thanks, @jsteinich. Your vote for 1-3, coupled with 9, is in line with what we're hearing so far from others (not on this issue thread).

mehmetakbulut commented 6 years ago

Regarding query expressions, 1 and 2 are likely the most important for me. I have to do spatial queries (e.g. get all objects within a radius of a point, get all objects intersected by a line segment...) on data that is modeled by the grains. Right now I persist grain data whenever spatial fields change and then I perform a query on the DB instead. It would be great if I could perform it natively within Orleans, though I understand this might be too specific.
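For reference, a radius query of the kind described can be answered with a uniform grid index: bucket objects by cell, visit only the cells that overlap the query circle's bounding box, and filter by exact distance. This is a minimal Python sketch with hypothetical names (`GridIndex`, `within_radius`), and says nothing about how Orleans would implement spatial indexing.

```python
import math
from collections import defaultdict


class GridIndex:
    """Uniform 2D grid; each cell holds the objects located inside it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self._cells = defaultdict(list)  # (cx, cy) -> [(object_id, x, y)]

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, object_id, x, y):
        self._cells[self._cell(x, y)].append((object_id, x, y))

    def within_radius(self, x, y, radius):
        # Scan only cells overlapping the circle's bounding box.
        min_cx, min_cy = self._cell(x - radius, y - radius)
        max_cx, max_cy = self._cell(x + radius, y + radius)
        hits = []
        for cx in range(min_cx, max_cx + 1):
            for cy in range(min_cy, max_cy + 1):
                for object_id, ox, oy in self._cells.get((cx, cy), ()):
                    if math.hypot(ox - x, oy - y) <= radius:
                        hits.append(object_id)
        return hits


# Hypothetical usage.
grid = GridIndex(cell_size=4)
grid.insert("a", 0, 0)
grid.insert("b", 3, 4)
grid.insert("c", 10, 10)
nearby = sorted(grid.within_radius(0, 0, 5))
```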

For set of grains: 5, 8 and 9 would be the most important in that order. Most of my queries act on all grains ever initialized. Queries over streams and materialized views would assist tremendously as well.

lmagyar commented 6 years ago

I'd like to add to the list some minor extensions:

These 2 combined are useful when you have different identifiers for the same business process/grain on multiple external/partner APIs. On a callback from the external/partner API, the client can address the grain directly this way, without direct DB access and without a bloated "id translator grain".

soulson commented 6 years ago

For my use case, the order of importance among query expressions would be 1 > 3 > 2 > 4.

Regarding the query domain, it seems fundamental to allow a query over the set of all-grains-ever-initialized, since all other domains can be effectively queried by filtering on that set. However, based on practical uses that come to mind, querying on the set of active-grains-only would be my most frequently used case, assuming a performance benefit over querying the set of all-grains-ever-initialized.

How goes the port of this project to Orleans 2.0? I'm looking forward to seeing it!

vicosoft4real commented 5 years ago

@philbe Thanks for this new query functionality project. For my case, the most important query types are 2 > 1 > 3. For the second question, 5 is the highest priority.

vella-nicholas commented 5 years ago

This is not a feature we will see anytime soon, correct?

sjbthfc2 commented 4 years ago

Is the indexing feature still planned to be released at some point in the future?

sergeybykov commented 4 years ago

No plans currently.