dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License
10.07k stars 2.03k forks source link

Grain Identity and GrainReference #1123

Closed gabikliot closed 3 years ago

gabikliot commented 8 years ago

This is a meta issue to discuss the question of grain identity and how it is exposed in various places in the runtime. This was raised by a multiple open issues and PRs on the same subject: #1043, #1068, #1075. The way I see this is the following:

1) Actor (grain) has its own private/internal identity, available to the actor code and any other silo code that behaves on behalf of that actor, such as providers. The identity comprises (or can potentially comprise) of ALL aspects of the actor: its primary key, its implementation class type, its implementation class type code, maybe also in the future the list of all implemented grain interfaces, maybe even attributes like Reentrancy type, ANYTHING about that actor IMPLEMENTATION. Indeed, some parts of that information are available in different contexts of the grain code already, but that it non relevant. Other parts do not have all the information but might need it. For example, I can envision some application library code that runs inside the grain code and needs to be passed all kinds of info about the actor - that is, its full identity. We don't want to tell this library code: pass Grain base to deduce the primary key. Instead, we should have the correctly encapsulated identity class. The more actor meta-data is in this identity class - the better. Maybe we don't even call it Identity. Maybe something like: GrainMetaData. The name itself is less important. We already have a candidate for that class: IGrainIdentity . It was added as part of adding DI. I don't think we used it much until now and the work on it was not done (the work was dropped in the middle when the initial DI was added in https://github.com/dotnet/orleans/pull/443). This GrainMetaData or IGrainIdentity should be available to all providers. maybe via an argument or maybe from ProviderRuntime if we are concerned with back compatibility. But the main point is: I see no reason to limit in any way any information available to Actor IMPLEMENTATION (and providers are part of actor/service implementation) via GrainMetaData.

2) Actor public reference - its public identity. Currently that is a GrainReference. This identity is used for other code (clients of that Actor) to publicly communicate with this actor. Here of course we do not want to expose ALL the info about actor implementation. For example, exposing grain impl. Type class is out of the picture, since it is an internal actor implementation detail and the callers/clients should not have access to that (we do actually break this now I think, since GrainReference has inside Grain Class impl. code type, but whether this is OK or not is not the subject of this discussion). The main point is: what is exposed via GrainReference should definitely be limited in some way only to public info.

The difference between internal and public identity can be metaphorically explained via Person's identity: my internal identity, known just by me and "agents acting privately on my behalf (like my tax advisor or CPA) includes all my names, all my IDs such as SSN, passport number, etc..They are all available to my CPA. On the other hand, my public identity is just my name : Gabi Kliot and my public emails.

What stems from that proposal is that storage providers should NOT be passed GrainReference, since GrainReference is NOT Actors' internal identity. Instead, we should pass it (via argument or via some utility used from Provider Runtime) the GrainMetaData or IGrainIdentity. That would solve the discussion in #1075. We should not be asking the authors of #1075 why do you need that or this piece of info in the storage provider. The runtime should pass ALL the available and relevant info about Actor identity to the provider, to allow any scenario.

EDIT: BTW, I also think going forward we should minimize the public usage of GrainReference and even better - make it internal if possible. As I said multiple times in the past, GrainReference is a "weak, untyped" refeernce to a grain, kind of like "void *". We should move away from that into only strongly typed IFoo grain references. The suggestion laid above (to use GrainMetaData or IGrainIdentity) instead of GrainReference on the Actor implementation side will help us in that direction, as it will minimize usages of GrainReference.

shayhatsor commented 8 years ago

@gabikliot, I agree 100% :+1:

veikkoeeva commented 8 years ago

:+1: Especially the "allow any scenario" part.

sergeybykov commented 8 years ago

I think it's simpler than that and we don't need to introduce new concepts.

Identity sort of by definition is what uniquely identifies an entity. In our case the identity is a combination of grain's type and key. The key is a Guid+string combination with convenience shortcuts like Guid only, long only, string only, etc. Grain type is often implicit. From within a grain class it only needs its key since it obviously knows its own class. This is where GrainIdentity was introduced and why it lacks type information. On the caller side we map grain interfaces to classes mostly implicitly, but the caller can also be explicit about the desired grain class. I don't see any public/private distinction here. It's identity one way or the other,

GrainReference is a compact encapsulated representation of a grain's identity that is easy to pass, persist but also, in its strongly typed form, is a proxy for invoking methods on the target grain.

All other implementation details, such as grain class attributes, is metadata. It has nothing to do with uniquely identifying or addressing a grain.

Various components, such as providers, need identity and often metadata. But the metadata they need may be globally defined, e.g. as an attribute on a grain class, or be provider specific - defined in the provider's own configuration/metadata. In the former case, a provider needs to be passed grain's identity and globally defined metadata. In the latter, just the grain's identity.

The discussion in https://github.com/dotnet/orleans/pull/1075 seems to have converged to the same need - to pass identity and metadata to storage provider with the proposed usage of Type viewed by me and @jason-bragg as a poor and limited substitute for passing metadata.

I think we can easily expose all necessary identity information from GrainReference. Or we can extend GrainIdentity with the missing type info. The missing piece seems to be the metadata, and @jason-bragg proposed to use an open ended dictionary for that. I'm not convinced yet that is the right solution for passing metadata, but it's an option to consider.

jason-bragg commented 8 years ago

But the main point is: I see no reason to limit in any way any information available to Actor IMPLEMENTATION (and providers are part of actor/service implementation) via GrainMetaData .

Providers are integration points for application developers to provide their own implementations of well defined behaviors (like storage). I'm having difficulty understanding how we can discuss what should be made available to them, in general terms, ignoring the behavior they are expected to serve.

We should not be asking the authors of #1075 why do you need that or this piece of info in the storage provider.

We should be defining the provider interface based off of the functionality it is expected to provide. If implementers seek changes to a providers interface to support their scenarios, we should seek to understand their scenarios to identify any shortcomings of the interface. This includes questions about what data they need and why.

gabikliot commented 8 years ago

This includes questions about what data they need and why.

They told us already: they need the grain class Type and generally any metadata associated with this implementation class. I see no problem exposing as much as possible (of course, in the context of relevant metadata, not some totally unrelated data) via a well defined GrainMetaData class. Whether it is provided via a new class GrainMetaData or a static globally defined metadata is a secondary question, which I do not have a very strong opinion about.

veikkoeeva commented 8 years ago

To clarifiy, I'm leaning towards open-ended metadata bag. It could take input from wherever and we could provide a strongly typed wrapper around this Dictionary with some well defined properties that supposed to be always present. Otherwise one could retrieve data with some key that the applicationn or provider vendor expects to find (defined, say, in config as key-value pairs?).

jason-bragg commented 8 years ago

@gabikliot

They told us already

And from asking why, we determined that they actually don't need the type. They are only asking for the type because they can't get the grain id from the grain reference without knowing what type of grain id the grain used. By exploring the need for the type we found out it was needed simply to compensate for a shortcoming in our grain reference. Why matters.

I see no problem exposing as much as possible

It seems we all agree that the storage provider needs metadata. The question seems to be about how to get it. Two approaches that seem to be under discussion:

  1. Pass the provider everything we know (or at least everything we think it will need), and let the provider implementer figure out how to assemble the metadata the provider needs from what we give it.
  2. Framework (somehow? still figuring this out) assembles the metadata the storage provider needs and passes it to the storage provider.

In 1), each storage provider implementation is responsible for deriving it's metadata from the information we pass it, and in doing so, we introduce a dependency on all the passed details. Any changes to these details risks breaking any or all storage providers. In 2), the framework is responsible for assembling the metadata for the providers. While the first pass of this may not be great, we are free to iterate on this without breaking providers.

This is about dependency management. By passing lots of details as well as the responsibility for deriving metadata to the storage provider, we increase dependencies between provider implementations and the framework. By taking on the responsibility of assembling the metadata and passing it to the storage provider, we limit the storage providers dependencies on the framework.

I'm not saying we should deny storage providers information they need, or tell implementers 'you don't need that'. I'm saying that the framework should take on the responsibility of solving the problem of how to allow application developers to specify persistence details in such a way that that information is passed to the storage providers.

gabikliot commented 8 years ago

OK, so basically what you are saying is: in general we agree: we need to pass enough metadata to solve all requirements of the storage provider. My suggestion above is taking a short/simple route of dumping all the info possible and let them figure out. Its easier for us in the short term, but your point is that we will pay long tern, in terms of dependencies and backward compat. So alternatively you want to figure out the minimal set of what needs to be exposed, to try to nail all the "potential" future requirement.

Ok, lets see if you can do that. Theoretically your approach does sound better. If you can do it.

A bit of my skepticism in your success stems from the fact that this is not a first time we have been asked to revisit the question of Identity. I think the class GrainExtensions, with all those getters of the primary key, is in the top 5 classes which were rewritten most times during the years. We/I kept tweaking it more and more, since we never quite succeeded to get all the pieces of identity information out of GrainReference and IAddressable and Grain base like customers asked us, for all the different scenarios

So my point is: Perhaps it is a good opportunity now to revisit that and actually do go via the path that I suggested: internal vs. public identity. Needless to say, I disagree with Sergey on the question that there is only one true identity. I just gave an example above of internal (detailed) Person identity vs. my public (limited) identity.

But lets try your route, sure, as long as we can solve all customer requirements .

jason-bragg commented 8 years ago

try to nail all the "potential" future requirement.

I don't suspect we'll be able to predict what any future storage providers will potentially need. What I'm hoping we can design is something that is extensible enough to allow application developers to specify the metadata storage providers need.

For instance, if a developer wishes to be able to specify what table a grain uses for azure table storage, one can introduce an attribute AzureTableStorageProvider that derives from GrainStateStorageProviderAttribute (see #1075) that contains a TableName property that developers can set and that the azure storage provider will receive in it's metadata. Such extensions would allow application and storage provider developers to support a wide range of scenarios with no framework changes simply by taking advantage of the plumbing put in place by the GrainStateStorageProviderAttribute. We don't have to possess precognition, just provide extensible plumbing for developers to build on.

this is not a first time we have been asked to revisit the question of Identity

Understood, and I do think this is an important discussion. I apologize for derailing from this topic.

gabikliot commented 8 years ago

Jason/Sergey, A simple question: do you plan to provide some access to the grain Type in the storage provider (be it via a direct argument, or part of IGrainIdentity , or part of GrainMetaData, or via some static metadata store)? 1) There is clearly a need that at least one customer asked for it. 2) I can easily see how this will be useful for other customers and other scenarios as well, when someone wants to fully control how data is mapped to a DB. 3) I see absolutely no concerns in terms of backward compatibility or too complicated API.

veikkoeeva commented 8 years ago

It looks like there is history here at play, and I haven't considered all the aspects as deeply as you others seem to have.

Maybe the ideas here can reconciliated if we have a class, let it be called GrainMetaData here that have a named and strongly typed properties. These properties are the things Orleans provides and are carefully thought so as not to introduce unintended dependencies. These are valid concerns, because having such a brittle "see the internals" will lead to grief if removed or altered. Then this class is augmented with a bag (having a function generic constraint, caller needs to known the type), where one retrieves via a known key, it could store user defined information taken from whatever place, such as configuration, DI, and even added and modified during runtime by the application. There can be a way to do it in some storage provider specific way (key value pairs in connection string prefixed with key name?) or just key value pairs defined somewhere. Maybe this way if something doesn't make it to the core, a storage provider, or some other provider or an application specific configuration object, can introduce and alter data how it sees fit. Here there is something Orleans providers and then is freely provided and Orleans offers the plumbing for it.

With regards to the aforementioned listing in #1075 for relational, due to storage level processing and what people usually imagine doing there, it's very useful, almost mandatory, to know the identifier and the type in some predictable way. As a made-up example, say, billing information and the key is an account number and once in a while a stored procedure makes a quick pass over the data to accumulate billing information and loads it as precalculated to some other table. Here could be conformance to some industry standards that require the data to be protected and processed in certain, audited ways. Notable here is that one might want to insert most of the data to a same table and only some data to purpose-built tables.

Maybe there is also a case for dynamic configuration. Say, at certain times one wants to save data at place X, then other times place Y and have this rule configured and the provider then reconciliates this situation. Maybe something like a failover to dump data as file writes to a disk when storage connection fails and reconciliate the situation later by playing the data to storage (this is provider specific).

jason-bragg commented 8 years ago

@veikkoeeva I'm not at all opposed to a strongly typed metadata class given the following:

In general, what you describe makes sense, but I'm less concerned about the actual data type (strongly typed class vs. a property bag vs. ??) than I am about how developers can provide metadata. I'm concerned with the plumbing that allows application developers to specify metadata we didn't envision them needing, more than I am about what information we should (or should not) provide them.

@gabikliot

do you plan to provide some access to the grain Type

Unclear. Providing access to grain type, in my view, does not solve the metadata problem. I am of the opinion that we'll be in a better position to determine what is missing from the storage provider interface once we've solved the metadata problem. My suspicion is that the grain type should not be passed to storage providers, but that is just a suspicion at this point.

Regarding 1) - Customer requested this to compensate for shortcomings in our GrainReference class, not because the grain type was, in itself, important. Regarding 2) - In an absence of a solution regarding how to specify metadata, I agree. Let's solve that problem, then see if this still holds. Regarding 3) - CoreClr compatibility. Minimal interface, we should not include something in the interface simply because there is no reason not to.

gabikliot commented 8 years ago

My suspicion is that the grain type should not be passed to storage providers, but that is just a suspicion at this point.

And I am of the opinion that it should, together with maybe more metadata.

creyke commented 7 years ago

It would be valuable to expose the grain Type for custom storage providers. This, among other things, enables reflecting on custom attributes which could then be used to specify custom properties for storage providers, per grain Type.

Some examples being (per grain type); custom storage paths, custom storage formats, specific indexing requirements etc.

sergeybykov commented 7 years ago

We think that changing existing identity types in 2.0.0 would be too disruptive at this point, and would make upgrade/migration a nightmare for existing users. There are some half-baked thoughts about adding an alternative identity system, for both grains and streams, that could be supported side by side with the current identity scheme. So we are pushing this post 2.0.0 hoping that an approach like that can be incrementally added without making a breaking change. However, details haven't been thought through yet.

Moving this to the Backlog milestone.

ReubenBond commented 3 years ago

We now have a public GrainId type (comprised of a GrainType and a key) as well as a type for representing grain interfaces and metadata about all types in the cluster as well as their attributes (reentrancy, etc). Grain identity is now more separate from references and I think this can be closed now.

The identities also include generic type parameters, if applicable, rather than encoding them separately as a part of either the GrainReference or the message headers

pentp commented 3 years ago

Is the plan now to replace all GrainReference uses in IGrainStorage, IReminderTable, StateStorageBridge, LocalReminderService, etc. with GrainId to solve #6308?

For IGrainStorage it looks like the existing string grainType, GrainReference grainReference parameters should be combined into a single GrainId grainId parameter.

ReubenBond commented 3 years ago

Yes, we should migrate to GrainId as the key for storage, etc, for 4.0. There's also a way to get an untyped grain reference (i.e, a GrainId but no interface or interface version, by calling the non-generic IGrainFactory.GetGrain(GrainId))