dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License
10.04k stars 2.02k forks source link

Support Complex Types as grain Keys #2320

Closed galvesribeiro closed 7 years ago

galvesribeiro commented 7 years ago

Some time ago I have a requirement for a grain type where its key would not fit on the current supported primitive types (Guid, string, long) even if I use the compound keys and since Orleans doesn't support complex types as keys I come up with this workaround:

1 - Create a struct that hold the properties for the complex key:

    public struct AreaKey
    {
        public long Latitude;
        public long Longitude;

        /// Serialize Key
        public override string ToString()
        {
            return $"{Latitude}|{Longitude}";
        }

        /// Parse key from string (ignore lack of fail checks)
        public static AreaKey FromString(string keyString)
        {
            var props = keyString.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
            var key = new AreaKey();
            key.Latitude = long.Parse(props[0]);
            key.Longitude = long.Parse(props[1]);
            return key;
        }
    }

2 - Invoke GetGrain() using that key:

var myAreaGrain = GrainFactory.GetGrain<IAreaGrain>(areaKey.ToString());

That approach works, but it is a lot complicated and introduces some concerns for the developers like for instance care on implement the (de)serialization of the key, etc.

I fall in this scenario a while ago, saw people asking on gitter months ago, and now @DixonDs came up Today asking about it again.

Its about time to we have support to it and I believe it is not a big deal to implement at all.

I see it can be implemented in two ways:

  1. Introduce a IGrainWithComplexKey<K> where K should implement IComplexGrainKey<K>. It enforce this key type to have a Deserialize(string keyString) and Serialize() methods to deal with the keys. It is the easiest way to do it. The downside is that for each key type, you have to implement the (de)serialization logic and it will tie the class to Orleans library (i.e. being a non-POCO class because it implement the required interface).
  2. Introduce just IGrainWithComplexKey<K> and make codegen to generate serialization code for it and leverage Orleans' Provider model to add/register Key Serializers globally or per-type. That would remove the (de)serialization logic from the developer and the keys can be pure POCO classes/structs. It will introduces a bit more of changes since we have to enable the configuration of the Key Serializers either globally or per-type basis but that should not be a big deal give that we already do that for other types of serialization inside Orleans.

So, I want to grab feedback from the community about this feature, if that makes sense or not, what approach is better to implement it, or whatever input is helpful.

Its not a top-priority feature (.Net Core port still the major effort) but since in either one of the suggested approaches the changes are very minimal, totally optional and easy to implement and test, it is a quick win.

Thanks

DixonDs commented 7 years ago

Ideally but not sure if feasible, it would be great if we don't have to implement deserialization logic since it might be error-prone in some cases. Probably the complex key object could be stored separately somehow and mapped to the serialized string key.

If we don't need deserialization, we could just use ToString of the passed complex key object so that it works for a number of types for free and custom classes don't need to implement any special interfaces.

ashkan-saeedi-mazdeh commented 7 years ago

I think as long as it doesn't introduce a performance penalty for everyone regardless of their usage or lack there of complex keys, it is ok.

I have a scenario which could benefit from this but was solvable with compound keys. I wanted to support multiple tenants and had to have a customer key and specially it can get complex if you want to have both customer key and grain key with the same type (string/guid) then you have to construct the key yourself. The key construction logic should be abstracted in a utility class and I could use my own struct as key if the proposed feature existed but I prefer to have minimal allocations for keys of course since grain activations/referencing should be as light-weight as possible so additional method calls are not that welcoming and this is kind of a hot path (I guesstimate) so we should be careful to add it in a pay for play model if we are going to do that.

galvesribeiro commented 7 years ago

That is the point. Whoever needs a complex key today must create its own code to encode/decode the key into a string.

I think serialization is not the correct term to use... We need a IComplexGrainKeyEncoder class to be registered and all it does is encode the key. The only way to decode the key back to an struct inside a grain, is to call GetPrimaryKey().

@ashkan-saeedi-mazdeh This feature suggestion is something that don't harm anyone. If Today you need to have more elaborated keys, you have to encode it on a string. All it is doing is abstracting it and make it more "native" in Orleans.

ashkan-saeedi-mazdeh commented 7 years ago

@galvesribeiro Then I like it. I read something on gitter or in some other way I got the impression that additional overhead of a method call or something will be added to all grains. And yes I totally see the need and totally like the abstraction.

Even the case that multiple implementaitons exist for a grain Interface is troubling me. Specifying a string type even as a constant a bit smells and I abstracted that away as well. But that is unrelated and I've no way of solving it in the runtime (maybe with a second generic parameter actually but that will have a more costly implementation in GetGrain).

sergeybykov commented 7 years ago

Let me push back on this a bit.

If the end result under the covers is passing a string to GetGrain<T>(string key), then this is already achievable via IGrainWithStringKey and ToString() without any modification of the Orleans codebase. Then the only real hurdle is converting the string key back to its whatever original form from within the grain code. Is this correct?

If that's the case, I'm not convinced Orleans needs to be in the general business of converting arbitrary types to/from string. Just like a URI can encode various hierarchies and addressing conventions, it is not the business of a web server to help with the reverse conversion of URIs.

galvesribeiro commented 7 years ago

@sergeybykov Using ToString() is the workaround people (including myself) do today. What they want is to pass (following the workaround exampel) an instance of AreaKey to GetGrain<T>(myAreaKeyObj) and inside a grain code, whenever one want to get the key it would be AreaKey key = this.GetPrimaryKey() so they have the strong-typed key back inside the grain code instead of having a string that need be parsed again.

I would be even further and instead of call GetPrimaryKey() we would have something like the Grain State (this.State.MyStateProperty where State is of TGrainState) for the key like this.PrimaryKey.Longitude where PrimaryKey is of type AreaKey.

Make sense?

sergeybykov commented 7 years ago

It does.

Although, when I look at .NET. in general object has ToString() but no Parse(string) method. But Enum has Parse(Type, string).

We've been hesitant or even more than hesitant (big part of that was my position, to be fair) to try to solve some general purpose issues. This one looks to me just that - going long ways to help the developer avoid implementing MyType.Parse(string). I'm not convinced this goal is worth it to complicate the framework even more. String keys are easy to explain and easy to understand how to use them. Something beyond that shouldn't be a concern of the framework in this case IMO.

galvesribeiro commented 7 years ago

Ok. Again, strings work for me, was just something I've seen people asking frequently so I opened the issue.

If nobody has a better argument that would leave us to say that it worth, I'll be closing the issue.

ReubenBond commented 7 years ago

We could use the serialization infrastructure to convert between T <-> string. It's not necessarily the greatest approach, just a thought.

galvesribeiro commented 7 years ago

@ReubenBond that was my suggested option 2

DixonDs commented 7 years ago

So no chances to use Orleans infrastructure to avoid the deseriaization from the grain string key to the original complex object key? Something like passing the original key to the grain instance during the activation and make it accessible from that activation?

I would like to avoid the deserialization for a number of reasons:

  1. It might be tricky to implement. For instance, the deserialization logic mentioned by @galvesribeiro would have a problem if any of key properties had | itself
  2. Some storage providers have limitations on the grain key length namely Table Storage provider. Serialization can lead to long keys but if there were no need for deserialization, we could use some kind of hashing instead of serialization as long it provides unique keys for specific domain of original key values.
  3. Consider the scenario when a grain with a complex key is created from a grain client. That means that serialization of the key will happen in the grain client and deserialization will happen in the grain implementation itself. That means the client logic is coupled to the grain implementation logic as any change in the serialization logic would require a change in the deserialization logic. I guess that is something we would want to avoid and have the client logic independent from the grain implementation logic.

The current approach I use is similar to the one mentioned by @galvesribeiro in the first comment and is clearly a workaround. And well, using workarounds means a lack of proper solutions usually. I will appreciate a lot if anyone suggests the proper solution for this scenario.

ashkan-saeedi-mazdeh commented 7 years ago

I think @sergeybykov is fully write. Orleans as a framework just has to support a certain number of key types. It is similar to the way that key-value stores, RDBMS and any other system which needs a key does. Even Dictionary<T,U> does not allow you to use arbitrary keys without you ensuring uniqueness or providing comparison implementation for sorting/searching/...

Now Orleans supports 3 different key types and some compound versions, it is limited by storage (ATS has limitations, my CouchBase has 250 bytes limitation and ...). It's just IMHO but I'm all for a lean and lightweight framework.

If we want to implement anything tricky in the framework then I would say there is a lot of more tricky stuff like restarting strategies and ... which alikes of Erlang implement in libraries like OTP so I would propose instead of implementing this into Orleans, one can create a library called Orleans.ComplexKeySupport and implement extensions methods and achieve this. I understand if it involves code-gen it will be complex to implement outside of Orleans and still leverage the current serialization framework for Orleans but that's the issue because Orleans serialization is tightly coupled to the main codebase and not because this should not be a library instead of a part of the main framework.

DixonDs commented 7 years ago

@ashkan-saeedi-mazdeh It is funny, I wanted to mention Dictionary as well but in support of this request, because Dictionary allows you to use arbitrary keys (whether hashes are unique or not for Dictionary keys doesn't affect correctness of Dictionary, only performance).

ashkan-saeedi-mazdeh commented 7 years ago

@DixonDs Actually Orleans is more like the RDBMS/Key-Value store case in regard of supporting key types. An in Memory Map of keys to values of course has support of all key types as main functionality of itself but I meant that even in that case, it gives up on say code generation for creating good comparison operators or anything like that and I guess not having unique keys actually affects correction of your own program if nothing else. Regardless of performance/security/even maintainability the most important attribute of any software is that it HAS TO be correct in the first place.

I meant the dictionary to support the case that even if key-value pairs are your main business, you can not free up the end-user from writing any key specific code. For a key to be useful, you should always have a good GetHashCode implementation and also a good implementation of comparison operations. and here we are in the business of producing a highly efficient framework for distributed software which scales and satisfy correctness, performance and functionality cases at the same time and have the somehow conflicting goal of being maintainable in open-source and remain a light-weight framework with a relatively small code base. Orleans's small code base is already conceptually complex enough I would say :)

Again, I'm all for having this but as a separate library which has an interface for complex keys and can serialize a poco to a string and deserializes the poco back from the string using binary serialization or any other method which works. Also I guess most fast key-value stores or databases have limitations on key ranges so you'll never be able to store your whole data in a key. Many key operations like hashing and ... depend on key size and After all this is something which databases might store in in-memory hash tables or BTrees or whatever and can not become big without loss of performance/cache locality. If you have a key size of 250 bytes then in a 64k cache line, you can store more than 100 of them. If each kye is 2k probably you can store less than 15 of them.

galvesribeiro commented 7 years ago

Consider the scenario when a grain with a complex key is created from a grain client. That means that serialization of the key will happen in the grain client and deserialization will happen in the grain implementation itself. That means the client logic is coupled to the grain implementation logic as any change in the serialization logic would require a change in the deserialization logic. I guess that is something we would want to avoid and have the client logic independent from the grain implementation logic.

Good argument btw

veikkoeeva commented 7 years ago

I hope not too tangential notes, but it looks to me that mapping needs to be derived from the input key data and optionally from global constants and also as there are potentially millions of various keys, it may not make sense to store them in memory but to recalculate as needed and perhaps cache as per the grain that needs them. I have used just some function that has mapped the input to a v5 UUID, as that can be represented as a Guid. It is a bit redudant, but the good side probably everything in the world can be mapped to everything via UUIDs (i.e. even something like ISBNs to OIDs, just made that up, but there's likely a way). :)

ashkan-saeedi-mazdeh commented 7 years ago

@galvesribeiro The thing is that I guess for now the coupling between client and grain implementation is high anyways. The only way to decouple that really to me is to have a protocol which completely decouples client from grains. Even if we make the serialization/deserialization of complex keys a part of Orleans, Still a client update after changing the key is required and old clients can not connect and use the grain with new key so coupling still exists deploymentwise and you only don't have to change the client code and just have to rebuild and deploy. This is still coupling to me.

Also the protocol to connect clients to grains should be standard and don't change. Protocol contains external interface of the grain including the key type and format and if you change them, it is an upgrade and hot upgrades are not possible in Orleans and clients need to rebuilt and redeployed. I guess it is true even if we use runtime codegen atm right?

sergeybykov commented 7 years ago

I like @ashkan-saeedi-mazdeh's idea of making a separate library that would be easy to use, but only if somebody really needs this. We could point to the library from the docs. But it would be clear that the key to/from string operations are concerns external to the Orleans codebase. Just like correct implementation of GetHashCode and IComparable is something that a Dictionary class isn't concerned with.

DixonDs commented 7 years ago

@sergeybykov Yes, that's the thing - Dictionary allows to use arbitrary keys and leaves you to implement GetHashCode and Equals (it doesn't require IComparable btw). I would be more than happy if Orleans allowed arbitrary grain keys and used GetHashCode from them for internal use but still giving access to the original key - just like Dictionary does.

ashkan-saeedi-mazdeh commented 7 years ago

@DixonDs There is a difference here. I was wrong to mention the dictionary as an analogy. Orleans can be compared to a DBMS (KVS/RDBMS) in regard to keys. A dictionary is only concerned with simple storage of the keys so has a lot of flexibility in that regard and it will not go in length to prevent you from creating issues for yourself. Also it is the dictionary's functionality to map arbitrary keys and values to each other and if not this, so what it is providing. Orleans on the other hand is neither a distributed map nor a key-value store in principle. Even if it was, still most of them don't support this and in fact they have strict requirements on keys.

Orleans should be able to know the nature of the keys, store them for addressing/indexing/... should move them on the network and be able to serialize/deserialize them and some other stuff probably (for transactions and other features soon) and do it in in a high performance manner. If each grain access would require calling an arbitrary function with unknown performance and correctness characteristics, first this should not be the concern of Orleans and secondly it would be hard to reason about Orleans performance and troubleshoot issues.

As long as it is not a built-in feature, Orleans is not in the business of key mapping/conversion and any issues in that regard should not be resolved in Orleans. Other systems like Akka and Erlang are even less flexible regarding actor IDs. You not only want the key to be complex , also want Orleans to handle all complexities of serialization to a format suitable for storing in a storage system with poco objects of arbitrary type and size. I don't see why it is good to spend time on this in Orleans while a thin library can easily do this all for you in an application specific or general manner.

DixonDs commented 7 years ago

Orleans can be compared to a DBMS (KVS/RDBMS) in regard to keys.

Well, every known RDBMS to me allows composite keys.

If each grain access would require calling an arbitrary function with unknown performance and correctness characteristics, first this should not be the concern of Orleans and secondly it would be hard to reason about Orleans performance and troubleshoot issues.

And then again, think about Dictionary. You can say the same about GetHashCode, it could be arbitrary code there. But why do you care about it when you reason about performance? You assume it is simple enough to be considered neglected for performance considerations, otherwise it is not a problem of Olreans framework itself.

I think we are going in circles in this discussion. We keep talking about deserialization, while I would like to avoid it altogether (as I mentioned in https://github.com/dotnet/orleans/issues/2320#issuecomment-255515103). I believe if that is possible, it is only possible in Orleans code and not outside it.

ashkan-saeedi-mazdeh commented 7 years ago

Ok let's not continue the circle then but it is possible to fully serialize this outside Orleans codebase however obviously a bit of Orleans serialization code could be used for that which can not be anymore so it is not the only way possible.

sergeybykov commented 7 years ago

I'm closing this in favor of an external helper library solution.

DixonDs commented 7 years ago

@sergeybykov Not sure, if you read all comments, but again I think a proper solution would avoid deserialization altogether and could be done only in Orleans core.

sergeybykov commented 7 years ago

@DixonDs I read the comments, and I see your point. I just disagree it is worth complicating the Orleans codebase for that reason.

After all, string keys are compatible with URIs, which is the most popular addressing scheme out there these days, and it works well despite the need to parse. People even argued that Orleans should drop support for Guid and integer keys, and only use strings, though we decided not to go that radical. IMO adding support for custom keys would push us in the opposite direction, farther from the goal of simplicity.