Migrating from Grain<SomeState> to Grain<OtherState>

falconmick commented 8 years ago

@sergeybykov I was just wondering if there are any procedures for migrating grain State data?

So far it feels like using State for anything other than very simple, throwaway data is dangerous: once it goes to production, you're more or less stuck with that state model, because otherwise the serializer can't deserialize the stored data.

Is there a better way than just avoiding State altogether and doing the work manually inside OnActivateAsync?

Cheers, Michael.

ElanHasson commented 8 years ago

It would be interesting if there were a "grain version migration", similar to Entity Framework migrations, where you can transform data as the schema (or even the provider) changes.

sergeybykov commented 8 years ago

In hindsight, I think we made the mistake of using the Orleans binary serializer by default in the storage providers we included. That was done for "convenience", but as a result it unintentionally and unnecessarily coupled the storage provider extensibility point with one particular implementation option. If we had used a different serializer, one with built-in support for versioning, e.g. Bond or ProtoBuf, we wouldn't have created such confusion.

One possible migration workaround I can think of is to take the auto-generated serializers for SomeState and OtherState and build a custom one that knows how to convert/migrate one type to the other. Once you have that, you could even build an offline tool to convert the data in storage.
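A minimal sketch of the offline-conversion idea, under stated assumptions: it uses System.Text.Json instead of the Orleans auto-generated binary serializers (a swapped-in format), and the SomeState/OtherState fields, the Migrate mapping, and the dictionary standing in for storage are all hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical old state shape.
public class SomeState
{
    public string Name { get; set; }
    public int Count { get; set; }
}

// Hypothetical new state shape.
public class OtherState
{
    public string DisplayName { get; set; }
    public long Total { get; set; }
}

public static class OfflineStateConverter
{
    // Hand-written mapping from the old type to the new one.
    public static OtherState Migrate(SomeState old) =>
        new OtherState { DisplayName = old.Name, Total = old.Count };

    // Rewrites every record: deserialize as SomeState, convert, serialize as
    // OtherState. The dictionary (grain key -> serialized state) stands in
    // for the real storage. Keys are copied first so the dictionary can be
    // updated safely while looping.
    public static int ConvertAll(IDictionary<string, string> storage)
    {
        var converted = 0;
        foreach (var key in new List<string>(storage.Keys))
        {
            var old = JsonSerializer.Deserialize<SomeState>(storage[key]);
            storage[key] = JsonSerializer.Serialize(Migrate(old));
            converted++;
        }
        return converted;
    }
}
```

Running it once over all stored records, while no silo is writing, would leave only OtherState data behind.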

Eldar1205 commented 8 years ago

I have implemented a decorator for the Azure table storage provider (it may work with others; I haven't verified) that handles grain state transitions. The solution consists of the following pieces (if you'd like to see the full implementation, please let me know how you'd like to receive it; it's too much for this comment):

An interface, IGrainStateTransition, inspired by the Orleans IGrainState interface. It acts as the in-memory grain state (it is not persisted itself) and is used by the storage provider decorator. API:

using System;

public interface IGrainStateTransition
{
    /// <summary>
    /// Gets the new grain state
    /// </summary>
    object NewState { get; }

    /// <summary>
    /// Gets the type of the new grain state
    /// </summary>
    Type NewStateType { get; }

    /// <summary>
    /// Gets the type of the old grain state
    /// </summary>
    Type OldStateType { get; }

    /// <summary>
    /// Creates a new instance of the old grain state, whose type is <see cref="OldStateType"/>
    /// </summary>
    /// <returns>A new instance of the old grain state</returns>
    object CreateOldStateInstance();

    /// <summary>
    /// Sets the grain's new state to <paramref name="grainNewState"/>, whose type should be <see cref="NewStateType"/>
    /// </summary>
    /// <param name="grainNewState">The grain's persistent state; should be of type <see cref="NewStateType"/></param>
    void SetNewState(object grainNewState);

    /// <summary>
    /// Transitions from <paramref name="grainOldState"/>, whose type should be <see cref="OldStateType"/>, to the grain's new state
    /// </summary>
    /// <param name="grainOldState">The grain's persistent state; should be of type <see cref="OldStateType"/></param>
    void TransitionToNewState(object grainOldState);
}

A class, GrainStateTransition<TOld, TNew>, implementing IGrainStateTransition for strongly typed transitions from the old grain state, TOld, to the new grain state, TNew. TOld and TNew are the types that represent the persisted grain states.
A base class for state-transitioning grains, which the stateful grain uses as if it inherited from Grain<TNew>. The base class itself inherits from Grain<GrainStateTransition<TOld, TNew>>. It hides the state transition from the derived grain, so the derived grain sees only the new state in the State property, implemented by the following code:

protected new TNew State
{
    get { return base.State.NewState; }
    set { base.State.NewState = value; }
}

The storage provider decorator class, GrainStateTransitionStorage, which has special ReadStateAsync and WriteStateAsync implementations when the grain state's type implements IGrainStateTransition. When reading state, the state is first read from storage using the decoratee. That state may be the old state, which is converted to the new state so that the next write persists the new state, or it may already be the new state, which needs no conversion. When writing state, the grain state isn't persisted as is (since it implements IGrainStateTransition); instead, the content of its NewState property is written to the decoratee.

I also logged every time the decorator read an old grain state, so I could tell when transitions had stopped happening and the grain could go back to being a standard Orleans stateful grain using the new state.
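The GrainStateTransition<TOld, TNew> class described above isn't shown in the comment. A minimal sketch of how it might implement IGrainStateTransition, assuming a hypothetical IMigratesTo<TNew> interface on the old state type to supply the conversion (the comment doesn't say where the conversion lives); the StateV1/StateV2 example types are also hypothetical.

```csharp
using System;

// Condensed copy of the IGrainStateTransition interface from the comment above.
public interface IGrainStateTransition
{
    object NewState { get; }
    Type NewStateType { get; }
    Type OldStateType { get; }
    object CreateOldStateInstance();
    void SetNewState(object grainNewState);
    void TransitionToNewState(object grainOldState);
}

// Hypothetical: lets the old state type produce the new one, so the
// transition class itself needs no constructor arguments.
public interface IMigratesTo<TNew>
{
    TNew Migrate();
}

public class GrainStateTransition<TOld, TNew> : IGrainStateTransition
    where TOld : IMigratesTo<TNew>, new()
    where TNew : new()
{
    // Strongly typed new state: what the derived grain sees as State.
    public TNew NewState { get; set; } = new TNew();

    // Untyped view used by the storage provider decorator.
    object IGrainStateTransition.NewState => NewState;
    public Type NewStateType => typeof(TNew);
    public Type OldStateType => typeof(TOld);

    public object CreateOldStateInstance() => new TOld();

    // The casts throw InvalidCastException if the provider passes the wrong type.
    public void SetNewState(object grainNewState) => NewState = (TNew)grainNewState;

    public void TransitionToNewState(object grainOldState) =>
        NewState = ((TOld)grainOldState).Migrate();
}

// Example state pair (hypothetical).
public class StateV1 : IMigratesTo<StateV2>
{
    public string Name { get; set; }
    public StateV2 Migrate() => new StateV2 { Title = Name };
}

public class StateV2
{
    public string Title { get; set; }
}
```

Putting the conversion on the old type keeps a parameterless constructor on the transition class, which is convenient if the framework creates the state object itself.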

falconmick commented 8 years ago


As the project I am currently working on is purely for fun, it doesn't matter too much that I am having these issues. If I were to move forward with a more serious project, would you recommend not using State, or potentially creating a custom State serializer that can handle migrations?

sergeybykov commented 8 years ago

I would recommend thinking about writing a storage provider with more flexible handling of state serialization. If it shapes up into a generic solution, that could be a valuable contribution to the codebase.

lucasgodshalk commented 8 years ago

If I can toss in my two cents, I would highly recommend using an existing serialization framework that has known methods for migrating data. This is one of those problems where rolling your own has plenty of downsides and at best reaches the capabilities of already-built frameworks. I'd personally recommend protobuf-net if you care about size and speed (if only because Bond is newer and less widely used, so the knowledge base is smaller), or Json.NET if you want to make it easy on yourself (the advantage being a human-readable view of your state). Both libraries have known migration strategies and are fairly easy to implement as storage providers (I think there might be existing implementations of either one hosted somewhere within Orleans or OrleansContrib).
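A small sketch of the kind of built-in tolerance such frameworks provide, using System.Text.Json (swapped in here for Json.NET; the state type names are hypothetical): a member missing from the payload keeps its default value, and an unknown member is skipped.

```csharp
using System.Text.Json;

// Version 1 of a state (hypothetical).
public class StateV1
{
    public string Field1 { get; set; }
}

// Version 2 adds Field2 (hypothetical).
public class StateV2
{
    public string Field1 { get; set; }
    // Kept when an old payload has no Field2.
    public string Field2 { get; set; } = "default";
}

public static class TolerantJson
{
    // New code reading old data: Field2 is absent, so its initializer value stays.
    public static StateV2 ReadOldAsNew(string v1Json) =>
        JsonSerializer.Deserialize<StateV2>(v1Json);

    // Old code reading new data: the unknown Field2 member is ignored by default.
    public static StateV1 ReadNewAsOld(string v2Json) =>
        JsonSerializer.Deserialize<StateV1>(v2Json);
}
```

Note that the old type silently drops Field2 in the second case; if that code then writes its state back, the value is gone.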

ElanHasson commented 8 years ago

I think I agree with @TrexinanF14: let's use what exists if it works. However, I am not familiar with the migration strategies he mentioned.

I think a scenario to keep in mind is being able to split a Grain into multiple Grains. I can imagine that over time, in real-world applications, grain states get larger than they probably should and may need to be refactored into multiple grains for various reasons.

I think the implementation should be able to execute arbitrary logic, and therefore call other grains to move data around.
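A minimal sketch of the pure "split" step of such a migration, with hypothetical state types; in a real system each resulting part would then be handed to, and persisted by, the grain that now owns it.

```csharp
using System.Collections.Generic;

// Hypothetical oversized state being split across two grains.
public class CustomerState
{
    public string Name { get; set; }
    public string Email { get; set; }
    public List<string> OrderIds { get; set; } = new List<string>();
}

// Part owned by a hypothetical profile grain.
public class ProfileState
{
    public string Name { get; set; }
    public string Email { get; set; }
}

// Part owned by a hypothetical order-history grain.
public class OrderHistoryState
{
    public List<string> OrderIds { get; set; } = new List<string>();
}

public static class StateSplitter
{
    // Maps one old state to the states of two new grains; the list is
    // copied so the two states don't share a mutable collection.
    public static (ProfileState Profile, OrderHistoryState Orders) Split(CustomerState old) =>
        (new ProfileState { Name = old.Name, Email = old.Email },
         new OrderHistoryState { OrderIds = new List<string>(old.OrderIds) });
}
```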

ElanHasson commented 8 years ago

GitHub stripped out my angle brackets. First line of second paragraph should read "I think a scenario to keep in mind is being able to split a Grain into multiple Grains."

ElanHasson commented 8 years ago

Referencing #61.

veikkoeeva commented 8 years ago

There is one precedent for this in the ADO.NET storage provider (the naming etc. isn't the smartest). It can use any (de)serializer and do arbitrary transformations at any granularity. The problem is that it is currently difficult to configure (in a bootstrapper, by passing an explicitly instantiated one, or via a management grain), since Orleans creates one internally.

The idea here is simply to have a canonical interface for (de)serializers (example) and put them into a container (example) that the storage provider then calls.
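A minimal sketch of that idea under stated assumptions: IStateDeserializer, JsonStateDeserializer, and DeserializerPicker are hypothetical names, not the ADO.NET provider's actual types, and the "container" is just a list queried by the format tag stored with each record.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical canonical deserializer interface.
public interface IStateDeserializer
{
    bool CanDeserialize(string formatTag, Type stateType);
    object Deserialize(string payload, Type stateType);
}

// One implementation: handles records tagged "json" for any state type.
public class JsonStateDeserializer : IStateDeserializer
{
    public bool CanDeserialize(string formatTag, Type stateType) => formatTag == "json";

    public object Deserialize(string payload, Type stateType) =>
        JsonSerializer.Deserialize(payload, stateType);
}

// The container the storage provider calls: picks the first registered
// deserializer that accepts the record's format tag and target type.
public class DeserializerPicker
{
    private readonly List<IStateDeserializer> _deserializers;

    public DeserializerPicker(IEnumerable<IStateDeserializer> deserializers) =>
        _deserializers = deserializers.ToList();

    public object Read(string formatTag, string payload, Type stateType)
    {
        var deserializer = _deserializers.FirstOrDefault(d => d.CanDeserialize(formatTag, stateType))
            ?? throw new InvalidOperationException($"No deserializer for '{formatTag}' / {stateType.Name}");
        return deserializer.Deserialize(payload, stateType);
    }
}
```

Because selection happens per record, old and new formats can coexist in storage while new deserializers are registered alongside the existing ones.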

Other features are that the actual grain ID is stored in the DB, and that the ADO.NET provider can use the volatile assembly information when performing operations but strips it off before saving. This way the version information, for instance, doesn't end up in the DB where it would be compared. There's an issue for that at https://github.com/dotnet/orleans/issues/1998. Oh, and the idea was also to be backwards compatible with the current interface and to support streaming and special in-storage capabilities. Some attention was also paid to patching bigger blobs (which I think streaming is a special case of: patching past the end of what has been saved before), and one can also change the (de)serialization formats on the fly (see the test; it needs to be followed to the super-class).

If you look at the code,

talarari commented 7 years ago

Hey, we're facing this same question ourselves right now.

Could not find a best practice on how to handle changing the state type. Is there a recommended strategy?

We thought maybe grain versions could help us with that, but I think we'll still have a problem. Currently I can't think of a way to even add fields to a grain's state without potentially losing them. I'll explain.

Say I have SomeGrain, and SomeState looks like this:

public class SomeState
{
    public string Field1;
}

Let's say I have this deployed in production. Now I want to add Field2, so SomeState will look like this:

public class SomeState
{
    public string Field1;
    public string Field2;
}

Let's say I deploy the new version of the grain; it gets activated, fills Field2 with something, and writes its state.

So far, all good.

Now, during deployment I will have new silos running the new version of the grain and old silos running the old version.

If that grain gets activated on an old silo and writes its state, I lose Field2, since it doesn't exist in the old version of the state.

This is very worrying for us right now; any suggestions on how to handle this would be very welcome. Thanks.

Eldar1205 commented 7 years ago

I implemented a custom storage provider that can read both new and old states but writes only the new state. The solution includes a conversion function from the old state to the new one, so reading the state always returns the new state and the grain is unaware of the transition. I am confident this approach can be extended to support old grain versions reading old state during a deployment, perhaps by writing the new state to a different table.
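A minimal sketch of the read-old-or-new, write-only-new idea, under stated assumptions: IRawStateStorage, InMemoryRawStorage, and TransitionStateStorage are hypothetical (not the Orleans provider API), each record stores a type tag next to a JSON payload (the thread doesn't say how old versus new state is detected), everything is synchronous for brevity, and the OldStateReads counter stands in for the logging mentioned earlier in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical storage abstraction: type tag plus JSON payload per grain.
public interface IRawStateStorage
{
    bool TryRead(string grainId, out string typeTag, out string payload);
    void Write(string grainId, string typeTag, string payload);
}

public class InMemoryRawStorage : IRawStateStorage
{
    private readonly Dictionary<string, (string Tag, string Payload)> _records = new();

    public bool TryRead(string grainId, out string typeTag, out string payload)
    {
        var found = _records.TryGetValue(grainId, out var record);
        typeTag = record.Tag;        // null when not found
        payload = record.Payload;
        return found;
    }

    public void Write(string grainId, string typeTag, string payload) =>
        _records[grainId] = (typeTag, payload);
}

// Decorator: reads old or new state, always returns the new state,
// and only ever writes the new state.
public class TransitionStateStorage<TOld, TNew> where TNew : new()
{
    private readonly IRawStateStorage _inner;
    private readonly Func<TOld, TNew> _convert;

    // Counts reads that found old state; stands in for a log entry.
    public int OldStateReads { get; private set; }

    public TransitionStateStorage(IRawStateStorage inner, Func<TOld, TNew> convert)
    {
        _inner = inner;
        _convert = convert;
    }

    public TNew Read(string grainId)
    {
        if (!_inner.TryRead(grainId, out var tag, out var payload))
            return new TNew(); // nothing stored yet
        if (tag == typeof(TOld).FullName)
        {
            OldStateReads++;
            return _convert(JsonSerializer.Deserialize<TOld>(payload));
        }
        // Any other tag is assumed to be the new state.
        return JsonSerializer.Deserialize<TNew>(payload);
    }

    public void Write(string grainId, TNew state) =>
        _inner.Write(grainId, typeof(TNew).FullName, JsonSerializer.Serialize(state));
}

// Example state pair (hypothetical).
public class OldProfile { public string Name { get; set; } }
public class NewProfile { public string FullName { get; set; } }
```

After the first write, the record carries the new tag, so later reads no longer increment OldStateReads; once that counter stays at zero across all grains, the decorator can be removed.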


veikkoeeva commented 7 years ago

@talarari What underlying storage are you using? One thing that could work is that you transform the state as a storage-side operation.

@Eldar1205 It looks like we've converged on the same approach. This is built into ADO.NET storage too, though for the time being there isn't a good mechanism to give users access to it. If yours is an interceptor, pipes-and-filters, or VETRO-like approach, maybe it would be worth opening an official issue and pushing for a more general approach? The one in ADO.NET allows changing serialization formats too.

talarari commented 7 years ago

We're currently using blob storage. I'm not talking about changing the state class type; I'm talking about changing properties in the existing type. So I have no problem migrating state to the newer version: just load the old state and save it with the new properties.

The problem is that blob storage flushes the whole state to the blob (replacing it) when writing state. That means that if an activation of the newer code, which has an additional property, saves its state, and then the same grain is activated on an older silo that doesn't have that property, that activation will read the state while ignoring the extra property. When it writes its state, it replaces the blob contents, and the extra property is lost for good.

So as I see it, I need to either make sure no old activation ever happens once a newer version of the grain exists, or use a storage provider that updates the stored data instead of replacing it.

I hope I made my problem clearer. Any suggestions are welcome.
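One possible mitigation for the lost-field problem, sketched with System.Text.Json's [JsonExtensionData] attribute (assumption: the provider serializes state with System.Text.Json; Json.NET has an attribute of the same name). Unknown JSON members are captured on read and written back on write, so an old activation round-trips Field2 instead of dropping it. This only helps if the old version was already deployed with the extension-data property; it does nothing for code already in production without it. OldSilo is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// The OLD version of the state (only Field1), deployed with an
// extension-data bag so members it doesn't know about survive.
public class SomeState
{
    public string Field1 { get; set; }

    // Unmapped JSON properties are collected here on read
    // and written back out on serialize.
    [JsonExtensionData]
    public Dictionary<string, JsonElement> Unknown { get; set; }
}

public static class OldSilo
{
    // What an old activation does: read, change a known field, write back.
    public static string ReadModifyWrite(string storedJson, string newField1)
    {
        var state = JsonSerializer.Deserialize<SomeState>(storedJson);
        state.Field1 = newField1;
        return JsonSerializer.Serialize(state);
    }
}
```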

talarari commented 7 years ago

Getting an old activation can happen: if you're not using versioned grains, I guess at any time during the deployment while some silos are already upgraded and some are not yet.

Or, if you're using versioned grains with a staging environment, when you want to roll back an upgrade (because of bugs, a deployment failure, etc.).

Correct me if I'm wrong.

Ranmoro commented 6 years ago

I suggest:

Save the version of the grain schema.
Create a mechanism for migrating the schema.
If a newer version of the schema comes from the repository, the grain throws an exception, after which it will be created on another silo running the new version.
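A minimal sketch of those three steps, under stated assumptions: the StateEnvelope, StateV3, NewerSchemaException, and VersionedStateLoader names, the JSON format, and the two example upgrade steps are all hypothetical (the comment doesn't specify a format). Storage holds a schema version next to the data, upgrades run one version at a time, and a version newer than the silo knows causes an exception instead of a lossy read.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Nodes;

// Hypothetical stored envelope: schema version next to the serialized state.
public class StateEnvelope
{
    public int Version { get; set; }
    public string Data { get; set; }
}

// Current (v3) shape of the state (hypothetical).
public class StateV3
{
    public string Name { get; set; }
    public string Field2 { get; set; }
}

// Thrown when storage holds a schema newer than this silo knows, so the
// activation fails instead of silently dropping data.
public class NewerSchemaException : Exception
{
    public NewerSchemaException(int stored, int supported)
        : base($"Stored schema v{stored} is newer than supported v{supported}") { }
}

public static class VersionedStateLoader
{
    public const int CurrentVersion = 3;

    // Step n upgrades a v(n) JSON object in place to v(n+1).
    private static readonly Dictionary<int, Action<JsonObject>> Upgrades = new()
    {
        [1] = doc => doc["Field2"] = "",   // v2 added Field2
        [2] = doc =>                       // v3 renamed Field1 to Name
        {
            var value = doc["Field1"];
            doc.Remove("Field1");          // detaches the node so it can be re-added
            doc["Name"] = value;
        },
    };

    public static StateV3 Load(StateEnvelope envelope)
    {
        if (envelope.Version > CurrentVersion)
            throw new NewerSchemaException(envelope.Version, CurrentVersion);

        var doc = JsonNode.Parse(envelope.Data).AsObject();
        for (var v = envelope.Version; v < CurrentVersion; v++)
            Upgrades[v](doc);              // apply each upgrade step in order

        return JsonSerializer.Deserialize<StateV3>(doc.ToJsonString());
    }
}
```

Upgrade steps here only reshape data; a step that needs to call other grains would have to be asynchronous and run inside the grain rather than in a static loader.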

veikkoeeva commented 6 years ago

Maybe it would also be workable to put a transformer in a DI container that's loaded into the pipeline. This transformer can be hand-written: it takes the target grain and state type and looks at what's in the source. The programmer of the system likely knows the previous version and can write a conversion function. Then, when a new version is deployed, it can handle the transformation, and the deployment after that can deal with the transformations that aren't needed anymore.

Ranmoro commented 6 years ago

In the simplest case, there could be an IVersionAdapter interface for the grain with a Migrate method, or an implementation provided through the DI container.

The migration process should be able to run actual code and access other grains, not just make changes at the serialization level.

veikkoeeva commented 6 years ago

@Ranmoro It could look a bit like https://github.com/dotnet/orleans/blob/master/src/AdoNet/Orleans.Persistence.AdoNet/Storage/Provider/StorageSerializationPicker.cs#L70. You can see the parameters there. It's a rough sketch and would need refactoring so that those parameters are used to choose a (de)serializer and then, likewise, a custom transformer.

This code predates DI in Orleans and was, indeed, a rough sketch toward this kind of migration (and the few interfaces you see there would need custom implementations to do this). Currently one can use it to switch serialization formats, e.g. there is a test going from XML to JSON (if I recall correctly), and in general to introduce and load into the process any (de)serializer that is needed.

It would make sense to add this kind of "data refactoring" idea to the system. It looks to me like it's generally an interceptor-like system (implementing the VETO or VETRO pattern) and could also help, for instance, if one wants to pass data in one format through the whole system without transforming it from one (de)serialization format to another in between.

dotnet / orleans

Migrating from Grain<SomeState> to Grain<OtherState> #2153