dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License

Grain interface versioned but still going to old version during rolling upgrade #8439

Open cdemi opened 1 year ago

cdemi commented 1 year ago

During a rolling upgrade, when I introduce a new version of a grain interface with a new method that takes a new type, calls from the newly deployed silos are routed to grains activated on old silos running the old version, instead of those activations being deactivated and reactivated on suitable, supported silos. This happens despite using StrictVersionCompatible as the compatibility strategy and LatestVersion as the version selector strategy.

.Configure<GrainVersioningOptions>(options =>
{
    options.DefaultCompatibilityStrategy = nameof(StrictVersionCompatible);
    options.DefaultVersionSelectorStrategy = nameof(LatestVersion);
})

Original Interface:

[Version(1)]
public interface ISampleGrain: IGrainWithGuidKey
{
    Task MyMethod(MyObject request);
}

New Interface:

[Version(2)]
public interface ISampleGrain: IGrainWithGuidKey
{
    Task MyMethod(MyObject request);
    Task MyMethodV2(MyObjectV2 request);
}

Expected Behavior

The new silos should only call new silos with the new version of the grain interface. The StrictVersionCompatible strategy should prevent calls to the old version of the grain interface, and the LatestVersion selector should prefer the latest available version.
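The expected behavior amounts to a compatibility filter applied before placement. Below is a minimal, self-contained C# model of that expectation, not Orleans's implementation: `SiloInfo`, `Placement.EligibleSilos`, and `Placement.SelectVersion` are hypothetical names. It assumes the documented meaning of the two strategies: `StrictVersionCompatible` treats only the exact requested interface version as compatible, and `LatestVersion` picks the highest compatible version present in the cluster.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a silo: its name and the ISampleGrain version its code defines.
public sealed record SiloInfo(string Name, ushort InterfaceVersion);

public static class Placement
{
    // Assumed StrictVersionCompatible semantics: only the exact requested version is compatible.
    public static bool IsStrictlyCompatible(ushort requested, ushort available) =>
        requested == available;

    // Silos that may host a new activation for a call made against `requested`.
    public static List<string> EligibleSilos(ushort requested, IEnumerable<SiloInfo> silos) =>
        silos.Where(s => IsStrictlyCompatible(requested, s.InterfaceVersion))
             .Select(s => s.Name)
             .ToList();

    // Assumed LatestVersion semantics: the highest compatible version present, or null if none.
    public static ushort? SelectVersion(ushort requested, IEnumerable<SiloInfo> silos)
    {
        var compatible = silos
            .Select(s => s.InterfaceVersion)
            .Where(v => IsStrictlyCompatible(requested, v))
            .ToList();
        return compatible.Count == 0 ? null : (ushort?)compatible.Max();
    }
}

public static class Program
{
    public static void Main()
    {
        // Mid-upgrade cluster: one old silo (V1) and one new silo (V2).
        var cluster = new List<SiloInfo> { new("silo-old", 1), new("silo-new", 2) };

        // A call from a new silo is made against V2, so only silo-new is eligible.
        Console.WriteLine(string.Join(",", Placement.EligibleSilos(2, cluster))); // silo-new

        // With only the old silo present, no compatible version exists: the V2 call
        // has nowhere to go, rather than falling back to silo-old.
        var oldOnly = new List<SiloInfo> { new("silo-old", 1) };
        Console.WriteLine(Placement.SelectVersion(2, oldOnly)?.ToString() ?? "none"); // none
    }
}
```

Under this model, a V1 silo is never a candidate for a V2 call, which is the behavior the report expects from the configuration above.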

Actual Behavior

The new silos attempt to create grain activations on the old silos, which produces the following log entries on the old silos:

Named type "MyObjectV2" is invalid: Type string "MyObjectV2" cannot be resolved.

Obviously, the old silos cannot resolve MyObjectV2, because they are still running the old version of the code during the rolling upgrade.

My understanding from the docs is that this should not happen: existing activations of that interface should be deactivated and reactivated only on suitable, supported silos.

Environment

Orleans version: 3.6.2
.NET version: 7

cdemi commented 1 year ago

I have continued to investigate this, and it looks like the message body is deserialized as soon as the silo receives it, before any compatibility or version checks are performed.

If I am reading the code correctly, it is not possible to introduce new types into the cluster, even with Grain Interface Versioning.
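To make the suspected ordering problem concrete, here is a toy C# model of the hypothesis, not Orleans code: `IncomingMessage`, `OldSilo`, `ReceiveDeserializeFirst`, and `ReceiveVersionCheckFirst` are hypothetical names. It assumes the message header carries the caller's interface version and the body carries the argument's type name, and contrasts resolving the body first with checking the version first on an old (V1) silo.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wire message: the interface version the caller was built against,
// plus the argument's type name (standing in for the serialized body).
public sealed record IncomingMessage(ushort InterfaceVersion, string ArgumentTypeName);

public static class OldSilo
{
    // The old silo only hosts ISampleGrain V1 and only knows the V1 types.
    const ushort HostedVersion = 1;
    static readonly HashSet<string> KnownTypes = new() { "MyObject" };

    // Body first: type resolution fails before the version is ever looked at.
    public static string ReceiveDeserializeFirst(IncomingMessage msg)
    {
        if (!KnownTypes.Contains(msg.ArgumentTypeName))
            return $"Named type \"{msg.ArgumentTypeName}\" is invalid";
        return msg.InterfaceVersion == HostedVersion ? "invoked" : "rejected: incompatible version";
    }

    // Version first: an incompatible call is rejected before the body is touched.
    public static string ReceiveVersionCheckFirst(IncomingMessage msg)
    {
        if (msg.InterfaceVersion != HostedVersion)
            return "rejected: incompatible version";
        if (!KnownTypes.Contains(msg.ArgumentTypeName))
            return $"Named type \"{msg.ArgumentTypeName}\" is invalid";
        return "invoked";
    }
}

public static class Program
{
    public static void Main()
    {
        // A V2 call carrying MyObjectV2 that lands on the old silo.
        var v2Call = new IncomingMessage(2, "MyObjectV2");
        Console.WriteLine(OldSilo.ReceiveDeserializeFirst(v2Call));  // Named type "MyObjectV2" is invalid
        Console.WriteLine(OldSilo.ReceiveVersionCheckFirst(v2Call)); // rejected: incompatible version
    }
}
```

In this model, only the version-first order turns the failure into an explicit incompatibility; the toy says nothing about where Orleans actually performs its version checks.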

cdemi commented 1 year ago

This is the stack trace:

Named type "MyObjectV2" is invalid: Type string "MyObjectV2" cannot be resolved.
   at Orleans.Serialization.BinaryTokenStreamReaderExtensinons.ReadSpecifiedTypeHeader[TReader](TReader this, SerializationManager serializationManager) in /_/src/Orleans.Core/Serialization/BinaryTokenStreamReader.cs:line 431
   at Orleans.Serialization.SerializationManager.DeserializeInner[TContext,TReader](SerializationManager sm, Type expected, TContext context, TReader reader) in /_/src/Orleans.Core/Serialization/SerializationManager.cs:line 1362
   at Orleans.Serialization.BuiltInTypes.DeserializeInvokeMethodRequest(Type expected, IDeserializationContext context) in /_/src/Orleans.Core/Serialization/BuiltInTypes.cs:line 2104
   at Orleans.Serialization.SerializationManager.DeserializeInner[TContext,TReader](SerializationManager sm, Type expected, TContext context, TReader reader) in /_/src/Orleans.Core/Serialization/SerializationManager.cs:line 1362
   at Orleans.Runtime.Messaging.MessageSerializer.OrleansSerializer`1.Deserialize(ReadOnlySequence`1 input, T& value) in /_/src/Orleans.Core/Messaging/MessageSerializer.cs:line 157
   at Orleans.Runtime.Messaging.MessageSerializer.TryRead(ReadOnlySequence`1& input, Message& message) in /_/src/Orleans.Core/Messaging/MessageSerializer.cs:line 85
   at Orleans.Runtime.Messaging.Connection.ProcessIncoming() in /_/src/Orleans.Core/Networking/Connection.cs:line 397

jkonecki commented 7 months ago

I'm running into a similar issue in Orleans 8 during a rolling upgrade: the old grain version is being called by the new client, which results in the following exception:

System.TypeLoadException: 'Unable to resolve type alias "("inv",[Orleans.Runtime.GrainReference],[RollingUpgrade.Interfaces.IChat,RollingUpgrade.Interfaces],"ReceiveMessageV2")".'

Repro solution: RollingUpgrade.zip

I tested the following scenarios:

Client V1 uses grain V1 - correct

  1. Start Silo V1
  2. Start Client V1. Client V1 successfully calls the grain using interface V1.

Client V2 uses grain V2 - correct

  1. Start Silo V2
  2. Start Client V2. Client V2 successfully calls the grain using interface V2.

Client V1 uses grain V1 in silo V2 - correct

  1. Start Silo V1
  2. Start Client V1
  3. Start Silo V2
  4. Stop Silo V1. Client V1 successfully calls the grain in silo V2 after it is transferred from silo V1.

Client V2 uses grain V2 in silo V2 - incorrect

  1. Start Silo V1
  2. Start Silo V2
  3. Start Client V2. A TypeLoadException is thrown because Orleans decides to activate the grain in silo V1. The grain was expected to be activated in silo V2.

Client V2 uses existing grain V2 in silo V2 - correct

  1. Start Silo V1
  2. Start Client V1
  3. Start Silo V2
  4. Stop Silo V1
  5. Start Silo V1
  6. Start Client V2. Client V2 successfully calls the grain in silo V2 after it is transferred from silo V1.

@ReubenBond I haven't checked the Orleans source code regarding @cdemi's comment that payload deserialization is performed before compatibility and version checking. If confirmed, it would seem to be a major flaw that prevents rolling deployments. I'm happy to help investigate or fix this, as the issue is affecting one of my client's projects.