dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License

Serialization sometimes throws exception during unit tests with TestingSiloHost #1536

Closed bwanner closed 8 years ago

bwanner commented 8 years ago

Hi,

During a unit test with the TestingSiloHost, the grain silo sometimes throws a serialization-related exception while handling the response message. Even when the test cases are identical, a run sometimes throws and sometimes succeeds, with no pattern that I can make out. This looks like a bug to me.

I created a repro solution consisting of four projects, which can be obtained HERE

    public interface IGenericGrain<T> : IGrainWithGuidKey
    {
        Task<GenericWrapper<T>> Get();
    }

    /// <summary>
    /// Grain implementation class Grain1.
    /// </summary>
    public class GenericGrain<T> : Grain, IGenericGrain<T>
    {
        public Task<GenericWrapper<T>> Get()
        {
            return Task.FromResult(new GenericWrapper<T>());
        }
    }

    namespace ClassLibrary
    {
        [Serializable]
        public class CustomObject
        {
            public int Value { get; set; }
        }
    }

    [TestMethod]
    public async Task TestMethod1()
    {
        var grain = GrainFactory.GetGrain<IGenericGrain<CustomObject>>(Guid.NewGuid());
        var x = await grain.Get();
    }

Executing these tests sometimes raises the following exception in the primary silo:

    [2016-03-08 13:26:24.315 GMT 48 ERROR 101030 Message 127.0.0.1:22222] !!!!!!!!!! Exception deserializing message body
    Exc level 0: System.TypeAccessException: Named type "Grains1.GenericWrapper`1" is invalid: Type string "ClassLibrary.CustomObject" cannot be resolved.
       at Orleans.Serialization.BinaryTokenStreamReader.ReadSpecifiedTypeHeader()
       at Orleans.Serialization.SerializationManager.DeserializeInner(Type expected, BinaryTokenStreamReader stream)
       at Orleans.Serialization.BuiltInTypes.DeserializeOrleansResponse(Type expected, BinaryTokenStreamReader stream)
       at Orleans.Serialization.SerializationManager.DeserializeInner(Type expected, BinaryTokenStreamReader stream)
       at Orleans.Serialization.SerializationManager.Deserialize(Type t, BinaryTokenStreamReader stream)
       at Orleans.Runtime.Message.DeserializeBody(List`1 bytes)

    [2016-03-08 13:26:24.319 GMT 48 ERROR 101017 Runtime.Messaging.IncomingMessageAcceptor 127.0.0.1:22222] !!!!!!!!!! Exception trying to process 325 bytes from endpoint 127.0.0.1:57973
    Exc level 0: System.TypeAccessException: Named type "Grains1.GenericWrapper`1" is invalid: Type string "ClassLibrary.CustomObject" cannot be resolved.
       at Orleans.Serialization.BinaryTokenStreamReader.ReadSpecifiedTypeHeader()
       at Orleans.Serialization.SerializationManager.DeserializeInner(Type expected, BinaryTokenStreamReader stream)
       at Orleans.Serialization.BuiltInTypes.DeserializeOrleansResponse(Type expected, BinaryTokenStreamReader stream)
       at Orleans.Serialization.SerializationManager.DeserializeInner(Type expected, BinaryTokenStreamReader stream)
       at Orleans.Serialization.SerializationManager.Deserialize(Type t, BinaryTokenStreamReader stream)
       at Orleans.Runtime.Message.DeserializeBody(List`1 bytes)
       at Orleans.Runtime.Message..ctor(List`1 header, List`1 body, Boolean deserializeBody)
       at Orleans.Runtime.IncomingMessageBuffer.TryDecodeMessage(Message& msg)
       at Orleans.Runtime.Messaging.IncomingMessageAcceptor.ReceiveCallbackContext.ProcessReceivedBuffer(Int32 bytes)

    [2016-03-08 13:26:24.319 GMT 48 ERROR 101027 Runtime.Messaging.IncomingMessageAcceptor 127.0.0.1:22222] !!!!!!!!!! ProcessReceivedBuffer exception with RemoteEndPoint 127.0.0.1:57973:
    Exc level 0: System.TypeAccessException: Named type "Grains1.GenericWrapper`1" is invalid: Type string "ClassLibrary.CustomObject" cannot be resolved.
       at Orleans.Serialization.BinaryTokenStreamReader.ReadSpecifiedTypeHeader()
       at Orleans.Serialization.SerializationManager.DeserializeInner(Type expected, BinaryTokenStreamReader stream)
       at Orleans.Serialization.BuiltInTypes.DeserializeOrleansResponse(Type expected, BinaryTokenStreamReader stream)
       at Orleans.Serialization.SerializationManager.DeserializeInner(Type expected, BinaryTokenStreamReader stream)
       at Orleans.Serialization.SerializationManager.Deserialize(Type t, BinaryTokenStreamReader stream)
       at Orleans.Runtime.Message.DeserializeBody(List`1 bytes)
       at Orleans.Runtime.Message..ctor(List`1 header, List`1 body, Boolean deserializeBody)
       at Orleans.Runtime.IncomingMessageBuffer.TryDecodeMessage(Message& msg)
       at Orleans.Runtime.Messaging.IncomingMessageAcceptor.ReceiveCallbackContext.ProcessReceivedBuffer(Int32 bytes)
       at Orleans.Runtime.Messaging.IncomingMessageAcceptor.ReceiveCallback(IAsyncResult result)

    [2016-03-08 13:26:54.366 GMT 54 INFO 101302 Orleans.Messaging.Gateway 127.0.0.1:22222] Recorded closed socket from endpoint 127.0.0.1:57976, client ID *cli/debdb59a.

Any idea what the error is here? CustomObject should be known to the hosting silo because the dll is deployed.

sergeybykov commented 8 years ago

This is with Orleans 1.1.0, right? Can you try the same with NuGets built from https://github.com/dotnet/orleans/tree/1.1.3 and master?

bwanner commented 8 years ago

I checked out the latest master and built the NuGet packages myself. It's still the same issue. I pushed that to the repository.

Btw: Did something change about the usage of TestingSiloHost? Right now only the first test method of each class is executed, and the others fail with a NullReferenceException when accessing GrainFactory.

So now TestMethod1 sometimes fails and sometimes succeeds. Cleaning up and triggering a complete rebuild seems to reset the behavior to failing.

EDIT: Same for 1.1.3

jdom commented 8 years ago

Hmm, that's very strange. Can you try calling SerializationManager.InitializeForTesting() in a static method decorated with [ClassInitialize]? We did see some weirdness in the past when not calling that. If we can get a small repro of that, such as yours, we can hopefully fix it so that the workaround is no longer needed.

bwanner commented 8 years ago

@jdom I added

        [ClassInitialize]
        public static void ClassInitialize(TestContext param)
        {
            SerializationManager.InitializeForTesting();
        }

to the unit test class. However, the error message is still the same.

Also, do you have any idea why GrainFactory becomes null for all test methods executed after the first one?

jdom commented 8 years ago

Sorry for the delay, I didn't see the response before. I took a look at the repro, and there are a few issues with it. Regarding the exception in particular: even though the silo host references the assembly that owns CustomObject, Orleans didn't create any serializers for it, because the grains are not explicitly using it. Add [assembly: KnownAssembly(typeof(CustomObject))] to the unit test project (and host) to solve the issue. Alternatively, add a reference from GrainProject to ClassLibrary and add that attribute there, so that the serializers are created at build time.

The other issue, with TestingSiloHost, is that we are moving to an approach where inheriting from it is discouraged (although it's interesting that it broke your existing tests, so I will take a closer look afterwards). If your intention is to keep the same cluster running from test to test, it is preferred that you have infrastructure to support it, such as statics or, in the case of xUnit, fixtures that outlive a single test method run. Keep in mind that most testing frameworks, including MSTest, create a new instance of the test class for each test they run, so inheriting from something that creates a bunch of silos in its constructor is really unwieldy.
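To illustrate the lifecycle point, here is a minimal, framework-free sketch. Everything in it is hypothetical: `FakeCluster` stands in for an expensive resource such as a test cluster, and `LifecycleDemo.Main` imitates a runner that builds a new test-class instance per test method. It uses no MSTest, xUnit, or Orleans APIs.

```csharp
using System;

// Hypothetical stand-in for an expensive resource such as a test cluster.
public sealed class FakeCluster
{
    public static int StartCount;            // how many "clusters" were started

    public FakeCluster() { StartCount++; }

    public void Ping() { }                   // placeholder for real work
}

// Pattern 1: the resource is created per instance,
// so every test method pays the start-up cost again.
public class PerInstanceTests
{
    private readonly FakeCluster cluster = new FakeCluster();

    public void Test1() { cluster.Ping(); }
    public void Test2() { cluster.Ping(); }
}

// Pattern 2: the resource lives in a static field and is created once,
// no matter how many test-class instances the runner builds.
public class SharedStaticTests
{
    private static readonly Lazy<FakeCluster> Cluster =
        new Lazy<FakeCluster>(() => new FakeCluster());

    public void Test1() { Cluster.Value.Ping(); }
    public void Test2() { Cluster.Value.Ping(); }
}

public static class LifecycleDemo
{
    public static void Main()
    {
        // Imitate a runner: one fresh instance per test method.
        new PerInstanceTests().Test1();
        new PerInstanceTests().Test2();
        Console.WriteLine(FakeCluster.StartCount);   // prints 2

        FakeCluster.StartCount = 0;
        new SharedStaticTests().Test1();
        new SharedStaticTests().Test2();
        Console.WriteLine(FakeCluster.StartCount);   // prints 1
    }
}
```

With a real cluster, the static variant would also need a teardown hook that runs once per class or per test run. The sketch omits it.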

bwanner commented 8 years ago

Thanks for your reply. I added the line you mentioned to the UnitTestProject, but it did not have any noticeable effect (Commit). The first test run after a rebuild still fails, then it starts succeeding. Adding a reference from GrainProject to ClassLibrary is not possible, as GrainProject serves as a library and needs to be able to handle arbitrary generic types. I cannot understand why it does not work in the beginning and then starts working. At no point during the test is there an instance of CustomObject to be serialized. The only use of CustomObject in the test is as the type argument for GenericWrapper. Do you have any idea what is happening here?

The point with TestingSiloHost is very valid, thanks for explaining. Maybe we should update the docs once moved to this new approach or at least make sure this deprecated way of using it is still allowed.

bwanner commented 8 years ago

I updated the repro project to 1.1.3, and now we get the same errors as with 1.1.0: test runs fail occasionally, and it doesn't look like it's just an issue of "warming up". The new repro project is published at https://github.com/bwanner/SerializationBug.

@jdom @sergeybykov Is there anything I can do to help resolve this issue? I'm not really familiar with the internals of Orleans type/serialization management, otherwise I would have investigated it myself. This issue is blocking execution of some important test cases of my application.

jdom commented 8 years ago

I'm currently out, but I'll take a look at the updated repro on Monday. Please update the issue if you figure it out before then.

jdom commented 8 years ago

@bwanner I think this has a combination of issues. Basically, because the library that has CustomObject is not referenced by the grain implementation project, then it's causing some issues at runtime. It is non-deterministic because the TestingSiloHost by default spins up 2 silos. When the grain activation lands in the same silo that the client is talking to (the Primary silo in this test infrastructure), then serialization of this unknown type works correctly. When it lands on the secondary silo, I assume the forwarding is having serialization issues while in-flight (speculation, @sergeybykov will probably correct me).

The easiest workaround is to reference the library project from the grain implementation project, and add the following to that project (such as in AssemblyInfo.cs):

    [assembly: KnownType(typeof(CustomObject))]

Having said that, this is kind of unexpected behavior. Serialization should either work or not work, regardless of whether you are running with 1 or more silos and there's forwarding involved.

As validation, if you keep everything as is (no KnownType declaration), but add the following constructor to just start the cluster with only 1 silo, you'll see that tests pass deterministically:

    public UnitTest1() : base(new TestingSiloOptions() { StartSecondary = false }) { }

bwanner commented 8 years ago

@jdom The activation being placed in the primary/secondary sounds reasonable.

I see that CustomObject is not referenced by the grain implementation project; this is intended. The grain implementation project is supposed to define generic classes for arbitrary objects and acts as a library, so it cannot know about all the types it is applied to. The setting is similar to defining a List<T>, where it would not make sense for List<T> to know about each possible T.

How can I achieve that in Orleans?

bwanner commented 8 years ago

I tried another approach and created a MasterGrainProject that references both the GrainLibraryProject, which defines functionality over generic types, and the project containing the types to be used with it. Serializers are then generated using [assembly: KnownType(typeof(CustomObject))] (for some reason KnownAssembly is not working for me). However, these serializers are only made available to the newly created MasterGrainProject via OrleansCodeGenerationTargetAttribute, although they are required by the existing GrainLibraryProject, so this attempted fix does not help at all.

For now, I see two possible approaches to fix this in the long term:

  1. Add another optional parameter to the KnownType attribute for defining the code generation target.
  2. Use runtime code generation, which seems to be the more elegant solution to me.

@ReubenBond: I saw you developed lots on code generation, can you provide some guidance? :)

bwanner commented 8 years ago

As it turns out the serialization exception is only thrown in the following scenario:

  1. Testing environment is created, consisting of primary and secondary silo
  2. Grain<CustomObject> is created which runs on secondary silo (run test until you're lucky)
  3. Grain<CustomObject>.Get() is executed on secondary silo, which returns a new instance of GenericWrapper<CustomObject>
  4. For some reason the response is received by the primary silo, which tries to deserialize CustomObject and fails because no serializer for this type is registered.

Once a Grain<CustomObject> was also instantiated on the primary silo, requests to Grain<CustomObject>.Get() work for grains hosted on both silos.
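The observed behavior can be mimicked with a toy model. This is only a sketch of the symptoms described above, not of Orleans internals: `ToySilo`, `ToyCluster`, and the rule that activating a grain makes its type argument known to the hosting silo are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: none of these types are Orleans APIs.
public sealed class ToySilo
{
    private readonly HashSet<string> knownTypes = new HashSet<string>();

    public string Name { get; }

    public ToySilo(string name) { Name = name; }

    // Assumption: activating a grain for a type argument
    // makes that type known to the hosting silo.
    public void Activate(string typeArgument) { knownTypes.Add(typeArgument); }

    public bool CanDeserialize(string typeArgument)
    {
        return knownTypes.Contains(typeArgument);
    }
}

public static class ToyCluster
{
    // The grain is activated on 'host'; the response travels back through
    // 'gateway', which must be able to deserialize the type argument.
    // Returns true when the call would succeed in this model.
    public static bool Call(ToySilo gateway, ToySilo host, string typeArgument)
    {
        host.Activate(typeArgument);
        return gateway.CanDeserialize(typeArgument);
    }
}

public static class PlacementDemo
{
    public static void Main()
    {
        var primary = new ToySilo("Primary");
        var secondary = new ToySilo("Secondary");

        // Grain lands on the secondary silo: the primary cannot read the response.
        Console.WriteLine(ToyCluster.Call(primary, secondary, "CustomObject")); // prints False

        // Grain lands on the primary silo: the call works.
        Console.WriteLine(ToyCluster.Call(primary, primary, "CustomObject"));   // prints True

        // After that, grains on the secondary silo work as well.
        Console.WriteLine(ToyCluster.Call(primary, secondary, "CustomObject")); // prints True
    }
}
```

In this model the outcome depends only on where the first activation lands, which matches the "run the test until you're lucky" pattern from the steps above.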

@sergeybykov Can you explain how communication between the silos is supposed to work or why this response needs to be received by the primary silo?

A possible fix would be to load the assembly containing CustomObject on silo startup and run code generation for these types, however I am not sure if that's the way to go.

sergeybykov commented 8 years ago

@bwanner I downloaded your sources, added [assembly: KnownType(typeof(CustomObject))] to \SerializationBug-master\GrainLibraryProject\Properties\AssemblyInfo.cs, and I see that now a serializer for CustomObject got generated in SerializationBug-master\GrainLibraryProject\Properties\orleans.codegen.cs and all the tests pass.