dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License

NullReferenceException in Orleans.Runtime.Messaging.PrefixingBufferWriter.Sequence.SequenceSegment.ResetMemory() #7337

Closed: mradovcic closed this issue 2 years ago

mradovcic commented 2 years ago

Hi all, I encountered this issue today:

System.NullReferenceException: Object reference not set to an instance of an object.
   at Orleans.Runtime.Messaging.PrefixingBufferWriter`2.Sequence.SequenceSegment.ResetMemory()
   at Orleans.Runtime.Messaging.PrefixingBufferWriter`2.Sequence.RecycleAndGetNext(SequenceSegment segment)
   at Orleans.Runtime.Messaging.PrefixingBufferWriter`2.Sequence.Reset()
   at Orleans.Runtime.Messaging.PrefixingBufferWriter`2.Reset(TBufferWriter writer)
   at Orleans.Runtime.Messaging.MessageSerializer.Write[TBufferWriter](TBufferWriter& writer, Message message)
   at Orleans.Runtime.Messaging.Connection.ProcessOutgoing()
   at Orleans.Internal.OrleansTaskExtentions.<ToTypedTask>g__ConvertAsync|4_0[T](Task`1 asyncTask)
   at Orleans.Runtime.GrainDirectory.LocalGrainDirectory.LookupAsync(GrainId grainId, Int32 hopCount)
   at Orleans.Runtime.GrainDirectory.DhtGrainLocator.Lookup(GrainId grainId)
   at Orleans.Runtime.Scheduler.AsyncClosureWorkItem`1.Execute()
   at Orleans.Runtime.Placement.RandomPlacementDirector.OnSelectActivation(PlacementStrategy strategy, GrainId target, IPlacementRuntime context)
   at Orleans.Runtime.Placement.PlacementDirectorsManager.SelectOrAddActivation(ActivationAddress sendingAddress, PlacementTarget targetGrain, IPlacementRuntime context, PlacementStrategy strategy)
   at Orleans.Runtime.Dispatcher.AddressMessageAsync(Message message, PlacementTarget target, PlacementStrategy strategy, ActivationAddress targetAddress)
   at Orleans.Runtime.Dispatcher.<>c__DisplayClass39_0.<<AsyncSendMessage>g__TransportMessageAferSending|0>d.MoveNext()

This seems to be connected to the SMS stream provider that the application uses to dispatch state changes from a producer grain to many consumer grains.

Is there a known issue that causes this?

It's worth pointing out that a ton of traffic goes through the stream, but all consumer grains are located on the same silo as the producer grain. One consumer grain can also consume multiple different streams, but there is only one producer per stream. We are using Orleans 3.4.3 on net5.0.

ReubenBond commented 2 years ago

Thanks for reporting, @mradovcic. We'll have a fix in 3.5.1.

mradovcic commented 2 years ago

@ReubenBond that sounds great, thanks.

Do you have a short explanation of how this occurs? We've been running this for about a year with no major changes in the way streams are handled, and this happened just this one time. Could it be caused by some other issue within the application, or does it only occur on specific versions? I was not able to reproduce it, so it would be really interesting to know the conditions under which it happens.

ReubenBond commented 2 years ago

Basically, there is an issue with how network connections are terminated. When the system tells a connection to terminate gracefully, the current implementation can return internal buffers to the memory pool while they are still in use. Doing so is what causes that NullReferenceException which you're seeing. Apologies for the hassle. I'll have a fix shortly and we can hopefully release 3.5.1 this week.
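
(Editor's illustration, not the Orleans code.) The failure mode described above can be sketched in a few lines. This is a minimal, language-neutral sketch in Python rather than C#, and every name in it (`Pool`, `PooledBuffer`, `Segment`, `Writer`, `buggy_close`, `send_then_reset`) is hypothetical. It assumes a shutdown path that returns a writer's buffers to the pool without coordinating with the sender, and it replaces the real race with a deterministic interleaving. Python's `AttributeError` on `None` stands in for .NET's `NullReferenceException`.

```python
class PooledBuffer:
    """Hypothetical stand-in for a rented buffer; release() hands it back."""

    def __init__(self, pool, size):
        self.pool = pool
        self.data = bytearray(size)

    def release(self):
        self.pool.returned.append(self.data)


class Pool:
    """Hypothetical memory pool that only records what was returned."""

    def __init__(self):
        self.returned = []

    def rent(self, size):
        return PooledBuffer(self, size)


class Segment:
    def __init__(self, owner):
        self.owner = owner  # becomes None once the memory is given back

    def reset_memory(self):
        self.owner.release()  # fails if owner is already None
        self.owner = None


class Writer:
    def __init__(self, pool):
        self.pool = pool
        self.segments = []

    def write(self, payload):
        owner = self.pool.rent(len(payload))
        owner.data[:] = payload
        self.segments.append(Segment(owner))

    def reset(self):
        for segment in self.segments:
            segment.reset_memory()
        self.segments.clear()


def buggy_close(writer):
    # Assumed shutdown path: returns buffers while the sender still holds them.
    for segment in writer.segments:
        segment.reset_memory()


def send_then_reset(writer, close_during_send):
    writer.write(b"message")
    if close_during_send:
        buggy_close(writer)  # deterministic stand-in for the race
    try:
        writer.reset()  # the sender's normal cleanup after a write
        return "ok"
    except AttributeError:
        return "null reference"
```

Calling `send_then_reset(Writer(Pool()), close_during_send=False)` completes normally, while passing `close_during_send=True` hits the null dereference inside `reset_memory()`, the same frame that tops the stack trace in the report. Because the real problem depends on timing between shutdown and an in-flight send, it would be expected to show up rarely, which is consistent with it being seen only once in a year.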

ReubenBond commented 2 years ago

Fixed in #7345