akkadotnet / akka.net

Canonical actor model implementation for .NET with local + distributed actors in C# and F#.
http://getakka.net

"Shutting down myself" caused by error occured in remote node. #7113

Open ingted opened 4 months ago

ingted commented 4 months ago

Version Information
Version of Akka.NET? 1.5.0
Which Akka.NET Modules? Akka Remote

Describe the bug

  1. Have two actors created in node A (port 64609) & B (port 64640)
  2. actor_in_a tells actor_in_b a message; actor_in_b processes it and tells a response back
  3. However, the generated response message cannot be serialized by Hyperion, which causes the error "Failed to write message to the transport" in node B
    AssociationError [akka.tcp://cluster-system@10.28.199.143:64609] <- akka.tcp://cluster-system@10.28.199.143:64640: Error [Failed to write message to the transport] []
  4. Then A ran into a disassociation with a MYTHICAL node on port 64643 (I didn't create it)
    Association between local [tcp://cluster-system@10.28.199.143:64643] and remote [tcp://cluster-system@10.28.199.143:64609] was disassociated because the ProtocolStateActor failed: Unknown
  5. Then B disassociates
    Association with remote system akka.tcp://cluster-system@10.28.199.143:64640 has failed; address is now gated for 5000 ms. Reason is: [Akka.Remote.EndpointException: Failed to write message to the transport
     ---> Hyperion.ValueSerializers.UnsupportedTypeException: No coercion operator is defined between types 'CefBrowser*' and 'System.Object'.
       at Hyperion.ValueSerializers.UnsupportedTypeSerializer.WriteManifest(Stream stream, SerializerSession session)
       at lambda_method305(Closure, Stream, Object, SerializerSession)
       at Hyperion.ValueSerializers.ObjectSerializer.WriteValue(Stream stream, Object value, SerializerSession session)
       at Hyperion.Extensions.StreamEx.WriteObject(Stream stream, Object value, Type valueType, ValueSerializer valueSerializer, Boolean preserveObjectReferences, SerializerSession session)
       at lambda_method299(Closure, Stream, Object, SerializerSession)
       at Hyperion.ValueSerializers.ObjectSerializer.WriteValue(Stream stream, Object value, SerializerSession session)
       at Hyperion.Extensions.StreamEx.WriteObject(Stream stream, Object value, Type valueType, ValueSerializer valueSerializer, Boolean preserveObjectReferences, SerializerSession session)
       at Hyperion.SerializerFactories.EnumerableSerializerFactory.<>c__DisplayClass10_0.<BuildSerializer>b__1(Stream stream, Object o, SerializerSession session)
       at Hyperion.ValueSerializers.ObjectSerializer.WriteValue(Stream stream, Object value, SerializerSession session)
       at lambda_method76(Closure, Stream, Object, SerializerSession)
       at Hyperion.ValueSerializers.ObjectSerializer.WriteValue(Stream stream, Object value, SerializerSession session)
       at lambda_method72(Closure, Stream, Object, SerializerSession)
       at Hyperion.ValueSerializers.ObjectSerializer.WriteValue(Stream stream, Object value, SerializerSession session)
       at lambda_method74(Closure, Stream, Object, SerializerSession)
       at Hyperion.ValueSerializers.ObjectSerializer.WriteValue(Stream stream, Object value, SerializerSession session)
       at Hyperion.Serializer.Serialize(Object obj, Stream stream, SerializerSession session)
       at Hyperion.Serializer.Serialize(Object obj, Stream stream)
       at Akka.Serialization.HyperionSerializer.ToBinary(Object obj)
       at Akka.Remote.MessageSerializer.Serialize(ExtendedActorSystem system, Address address, Object message)
       at Akka.Remote.EndpointWriter.WriteSend(Send send)
       --- End of inner exception stack trace ---
       at Akka.Remote.EndpointWriter.PublishAndThrow(Exception reason, LogLevel level, Boolean needToThrow)
       at Akka.Remote.EndpointWriter.WriteSend(Send send)
       at Akka.Remote.EndpointWriter.<Writing>b__27_0(Send s)
       at lambda_method64(Closure, Object, Action`1, Action`1, Action`1)
       at Akka.Actor.ReceiveActor.OnReceive(Object message)
       at Akka.Actor.UntypedActor.Receive(Object message)
       at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
       at Akka.Actor.ActorCell.Invoke(Envelope envelope)]
  6. Then nodes A & B disassociate
    Disassociated [akka.tcp://cluster-system@10.28.199.143:64640] -> akka.tcp://cluster-system@10.28.199.143:64609
    Disassociated [akka.tcp://cluster-system@10.28.199.143:64609] <- akka.tcp://cluster-system@10.28.199.143:64640
  7. At last, A & B shut down: (seed node has port 9000)

For node B <= Shutting down myself

Message [AckIdleCheckTimer] from [akka://cluster-system/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fcluster-system%4010.28.199.143%3A64640-2/endpointWriter#1537073490] to [akka://cluster-system/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fcluster-system%4010.28.199.143%3A64640-2/endpointWriter#1537073490] was not delivered. [1] dead letters encountered. If this is not an expected behavior then [akka://cluster-system/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fcluster-system%4010.28.199.143%3A64640-2/endpointWriter#1537073490] may have terminated unexpectedly. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. Message content: Akka.Remote.EndpointWriter+AckIdleCheckTimer

Cluster Node [akka.tcp://cluster-system@10.28.199.143:64609] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://cluster-system@10.28.199.143:64640, Uid=1558551789 status = Up, role=[ShardNode,ShardAnalyticServiceNode,petabridge.cmd,10.28.199.143], upNumber=3, version=12.8.202)]. Node roles [ShardNode,ShardAnalyticServiceNode,petabridge.cmd,10.28.199.143]

"Couldn't establish a causal relationship between "remote" gossip and "local" gossip - Remote[Gossip(members = [Member(address = akka.tcp://cluster-system@10.28.199.143:9000, Uid=1028805500 status = Up, role=[dd,singletonRole,SeedNode,petabridge.cmd], upNumber=1, version=7.1.460), Member(address = akka.tcp://cluster-system@10.28.199.143:64609, Uid=942161684 status = Up, role=[ShardNode,ShardAnalyticServiceNode,petabridge.cmd,10.28.199.143], upNumber=2, version=1.0.0), Member(address = akka.tcp://cluster-system@10.28.199.143:64640, Uid=1558551789 status = Up, role=[ShardNode,ShardAnalyticServiceNode,petabridge.cmd,10.28.199.143], upNumber=3, version=12.8.202)], overview = GossipOverview(seen=[UniqueAddress: (akka.tcp://cluster-system@10.28.199.143:9000, 1028805500), UniqueAddress: (akka.tcp://cluster-system@10.28.199.143:64640, 1558551789)], reachability=Reachability([akka.tcp://cluster-system@10.28.199.143:64640 -> UniqueAddress: (akka.tcp://cluster-system@10.28.199.143:64609, 942161684): Unreachable [Unreachable] (1)])), version = VectorClock(0DA4CAFA080D3226573233D2547D1AC0->6, 3EBA3B1B1C91D00A7301186C5FF6E40C->1)] - Local[Gossip(members = [Member(address = akka.tcp://cluster-system@10.28.199.143:9000, Uid=1028805500 status = Up, role=[dd,singletonRole,SeedNode,petabridge.cmd], upNumber=1, version=7.1.460), Member(address = akka.tcp://cluster-system@10.28.199.143:64609, Uid=942161684 status = Up, role=[ShardNode,ShardAnalyticServiceNode,petabridge.cmd,10.28.199.143], upNumber=2, version=1.0.0), Member(address = akka.tcp://cluster-system@10.28.199.143:64640, Uid=1558551789 status = Up, role=[ShardNode,ShardAnalyticServiceNode,petabridge.cmd,10.28.199.143], upNumber=3, version=12.8.202)], overview = GossipOverview(seen=[UniqueAddress: (akka.tcp://cluster-system@10.28.199.143:64609, 942161684)], reachability=Reachability([akka.tcp://cluster-system@10.28.199.143:64609 -> UniqueAddress: (akka.tcp://cluster-system@10.28.199.143:64640, 1558551789): Unreachable 
[Unreachable] (1)])), version = VectorClock(06163C12B3D0EBEA1063AC304EC6A2FE->1, 0DA4CAFA080D3226573233D2547D1AC0->6)] - merged them into [Gossip(members = [Member(address = akka.tcp://cluster-system@10.28.199.143:9000, Uid=1028805500 status = Up, role=[dd,singletonRole,SeedNode,petabridge.cmd], upNumber=1, version=7.1.460), Member(address = akka.tcp://cluster-system@10.28.199.143:64609, Uid=942161684 status = Up, role=[ShardNode,ShardAnalyticServiceNode,petabridge.cmd,10.28.199.143], upNumber=2, version=1.0.0), Member(address = akka.tcp://cluster-system@10.28.199.143:64640, Uid=1558551789 status = Up, role=[ShardNode,ShardAnalyticServiceNode,petabridge.cmd,10.28.199.143], upNumber=3, version=12.8.202)], overview = GossipOverview(seen=[], reachability=Reachability([akka.tcp://cluster-system@10.28.199.143:64609 -> UniqueAddress: (akka.tcp://cluster-system@10.28.199.143:64640, 1558551789): Unreachable [Unreachable] (1)][akka.tcp://cluster-system@10.28.199.143:64640 -> UniqueAddress: (akka.tcp://cluster-system@10.28.199.143:64609, 942161684): Unreachable [Unreachable] (1)])), version = VectorClock(06163C12B3D0EBEA1063AC304EC6A2FE->1, 0DA4CAFA080D3226573233D2547D1AC0->6, 3EBA3B1B1C91D00A7301186C5FF6E40C->1)]"

Received gossip where this member has been downed, from [akka.tcp://cluster-system@10.28.199.143:9000]

Message [BackoffTimer] from [akka://cluster-system/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fcluster-system%4010.28.199.143%3A64640-2/endpointWriter#816387495] to [akka://cluster-system/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fcluster-system%4010.28.199.143%3A64640-2/endpointWriter#816387495] was not delivered. [8] dead letters encountered. If this is not an expected behavior then [akka://cluster-system/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fcluster-system%4010.28.199.143%3A64640-2/endpointWriter#816387495] may have terminated unexpectedly. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. Message content: Akka.Remote.EndpointWriter+BackoffTimer

Cluster Node [akka.tcp://cluster-system@10.28.199.143:64609] - Node has been marked as DOWN. Shutting down myself

For Node A <= Shutting down myself

Cluster Node [akka.tcp://cluster-system@10.28.199.143:64640] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://cluster-system@10.28.199.143:64609, Uid=942161684 status = Up, role=[ShardNode,petabridge.cmd,ShardAnalyticServiceNode,10.28.199.143], upNumber=2, version=1.0.0)]. Node roles [ShardNode,petabridge.cmd,ShardAnalyticServiceNode,10.28.199.143]

Cluster Node [akka.tcp://cluster-system@10.28.199.143:64640] - Receiving gossip from [UniqueAddress: (akka.tcp://cluster-system@10.28.199.143:9000, 1028805500)]

Received gossip where this member has been downed, from [akka.tcp://cluster-system@10.28.199.143:9000]

Cluster Node [akka.tcp://cluster-system@10.28.199.143:64640] - Node has been marked as DOWN. Shutting down myself

To Reproduce
If needed, I will provide a small reproduction project.

Expected behavior
Errors that occur in node B should not shut down node A...

Actual behavior
Node A logs "Shutting down myself" and shuts down....

Environment
I am running on Windows with .NET 7.

ingted commented 4 months ago

This time it is different from https://github.com/akkadotnet/akka.net/issues/2903. Now the disassociation causes both nodes to shut themselves down...

ingted commented 4 months ago

Since the error is expected, we can certainly avoid triggering it... anyway... T_T|||
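
One possible way to catch this kind of unserializable message before it ever reaches Akka.Remote might be the `akka.actor.serialize-messages` setting, which makes the actor system serialize and deserialize every message, including purely local sends. The fragment below is only a sketch for test or debug runs, on the assumption that the failing message can be exercised there; it adds overhead to every send and is not meant for production.

```hocon
# Sketch for test/debug runs only: adds serialization overhead to every send.
akka.actor {
  # Serialize and deserialize all messages, including local ones, so a payload
  # that Hyperion cannot handle (for example one holding a CefBrowser reference)
  # shows up during testing instead of on the remote transport.
  serialize-messages = on
}
```

This does not change how the nodes react once the transport write fails; it only aims to make the unsupported type visible earlier.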