Closed pcwiek closed 9 years ago
That could be some odd json.net datetime issue you are seeing. That would explain why it works with default values as those are not emitted in json by default
Skickat från min iPhone
20 apr 2015 kl. 17:37 skrev Patryk Ćwiek notifications@github.com:
Is there a way to log everything that's happening in an actor and, if possible, the actor system in general?
I'm using TopShelf for service hosting and Serilog (min level: debug) for logging.
Including:
akka { (...) actor { loglevel = DEBUG stdout-loglevel = DEBUG provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster" debug { receive = on autoreceive = on fsm = on lifecycle = on unhandled = on event-stream = on router-misconfiguration = on } loggers = ["Akka.Logger.Serilog.SerilogLogger,Akka.Logger.Serilog"] remote { log-remote-lifecycle-events = DEBUG (...) } } in the configuration doesn't change anything when you compare the log output without the debug section specified at all.
I'm asking because I'm trying to track a really bizarre behavior (one message, which has 4 collections in it, all of collection items have Date (DateTime) member, is going nowhere unless I set members' dates to default(DateTime) in at least one of the collections...?!) but it might be hard to reproduce. So I'd rather exhaust all my monitoring and debugging options first before spending half my day setting up a separate solution...
— Reply to this email directly or view it on GitHub.
Could be, although if I 'manually' convert to JSON, everything seems to work fine.
Also, dates in each and every collection are converted just fine separately, but not when I'm trying to send all of them, hence my enormous WTF.
Can you try this?:
var serializer = actorSystem.Serialization.FindSerializerFor(myMessage);
var binary = serializer.ToBinary(myMessage);
var result= serializer.FromBinary(binary, typeof(MyMessageType));
And see what comes out of result
if its a good copy of your message, we can rule out serialization.
If it's not a serializer issue, my best bet would be that the dates doesnt have anything to do with it except for message size and it could be a remoting issue with large messages (?)
But lets verify the serializer first
Sure, I'll try that tomorrow in the morning.
My first thought was some kind of message size limit too - especially considering that all parts of the message seem to work on their own, but when I put them together the message goes into void - but I don't know nearly enough about internals of Akka to tell whether my hunch was even close to being right.
@rogeralsing I have just tried it, serialization works fine - round trip serialization gives the exact copy.
[EDIT] Although it's a bit of XY problem, my original question still stands - how to enable full debugging and tracing in a system? :) It would make it that much easier to troubleshoot...
[EDIT2] OK, I found a way to log all messages, unfortunately the heartbeat and gossip messages clutter the log a little bit...
akka {
loglevel = DEBUG
remote {
log-sent-messages = on
log-received-messages = on
}
}
@rogeralsing @Aaronontheweb I encountered the same thing in a different place in the application. Is there some kind of a size limit for remote messages? It seems that big messages get lost somewhere on their way... Either that or there's something weird going on with Ask
.
For example, this time I have a details screen with entries in it. From the business process perspective, the number of entries can vary between ~500 and 20000. One entry is a string (~10 - 12 characters) and a boolean flag. The details screen has an extra description (string
, up to 200 characters), one date (DateTimeOffset
), user name (string
, in practice up to ~20 characters), boolean flag, two ids (int
).
When I take <= ~950 elements, I get the reply back to the Web application that Ask
s for the data. When I take >= ~1000 elements, originalSender.Tell
in the remote actor sends the reply, but Ask
in the web app is just hanging there.
I want to add that the first case when I had that happen (~4 - 5 collections, 240 elements each) is also when I Ask
the local proxy actor from the Web app. It can be a coincidence, but I think it's worth mentioning. That 'proxy' actor forwards the message to all remote actors in the cluster with original sender which should get the reply, which is, for example, tcp.akka//my-system@127.0.0.1:2300/temp/f
or similar.
Serialization seems to work just fine...
Is it possible that that Ask
right at the very beginning is messing something up? Could that be somehow related to the other issue with Tell
going missing?
@rogeralsing @Aaronontheweb Another update! Sorry guys for flooding you with all that...
Anyway, when the message is 'small' (as in, it passes through), the chatter between the actors looks like this:
Web (Asking local proxy actor)
2015-04-22 12:08:45.914 -04:00 [Debug] "RemoteMessage: ActorSelectionMessage - Message: Shared.General.DetailsRequest - WildCartFanOut: False - Elements: user/details to [[akka.tcp://my-system@127.0.0.1:21300/]]<+[akka.tcp://my-system@127.0.0.1:21300/] from [[akka://my-system/temp/g]]"
Service (processing and Tell
to sender, which would be temporary Ask
actor)
Receive is acked
2015-04-22 12:08:45.922 -04:00 [Debug] received local message ["RemoteMessage: ActorSelectionMessage - Message: Shared.General.DetailsRequest - WildCartFanOut: False - Elements: user/details to [akka://my-system/]<+akka://my-system/ from [akka.tcp://my-system@127.0.0.1:23000/temp/g]"]
Reply is sent
2015-04-22 12:08:46.019 -04:00 [Debug] "RemoteMessage: Shared.General.Data.Details to [[akka.tcp://my-system@127.0.0.1:23000/temp/g]]<+[akka.tcp://my-system@127.0.0.1:23000/temp/g] from [[akka://my-system/user/details/get/$d]]"
Web (ack when Ask
returns)
2015-04-22 12:08:46.071 -04:00 [Debug] received local message ["RemoteMessage: Shared.General.Data.Details to [akka://my-system/temp/g]<+akka://my-system/temp/g from [akka.tcp://my-system@127.0.0.1:21300/user/details/get/$d]"]
Everything just works.
On the other hand...
Web (Asking local proxy actor) [Pretty much the same as above]
2015-04-22 12:08:06.641 -04:00 [Debug] "RemoteMessage: ActorSelectionMessage - Message: Shared.General.DetailsRequest - WildCartFanOut: False - Elements: user/details to [[akka.tcp://my-system@127.0.0.1:21300/]]<+[akka.tcp://my-system@127.0.0.1:21300/] from [[akka://my-system/temp/e]]"
Service (processing and Tell
to sender) [Pretty much the same as above]
Receive is acked
2015-04-22 12:08:06.867 -04:00 [Debug] received local message ["RemoteMessage: ActorSelectionMessage - Message: Shared.General.DetailsRequest - WildCartFanOut: False - Elements: user/details to [akka://my-system/]<+akka://my-system/ from [akka.tcp://my-system@127.0.0.1:23000/temp/e]"]
Reply is sent
2015-04-22 12:08:08.790 -04:00 [Debug] "RemoteMessage: Shared.General.Data.Details to [[akka.tcp://my-system@127.0.0.1:23000/temp/e]]<+[akka.tcp://my-system@127.0.0.1:23000/temp/e] from [[akka://my-system/user/details/get/$b]]"
So, far so good, right?
Web (ack when Ask
returns)
Missing completely...
No idea what happens here. All preceding steps and operations are exactly the same.
@pcwiek so the big message makes it over the wire, but the Ack never returns?
@Aaronontheweb Hard to say if it actually makes it over the wire, because there's no way of telling what happens in the transport layer. What I can say is that the processing actor actually sends the message, but it's never received on the Ask
side and Ask
is just left there hanging...
@pcwiek so one thing worth bearing in mind is that the underlying Helios transport does have a maximum size threshold for messages it can send:
MaxFrameSize must be at least 32k. By default it's 128k https://github.com/akkadotnet/akka.net/blob/dev/src/core/Akka.Remote/Configuration/Remote.conf#L359
You can increase the size of this value to support whatever you need, but be aware that some network stacks (UDP) may not support arbitrarily large messages. We have some tests inside Helios that ramp up to 4mb and pass, so you have a lot of room to work with when using TCP: https://github.com/helios-io/helios/blob/dev/tests/Helios.MultiNodeTests/LargeMessageTests.cs#L105
@Aaronontheweb Thank you!
I upped the maximum-frame-size to 32 MB and now even the large analysis results are making it back. Perfect!
I just wish there was some kind of an error when that kind of situation happens, so that it could be logged - something simple like 'Remote message exceeds maximum frame size' or something :) Right now it's a completely silent failure.
@pcwiek so there is supposed to be a setting for this: https://github.com/akkadotnet/akka.net/blob/dev/src/core/Akka.Remote/Configuration/Remote.conf#L142
But we don't respect or use it inside Akka.Remote right now - so I'm going to rename this issue and leave it open as a to-do.
This might need to wait until Helios 2.0 ships, where we'll have some better error capture. Alternatively we can do this inside the HeliosTcpAssociationHandle
.
@Aaronontheweb For now I would just add a paragraph about message sizes and potential pitfalls to 'How transparency is broken' section in 'Remoting' docs. I can actually do that and send a pull request if you'd like.
@pcwiek that would be excellent - please send along a PR
Note - this needs to be implemented here: https://github.com/akkadotnet/akka.net/blob/dev/src/core/Akka.Remote/Endpoint.cs#L1237
And here's part of the Scala implementation for this:
https://github.com/akka/akka/blob/master/akka-remote/src/main/scala/akka/remote/Endpoint.scala#L745
https://github.com/akka/akka/blob/master/akka-remote/src/main/scala/akka/remote/Endpoint.scala#L670
This looks fun :) I'll have a look at it.
Is there a way to log everything that's happening in an actor and, if possible, the actor system in general?
I'm using TopShelf for service hosting and Serilog (min level: debug) for logging.
Including:
in the configuration doesn't change anything when you compare the log output without the
debug
section specified at all.I'm asking because I'm trying to track a really bizarre behavior (one message, which has 4 collections in it, all of collection items have
Date
(DateTime) member, is going nowhere unless I set members' dates todefault(DateTime)
in at least one of the collections...?!) but it might be hard to reproduce. So I'd rather exhaust all my monitoring and debugging options first before spending half my day setting up a separate solution...