Timeout set by client is only respected by top-level grain

dotnet / orleans

Cloud Native application framework for .NET

https://docs.microsoft.com/dotnet/orleans

MIT License

10k stars 2.02k forks source link

Timeout set by client is only respected by top-level grain #4328

Open pmsanford opened 6 years ago

pmsanford commented 6 years ago

According to https://github.com/dotnet/orleans/issues/1693#issuecomment-212984821 the timeout set by the client should be respected and propagated down to grains invoked by the grain running on the silo. I have a (fairly) minimal project here that demonstrates it is not: https://github.com/pmsanford/orleans-timeout-propagation-demo

It appears that the request to the first grain completes even if it takes longer than the SiloMessagingOptions.ResponseTimeout. However, any grain calls beyond the initial call only respect the SiloMessagingOptions.ResponseTimeout. Have I misinterpreted the comment on the linked issue or is this a bug?

Should a timeout set on MessagingOptions (or ClientMessagingOptions (??)) on the client side propagate through all grain activations on the silo?

Edited to add: This is on OSX, with dotnet --version 2.1.4 and Orleans version 2.0.0-rc2

ReubenBond commented 6 years ago

The current behavior is certainly that timeouts are not propagated along a call chain. I think that's the least surprising behavior.

For it to be useful, we would have to progressively decrease the timeout for subsequent calls. Eg, if a client with a timeout of 5 minutes calls grain A and grain A calls grains B & C, the timeout for each of those sub calls determined by estimating how much time remains on the client's timeout. If we call B and C in parallel, we might set the timeouts to 5 minutes per call, but if they're called in series then the timeout to B is 5 minutes but the timeout for C is whatever time remains after B returns.

In that case, do we propagate the timeout to other systems like remote web services or databases? That implies that we make some sort of "time remaining" available in the context of a call (eg, on RequestContext)

It's doable, but I'm not convinced that propagating a running timeout is worth the added book keeping & complexity.

pmsanford commented 6 years ago

For what my opinion is worth, I tend to agree that it's not worth the added complexity, and I'm not sure there's a way that's necessarily less confusing, but I'd argue that setting a timeout of 2 minutes and having your requests fail with an error that says Response did not arrive on time in 00:00:30 is surprising (at least it was to me when I encountered it). I think it's probably a matter of documentation, though, as once I realized what was going on it all made sense.