Revisit PoolSelector state machine transitions when error occurs during reception of response entity

jrudolph commented 7 years ago

Follow up to #416 / #481.

With @gosubpl's change implemented in #481 we might end up in a situation where after response entity stream error the PoolConductor thinks a PoolSlot is in a different state.

@leachbj summarized it like this:

So the possible ordering here is; Restating your summary above @gosubpl - if we map the response failures to SlotEvent.RequestCompleted(slot) we get;

Fail before the response headers (onUpstreamFailure); SlotEvent.Disconnected(slot, 1); transitions from Loaded(1) => Unconnected Fail after the response headers (Future failed and then onUpstreamFailure); SlotEvent.RequestCompleted(slot); transitions from Loaded(1) => Idle SlotEvent.Disconnected(slot, 0); transitions from Idle => Unconnected Fail after the response headers (onUpstreamFailure and then Future failed); SlotEvent.Disconnected(slot, 0); transitions from Loaded(1) => Loaded(1) SlotEvent.RequestCompleted(slot); transitions from Loaded(1) => Idle This last would mean that a subsequent request would be routed to a slot that appears to be Idle but is actually Unconnected. Like @jrudolph says this slot would be preferred over an Unconnected one (but thats no real concern) but also over a slot that is actually connected and Idle which isn't ideal but wouldn't break.

I'm not sure if case 2 presents a chance for a request to sneak through to the slot before the onUpstreamFailure gets processed. If that happened though the 2nd event would be Disconnected(slot, 1) so I think thats fine.

In case 3 if pipelining was enabled and a request was processed before the 2nd event then we should see a Disconnected(slot, 1) before or after the pending RequestCompleted(slot) which should leave us in Unconnected regardless of delivery order.

We might want to improve this situation in the future as suggested in https://github.com/leachbj/akka-http/commit/516f997b94fb980a1a4b314ed3bd4c5b49a9597d.

On the other hand, there's always a race going on between a PoolSlot notifying the conductor of its current state vs. the conductor already handling a new request based on the old state. I.e. we know that there's not a perfect sync. As long as a slot cannot get stuck we might just accept temporary inaccuracies. We should also take into account how enabling pipelining could change the state machine sync.

leachbj commented 7 years ago

@jrudolph I'm still wondering if the current PoolSlot PoolConductor complication is worth it. Trying to manage that state across the two stages is prooving to be quite difficult to get correct and is only real purpose is to support pipelining which it does poorly anyhow (I'm fairly sure it works correctly for idempotent requests but if there are any non-idempotent requests in the flow its broken). I haven't really thought it through fully but my feeling was if PoolSlot could start managing its demand based on that state then the non-pipeline model would be much simpler. I'm not sure how to solve pipeling in that model though - perhaps its just a case that if PoolSlot pulls a non-idempotent request then it buffers it until its downstream is complete (including retries and reconnections). That wouldn't give ideal scheduling but then pipeling is known to give non-ideal scheduling since head of queue will block anyhow.

jrudolph commented 7 years ago

I guess any kind of simplification would be nice. We should gather requirements first to make sure we know what features we'd need.

Here's a small list from the top of my head:

schedule to idle connections instead of to unconnected one (so that big pools don't grow immediately to max size)
run non-idempotent requests only on idle or unconnected connections (maybe even with the option to run them only on unconnected ones to prevent the race-condition between idle-timeout and having just sent the non-idempotent request)
retries
pipelining

gosubpl commented 7 years ago

See also https://github.com/akka/akka-http/issues/299 with possible bug coming from timing of operations.

akka / akka-http

Revisit PoolSelector state machine transitions when error occurs during reception of response entity #490