akkadotnet / akka.net

Canonical actor model implementation for .NET with local + distributed actors in C# and F#.
http://getakka.net

Exceeding max-concurrent-recoveries triggers circuit breaker #6106

Open lucavice opened 1 year ago

lucavice commented 1 year ago

Version Information
Version of Akka.NET? 1.4.40
Which Akka.NET Modules? Akka.Cluster.Sharding 1.4.40, Akka.Persistence.SqlServer 1.4.35

Describe the bug
In certain situations, temporarily exceeding the max-concurrent-recoveries parameter triggers a circuit breaker that prevents Akka Persistence from persisting any further events while the circuit breaker is open.

See the sequence of logged events here: [image]

I have been unable to reproduce this problem reliably, as it seems to happen fairly randomly on our production instance (a few times per day). Locally, setting max-concurrent-recoveries to 1 and forcing multiple actors to recover at once does not seem to trigger the issue, so it must be caused by a combination of factors.
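For reference, the local experiment described above would correspond to a HOCON fragment along these lines. This is only a sketch, not the reporter's actual configuration: it assumes the SQL Server journal is registered under the `akka.persistence.journal.sql-server` plugin id, and the circuit-breaker values are illustrative.

```hocon
akka.persistence {
  # Limit concurrent persistent-actor recoveries to one, as in the local test
  max-concurrent-recoveries = 1

  journal.sql-server {
    # Illustrative values only; these settings control when the journal's
    # circuit breaker opens and how long it stays open before retrying
    circuit-breaker {
      max-failures = 10
      call-timeout = 10s
      reset-timeout = 30s
    }
  }
}
```

While the breaker is open, writes fail fast with OpenCircuitException until reset-timeout elapses, which would be consistent with the hundreds of identical log entries described in this report.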

We can't find the root cause of the error that triggers the circuit breaker. The logged OpenCircuitException carries no further information, and it is the only error that appears in the log (hundreds of times while the circuit breaker is open).

To Reproduce
I don't have reliable steps to trigger the problem. I would appreciate hints on what I could try in order to better understand the underlying problem and come up with a way to reproduce it reliably. It is possible that this is entirely caused by a programming mistake on my side, but I'm a bit lost as to what to look for.

Environment
Windows on .NET 6

Aaronontheweb commented 1 year ago

Looks like @ismaelhamed has a fix for this targeting the v1.5 branch - we'll see if we can also backport that to a future v1.4 release as well. Going to review Ismael's work along with @Arkatufus right now.

lucavice commented 1 year ago

@ismaelhamed @Aaronontheweb I was able to retrieve some errors today that include a more detailed stack trace.

I'm posting it here in case it helps identify whether this is actually related to https://github.com/akkadotnet/akka.net/pull/6109

Akka.Pattern.OpenCircuitException: Circuit Breaker is open; calls are failing fast
 ---> System.ArgumentException: The tasks argument contains no tasks. (Parameter 'tasks')
   at System.Threading.Tasks.TaskFactory.CheckMultiContinuationTasksAndCopy(Task[] tasks)
   at System.Threading.Tasks.TaskFactory`1.ContinueWhenAllImpl(Task[] tasks, Func`2 continuationFunction, Action`1 continuationAction, TaskContinuationOptions continuationOptions, CancellationToken cancellationToken, TaskScheduler scheduler)
   at Akka.Persistence.Sql.Common.Journal.SqlJournal.WriteMessagesAsync(IEnumerable`1 messages)
   at Akka.Util.Internal.AtomicState.CallThrough[T](Func`1 task)
   at Akka.Util.Internal.AtomicState.CallThrough[T](Func`1 task)
   --- End of inner exception stack trace ---
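
The inner exception in this trace is the one .NET raises when `TaskFactory.ContinueWhenAll` is handed an empty task array. The sketch below reproduces only that .NET behavior, independent of Akka.NET. The class and method names are hypothetical. That the journal received an empty write batch is an assumption suggested by the trace, not something confirmed in this thread.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical demo class; not part of Akka.NET.
public static class EmptyTaskBatchDemo
{
    // Returns true if ContinueWhenAll rejects an empty task array
    // with an ArgumentException, as seen in the inner exception above.
    public static bool ContinueWhenAllThrowsOnEmpty()
    {
        try
        {
            Task.Factory.ContinueWhenAll<int>(Array.Empty<Task>(), done => done.Length);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    // For contrast: Task.WhenAll accepts an empty array and
    // returns a task that is already completed.
    public static bool WhenAllCompletesOnEmpty()
    {
        return Task.WhenAll(Array.Empty<Task>()).IsCompleted;
    }

    public static void Main()
    {
        Console.WriteLine($"ContinueWhenAll throws on empty: {ContinueWhenAllThrowsOnEmpty()}");
        Console.WriteLine($"WhenAll completes on empty: {WhenAllCompletesOnEmpty()}");
    }
}
```

If an empty batch is indeed the trigger, each such write would count as a journal failure, and enough of them in a row could open the circuit breaker. That would fit the symptoms described, but it remains a hypothesis.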