failsafe-lib / failsafe

Fault tolerance and resilience patterns for the JVM
https://failsafe.dev
Apache License 2.0

Why recommend CircuitBreaker before Retry? #377

Open clarkbreyman opened 6 months ago

clarkbreyman commented 6 months ago

The docs at https://failsafe.dev/policies/#composition-recommendations recommend placing the circuit breaker inside the retry policy rather than outside it. I'd love to see more elaboration on why.

I would think the reverse would be better; however, Failsafe is excellent, so I assume you have good reasons. My thinking:

Transient errors (short disruptions, the service mesh sending a request to a bad backend, ...) can be mitigated by retries and should not determine whether the request is made or a fallback is applied. The circuit breaker should trip only when the transient issue proves unrecoverable.

Retries outside the circuit breaker would result in higher error counts, since each retry attempt (more likely to fail during a brief outage) would count toward the breaker's threshold. It's easier to reason about the circuit breaker when its counts reflect upstream (consumer-side) requests.
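To make the counting argument concrete, here is a toy model in Java. The classes are hypothetical and are not Failsafe's implementation or API. `ToyBreaker` simply counts every failure it records and opens at a fixed threshold (no rolling window, no half-open state), and the hypothetical backend fails the first two attempts of every request before succeeding. `retryOutside` builds Retry(Breaker(op)), the order the docs recommend, where each attempt passes through the breaker; `retryInside` builds Breaker(Retry(op)), where the breaker sees one outcome per consumer request.

```java
import java.util.function.Supplier;

/**
 * Toy model (hypothetical, not Failsafe's implementation) comparing how many
 * failures a count-based circuit breaker records when the retry policy sits
 * outside it versus inside it.
 */
public class CompositionCounts {

    /** Hypothetical breaker: counts every recorded failure and opens at `threshold` (no window, no half-open). */
    static final class ToyBreaker {
        final int threshold;
        int failures = 0;

        ToyBreaker(int threshold) { this.threshold = threshold; }

        boolean isOpen() { return failures >= threshold; }

        /** Rejects the call while open; otherwise runs it and records a failure if it fails. */
        boolean call(Supplier<Boolean> op) {
            if (isOpen()) return false;
            boolean ok = op.get();
            if (!ok) failures++;
            return ok;
        }
    }

    /** Hypothetical backend: each request fails its first `transientFailures` attempts, then succeeds. */
    static Supplier<Boolean> flakyRequest(int transientFailures) {
        int[] attempts = {0};
        return () -> ++attempts[0] > transientFailures;
    }

    /** Retry(Breaker(op)): every attempt passes through, and counts against, the breaker. */
    static boolean retryOutside(ToyBreaker cb, Supplier<Boolean> op, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            if (cb.call(op)) return true;
        }
        return false;
    }

    /** Breaker(Retry(op)): retries run inside one breaker call, so it sees one outcome per request. */
    static boolean retryInside(ToyBreaker cb, Supplier<Boolean> op, int maxAttempts) {
        return cb.call(() -> {
            for (int i = 0; i < maxAttempts; i++) {
                if (op.get()) return true;
            }
            return false;
        });
    }

    /** Sends `requests` consumer requests (up to 3 attempts each) and returns how many succeeded. */
    static int run(boolean retryOutsideBreaker, ToyBreaker cb, int requests) {
        int succeeded = 0;
        for (int r = 0; r < requests; r++) {
            Supplier<Boolean> op = flakyRequest(2); // fails twice, succeeds on attempt 3
            boolean ok = retryOutsideBreaker ? retryOutside(cb, op, 3) : retryInside(cb, op, 3);
            if (ok) succeeded++;
        }
        return succeeded;
    }

    public static void main(String[] args) {
        ToyBreaker cbOutside = new ToyBreaker(5);
        ToyBreaker cbInside = new ToyBreaker(5);
        int okOutside = run(true, cbOutside, 10);
        int okInside = run(false, cbInside, 10);
        System.out.println("retry outside breaker: " + okOutside + "/10 succeeded, breaker failures="
                + cbOutside.failures + ", open=" + cbOutside.isOpen());
        System.out.println("retry inside breaker:  " + okInside + "/10 succeeded, breaker failures="
                + cbInside.failures + ", open=" + cbInside.isOpen());
    }
}
```

With retries outside the breaker, the transient failures that the retries would have absorbed reach the threshold on the third request, so only 2 of 10 requests succeed; with retries inside, all 10 succeed and the breaker records no failures. Real breakers with rolling windows or consecutive-failure thresholds will differ in the details.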

jhalterman commented 6 months ago

Hi @clarkbreyman - that recommendation isn't meant to be firm. Indeed, there are good reasons to use the retry policy and circuit breaker in either order, and your comments make sense to me. The docs could probably elaborate on this some more. A general guideline is to consider how long your retry delay is. For short delays between retries, it can make sense to put the retry policy inside the breaker, so you don't hammer a closed breaker. For longer delays between retries, you might put the retry policy outside the breaker.
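One possible reading of the retry-delay guideline, again as a hypothetical toy rather than Failsafe's behavior: with the retry policy outside the breaker, short delays let a quick burst of failed attempts trip the breaker and then spend the remaining attempts against it while it is open, whereas delays that are long relative to the breaker's open delay let a later attempt through as a half-open trial call. The model assumes a breaker that opens after 3 consecutive failures and half-opens 1000 ms later, a backend that is unavailable until t = 1500 ms on a virtual clock, and up to 6 attempts per request.

```java
/**
 * Toy virtual-clock model (hypothetical, not Failsafe's implementation) of a
 * retry policy placed OUTSIDE a circuit breaker, showing how the retry delay
 * relative to the breaker's open delay affects whether later attempts get through.
 */
public class RetryDelayModel {

    enum State { CLOSED, OPEN, HALF_OPEN }

    /** Hypothetical breaker: opens after `threshold` consecutive failures, half-opens after `openDelayMs`. */
    static final class ToyBreaker {
        final int threshold;
        final long openDelayMs;
        State state = State.CLOSED;
        int consecutiveFailures = 0;
        long openedAt = 0;

        ToyBreaker(int threshold, long openDelayMs) {
            this.threshold = threshold;
            this.openDelayMs = openDelayMs;
        }

        /** Returns true if a call may proceed at virtual time `now`; OPEN becomes HALF_OPEN once the delay elapses. */
        boolean allows(long now) {
            if (state == State.OPEN && now - openedAt >= openDelayMs) state = State.HALF_OPEN;
            return state != State.OPEN;
        }

        /** A success closes the breaker; a failed half-open trial, or reaching the threshold, (re)opens it. */
        void record(boolean success, long now) {
            if (success) {
                state = State.CLOSED;
                consecutiveFailures = 0;
            } else if (state == State.HALF_OPEN || ++consecutiveFailures >= threshold) {
                state = State.OPEN;
                openedAt = now;
            }
        }
    }

    /**
     * One consumer request: up to `maxAttempts` attempts spaced `retryDelayMs` apart, each passing
     * through the breaker. The hypothetical backend fails until `recoversAtMs`.
     * Returns the virtual time of the successful attempt, or -1 if no attempt succeeded.
     */
    static long request(ToyBreaker cb, int maxAttempts, long retryDelayMs, long recoversAtMs) {
        long now = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (cb.allows(now)) {
                boolean ok = now >= recoversAtMs; // backend outcome at this instant
                cb.record(ok, now);
                if (ok) return now;
            }
            // Rejected by an open breaker, or failed: wait, then try again.
            now += retryDelayMs;
        }
        return -1;
    }

    public static void main(String[] args) {
        long fast = request(new ToyBreaker(3, 1000), 6, 100, 1500);
        long slow = request(new ToyBreaker(3, 1000), 6, 600, 1500);
        System.out.println("100 ms retry delay: " + (fast < 0 ? "failed" : "succeeded at t=" + fast));
        System.out.println("600 ms retry delay: " + (slow < 0 ? "failed" : "succeeded at t=" + slow));
    }
}
```

With a 100 ms delay, the first three attempts fail within 200 ms, the breaker opens, and attempts 4 to 6 are rejected, so the request fails. With a 600 ms delay, the breaker still opens at t = 1200 ms, but the attempt at t = 2400 ms arrives after the open delay and succeeds as a half-open trial. The 100 ms, 600 ms, and 1000 ms figures are illustrative assumptions, not recommended settings.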