connor4312 / cockatiel

🐦 A resilience and fault-handling library. Supports Backoffs, Retries, Circuit Breakers, Timeouts, Bulkhead Isolation, and Fallbacks.
MIT License
1.57k stars 48 forks source link

TimeoutPolicy.onTimeout not called with TimeoutPolicy.onFailure #56

Closed xenoterracide closed 2 years ago

xenoterracide commented 2 years ago

this is the relevant part of our logs, you can see onFailure is called but there's no call to onTimeout

[1647969594170] TRACE (BreakerProvider/6059 on 5c52309d33e3): checking retry {"error":{"statusCode":500,"response":{},"message":"500 - undefined","options":{"method":"put","headers":{"authorization":"Bearer Bearer","api-key":"a"},"url":"bbar","timeout":5000,"agentOptions":null}},"delay":172.15364909827179}
[1647969594342] TRACE (ExchangeDelegate/6059 on 5c52309d33e3): cockatiel {"signal":{},"attempt":2}
[1647969594343] DEBUG (BreakerProvider/6059 on 5c52309d33e3): breaker HalfOpen
[1647969594344] WARN  (BreakerProvider/6059 on 5c52309d33e3): timed out {"duration":1.291599988937378,"handled":false,"reason":{"error":{"statusCode":500,"response":{},"message":"500 - undefined","options":{"method":"put","headers":{"authorization":"Bearer Bearer","api-key":"a"},"url":"bbar","timeout":5000,"agentOptions":null}}}}
[1647969594344] ERROR (BreakerProvider/6059 on 5c52309d33e3): too many failures, breaker tripped {"error":{"statusCode":500,"response":{},"message":"500 - undefined","options":{"method":"put","headers":{"authorization":"Bearer Bearer","api-key":"a"},"url":"bbar","timeout":5000,"agentOptions":null}}}
[1647969594344] DEBUG (BreakerProvider/6059 on 5c52309d33e3): breaker Open
[1647969594344] WARN  (BreakerProvider/6059 on 5c52309d33e3): failure {"duration":1.8672599792480469,"handled":true,"reason":{"error":{"statusCode":500,"response":{},"message":"500 - undefined","options":{"method":"put","headers":{"authorization":"Bearer Bearer","api-key":"a"},"url":"bbar","timeout":5000,"agentOptions":null}}}}
[1647969594344] ERROR (BreakerProvider/6059 on 5c52309d33e3): giving up {"error":{"statusCode":500,"response":{},"message":"500 - undefined","options":{"method":"put","headers":{"authorization":"Bearer Bearer","api-key":"a"},"url":"bbar","timeout":5000,"agentOptions":null}}}

here's the BreakerProvider

/** @format */

import { LogProvider } from 'logger'
import {
  CircuitBreakerPolicy,
  CircuitState,
  ConsecutiveBreaker,
  IPolicy,
  Policy,
  TimeoutPolicy,
  TimeoutStrategy,
} from 'cockatiel'
import { inject } from 'inversify'

import { BreakerProperties } from './breaker-properties'

export class BreakerProvider {
  private readonly log = LogProvider.getLogger()
  private readonly breakerPolicy: CircuitBreakerPolicy
  private readonly timeoutPolicy: TimeoutPolicy

  constructor(@inject(BreakerProperties) readonly props: BreakerProperties) {
    const breaker = new ConsecutiveBreaker(props.breakerConsecutiveFailures)

    this.timeoutPolicy = Policy.timeout(props.timeoutDuration.toMillis(), TimeoutStrategy.Cooperative)
    this.breakerPolicy = Policy.handleAll().circuitBreaker(props.breakerHalfOpenDuration.toMillis(), breaker)

    this.breakerPolicy.onStateChange(data => {
      this.log.debug('breaker', CircuitState[data])
      this.props.onStateChange?.call(this.breakerPolicy, data)
    })
    this.timeoutPolicy.onTimeout(data => {
      this.log.trace('checking timeout', data)
      this.props.onTimeout?.call(this.timeoutPolicy, data)
    })

    this.breakerPolicy.onBreak(data => this.log.error('too many failures, breaker tripped', data))
    this.timeoutPolicy.onFailure(data => this.log.warn('timed out', data))
  }

  provideRetry(): IPolicy {
    const retryPolicy = Policy.handleAll().retry().exponential().attempts(this.props.retryAttempts)
    retryPolicy.onGiveUp(data => {
      this.log.error('giving up', data)
      this.props.onGiveup?.call(retryPolicy, data)
    })
    retryPolicy.onRetry(data => {
      this.log.trace('checking retry', data)
      this.props.onRetry?.call(retryPolicy, data)
    })
    return retryPolicy
  }

  providePolicy(retry: boolean): IPolicy {
    const policy = retry
      ? Policy.wrap(this.provideRetry(), this.breakerPolicy, this.timeoutPolicy)
      : Policy.wrap(this.breakerPolicy, this.timeoutPolicy)

    policy.onFailure(data => this.log.warn('failure', data))
    policy.onSuccess(data => this.log.debug('success', data))
    return policy
  }
}
connor4312 commented 2 years ago

This is by design -- the readme mentions onFailure is not called on timeout:

timeout.onFailure(callback)

An event emitter that fires whenever a function throw an error or returns an errorful result. It's invoked with the duration of the call, the reason for the failure, and an boolean indicating whether the error is handled by the policy.

This is only called when the function itself fails, and not when a timeout happens.

Likewise onTimeout is called only for timeouts, not for failures.

The logic being it's easier to 'combine' the two events if you need a count of both together, but it's hard to separate them.

xenoterracide commented 2 years ago

maybe what's not clear here is what, outside of a timeout, is considered an error/failure for timeout? To me it would seem like the timeout is the failure there.

connor4312 commented 2 years ago

Hm, I agree that it might be a little less apparent for timeout. onFailure is implement by all policies, and might be easier to explain in the context of retries: whenever the inner function is invoked and it throws something (or returns an errorful result), then onFailure is called. The retry policy might keep retrying, until onGiveUp fires.

onSuccess and onFailure are provided from the "executor" that report how the function itself gets called and are separate from any events the policy itself might implement.

xenoterracide commented 2 years ago

Right, but what does ontimeout do that does not meet that criteria? I think that's where my confusion comes in. How is ontimeout not just the thing that is throwing the failure...