failsafe-lib / failsafe

Fault tolerance and resilience patterns for the JVM
https://failsafe.dev
Apache License 2.0
4.17k stars 296 forks source link

A CircuitBreaker policy may pass an incorrect execution duration #325

Closed timothybasanov closed 2 years ago

timothybasanov commented 2 years ago

When a circuit breaker is already open and runAsync is used, ExecutionContext.getElapsedTime() returns unexpectedly large numbers (e.g. days instead of seconds) within an event listener.

Seemingly the issue is caused by not starting an execution in rare edge cases. As a default start timestamp is 0 nanos, it means that elapsed time may be measured since a random point in time (e.g. since the last laptop restart) instead of an expected value.

It also may make sense to switch from Duration to long or Instant within ExecutionContext to more accurately reflect that it is a point in time. Additionally, using a default value of System.nanoTime() or Instant.now() may decrease a risk of an incorrect initialization.

Reproducing an issue

AtomicReference<Duration> failedDuration = new AtomicReference<>();
FailsafeExecutor<Object> failsafeExecutor = Failsafe.with(
                new CircuitBreaker<>())
        .onFailure(e -> failedDuration.set(e.getElapsedTime()));
try {
    failsafeExecutor.run(() -> {
        throw new IllegalStateException("testCircuitBreaker1");
    });
    throw new AssertionError();
} catch (IllegalStateException expected) {
    // It should throw on a first try
}
try {
    failsafeExecutor.runAsync(() -> {
        throw new IllegalStateException("testCircuitBreaker2");
    }).join();
    throw new AssertionError();
} catch (CompletionException expected) {
    // It should throw on a first try
}
if(failedDuration.get().compareTo(Duration.ofNanos(System.nanoTime() / 2)) > 0) {
    throw new IllegalStateException("" + failedDuration);
}

When run on my laptop it shows a duration of several days and not seconds:

Exception in thread "main" java.lang.IllegalStateException: PT413H51M12.005370612S

Workaround

Check if the execution has not started and return a predefined constant e.g.

private static long getEventElapsedMicros(ExecutionEvent executionEvent) {
    if (executionEvent.getStartTime() == Duration.ZERO) {
        return 1;
    }
    return TimeUnit.MICROSECONDS.convert(executionEvent.getElapsedTime());
  }
jhalterman commented 2 years ago

@timothybasanov Which version did you hit this with? I'm having trouble reproducing on 2.4.4 and 3.1. Does it reproduce for you on 3.1?

timothybasanov commented 2 years ago

Not a Contribution.

Verified that it is present in 2.4.1 and was fixed in 2.4.2 already. Thank you!