failsafe-lib / failsafe

Fault tolerance and resilience patterns for the JVM
https://failsafe.dev
Apache License 2.0
4.2k stars 297 forks

Feature: micrometer.io metrics integration #352

Open magicprinc opened 2 years ago

magicprinc commented 2 years ago

If you are looking for new ideas: Micrometer (https://micrometer.io/) metrics would be great!

It is the new SLF4J for metrics, and everyone I know uses it as the de facto standard.

If you need some inspiration: https://github.com/brettwooldridge/HikariCP/tree/dev/src/main/java/com/zaxxer/hikari/metrics/micrometer

https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/cache/CaffeineCacheMetrics.java

https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/okhttp3/OkHttpMetricsEventListener.java

jhalterman commented 2 years ago

The approach Failsafe takes is to expose event listeners which could be used to record metrics, which avoids adding any external dependencies. Are there any particular metrics/event listeners that you'd like to see?
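For reference, a minimal sketch of that listener-based approach, assuming Failsafe 3.x (package dev.failsafe) and micrometer-core on the classpath. The RetryMetrics helper and the meter names are hypothetical, chosen only for illustration; the builder and listener methods are RetryPolicy's own:

```java
import dev.failsafe.RetryPolicy;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;

// Hypothetical helper (not part of Failsafe): builds a RetryPolicy whose
// event listeners feed Micrometer counters.
public final class RetryMetrics {

    private RetryMetrics() {
    }

    public static <R> RetryPolicy<R> instrumentedRetryPolicy(String name, int maxRetries, MeterRegistry registry) {
        // One monotonic counter per event type, tagged with the policy name.
        Counter failedAttempts = Counter.builder("failsafe.retry.failed-attempts")
                .tag("name", name)
                .register(registry);
        Counter retries = Counter.builder("failsafe.retry.retries")
                .tag("name", name)
                .register(registry);
        Counter exhausted = Counter.builder("failsafe.retry.exhausted")
                .tag("name", name)
                .register(registry);

        return RetryPolicy.<R>builder()
                .withMaxRetries(maxRetries)
                .onFailedAttempt(e -> failedAttempts.increment()) // each failed attempt
                .onRetry(e -> retries.increment())                // before each retry
                .onRetriesExceeded(e -> exhausted.increment())    // retries used up
                .build();
    }
}
```

Because the listeners are attached while the policy is being built, this only helps code that controls policy construction; it cannot instrument a policy someone else already built.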

magicprinc commented 2 years ago

Metrics are my colleagues' responsibility, so I don't have a concrete answer yet :-)

But people usually like it when someone smarter than them, "an expert in the field", has already made the decisions.

I know that Hystrix has a lot of metrics, and even an HttpServlet that publishes them as a JSON feed.

Here are some of your competitors, as a source of inspiration: https://resilience4j.readme.io/docs/micrometer

https://github.com/findinpath/spring-retry-metrics

magicprinc commented 2 years ago

A core library without dependencies is a good thing! Metrics support could be an additional (optional) artifact.

And people who use, let's say, Prometheus directly could clone the project and make a "Prometheus metrics for Failsafe".

cykl commented 2 months ago

@jhalterman Are you aware of anyone having implemented a good instrumentation for failsafe? (using Prometheus, OpenTelemetry, Micrometer or anything else).

I'm trying to instrument some code using failsafe with Micrometer. I expect to end up with something close to what is available in resilience4j. Contrary to what I expected, it doesn't seem to be that simple.

The idiomatic way to instrument something using Micrometer is to provide a MeterBinder.

Let's start with something simple and say we want to instrument a circuit breaker. It would look like this:

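```java
import dev.failsafe.CircuitBreaker;
import io.micrometer.core.instrument.FunctionCounter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tag;
import io.micrometer.core.instrument.Tags;
import io.micrometer.core.instrument.binder.BaseUnits;
import io.micrometer.core.instrument.binder.MeterBinder;

public class FailsafeCircuitBreakerMetrics implements MeterBinder {

    private final CircuitBreaker<?> circuitBreaker;
    private final String name;
    private final Iterable<Tag> tags;

    public FailsafeCircuitBreakerMetrics( CircuitBreaker<?> circuitBreaker, String name ) {
        this( circuitBreaker, name, Tags.empty( ) );
    }

    public FailsafeCircuitBreakerMetrics( CircuitBreaker<?> circuitBreaker, String name, Iterable<Tag> tags ) {
        this.circuitBreaker = circuitBreaker;
        this.name = name;
        // Every meter carries the caller's tags plus the breaker name.
        this.tags = Tags.of( tags ).and( "name", name );
    }

    @Override
    public void bindTo( MeterRegistry registry ) {
        // Both counters read the breaker's own counts on each scrape. As far as I can
        // tell from the Failsafe docs, these counts are kept for the current state
        // (and its thresholding window), so they may decrease on a state change.
        FunctionCounter.builder( "failsafe.circuit-breaker.execution.total", circuitBreaker,
                        CircuitBreaker::getSuccessCount )
                .baseUnit( BaseUnits.OPERATIONS )
                .tags( tags )
                .tag( "outcome", "success" )
                .register( registry );

        // Failed executions recorded by the breaker.
        FunctionCounter.builder( "failsafe.circuit-breaker.execution.total", circuitBreaker,
                        CircuitBreaker::getFailureCount )
                .baseUnit( BaseUnits.OPERATIONS )
                .tags( tags )
                .tag( "outcome", "failure" )
                .register( registry );

        // Encodes the breaker state as a number: 0 = closed, 1 = half-open, 2 = open.
        Gauge.builder( "failsafe.circuit-breaker.state", circuitBreaker, ( CircuitBreaker<?> cb ) -> {
                    if( cb.isClosed( ) ) {
                        return 0;
                    } else if( cb.isHalfOpen( ) ) {
                        return 1;
                    } else if( cb.isOpen( ) ) {
                        return 2;
                    } else {
                        return -1;
                    }
                } )
                .tags( tags )
                .register( registry );
    }
}
```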

So far, no problem: everything is available, and it's just a matter of defining meters on top of the metrics exposed by the monitored object.
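One caveat on the counters above: as far as I can tell from the Failsafe docs, getSuccessCount() and getFailureCount() report what the breaker recorded for its current state, so they can drop on a state change, while Micrometer expects a FunctionCounter to only go up. A possible complement, sketched under the same assumptions (Failsafe 3.x, micrometer-core), is to count state transitions from the breaker's onOpen/onHalfOpen/onClose listeners. CircuitBreakerTransitionMetrics, the meter name and the failure threshold of 3 are all hypothetical:

```java
import dev.failsafe.CircuitBreaker;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;

// Hypothetical helper (not part of Failsafe): counts circuit breaker state
// transitions with monotonic counters fed by the breaker's state listeners.
public final class CircuitBreakerTransitionMetrics {

    private CircuitBreakerTransitionMetrics() {
    }

    public static <R> CircuitBreaker<R> instrumentedCircuitBreaker(String name, MeterRegistry registry) {
        Counter opened = transitions(registry, name, "open");
        Counter halfOpened = transitions(registry, name, "half-open");
        Counter closed = transitions(registry, name, "closed");

        return CircuitBreaker.<R>builder()
                .withFailureThreshold(3) // illustrative threshold only
                .onOpen(e -> opened.increment())
                .onHalfOpen(e -> halfOpened.increment())
                .onClose(e -> closed.increment())
                .build();
    }

    // One counter per target state, all sharing the same meter name.
    private static Counter transitions(MeterRegistry registry, String name, String toState) {
        return Counter.builder("failsafe.circuit-breaker.transitions")
                .tag("name", name)
                .tag("to", toState)
                .register(registry);
    }
}
```

Like the retry example, this requires the listeners to be attached when the breaker is built.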

Now let's say I have a FailsafeExecutor with bulkhead and timeout policies. I would expect to be able to invoke something like new FailsafeExecutorMetrics(failsafeExecutor, "my-executor").bindTo(Metrics.globalRegistry) and have the relevant metrics created automatically.
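Part of such a FailsafeExecutorMetrics could time executions from the executor-level listeners. A rough sketch, assuming Failsafe 3.x and micrometer-core; FailsafeExecutorTimers and the meter name are hypothetical. It runs straight into the listener limitation: onSuccess/onFailure replace whatever listener was set before.

```java
import dev.failsafe.FailsafeExecutor;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

// Hypothetical helper (not part of Failsafe): records execution durations,
// tagged by outcome, using the executor's completion listeners.
public final class FailsafeExecutorTimers {

    private FailsafeExecutorTimers() {
    }

    public static <R> FailsafeExecutor<R> instrument(FailsafeExecutor<R> executor, String name, MeterRegistry registry) {
        Timer success = timer(registry, name, "success");
        Timer failure = timer(registry, name, "failure");
        // Caveat: these calls REPLACE any onSuccess/onFailure listener already
        // registered on this executor by client code.
        return executor
                .onSuccess(e -> success.record(e.getElapsedTime()))
                .onFailure(e -> failure.record(e.getElapsedTime()));
    }

    private static Timer timer(MeterRegistry registry, String name, String outcome) {
        return Timer.builder("failsafe.executor.executions")
                .tag("name", name)
                .tag("outcome", outcome)
                .register(registry);
    }
}
```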

My issues are:

  1. Unlike CircuitBreaker, the Timeout and Bulkhead policies expose nothing. From the policy, I cannot observe the number of executions currently running, the number of timeouts that have occurred, or anything else relevant to the policy.
  2. FailsafeExecutor has listeners that might help implement instrumentation. However, only one listener of each kind is supported (calling onXxx() overrides the current listener), and there is no way to access the current listener. Client code may already have registered a listener, and instrumentation should not override it.
  3. Adding a listener that is invoked on completion is unlikely to be enough to observe something like the number of operations running in a bulkhead.
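A partial workaround for the first and third points, sketched under the same assumptions (Failsafe 3.x, micrometer-core; PolicyMetrics, InFlight and the meter names are hypothetical): timeouts can be counted through the Timeout policy's own onFailure listener, filtered on TimeoutExceededException, and bulkhead occupancy can be approximated by wrapping the supplier that runs inside the bulkhead, since that supplier only starts once a permit has been acquired.

```java
import dev.failsafe.Timeout;
import dev.failsafe.TimeoutExceededException;
import dev.failsafe.function.CheckedSupplier;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;

import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helpers (not part of Failsafe or Micrometer).
public final class PolicyMetrics {

    private PolicyMetrics() {
    }

    // Timeouts: count only failures caused by the timeout itself.
    public static <R> Timeout<R> countingTimeout(Duration limit, String name, MeterRegistry registry) {
        Counter timeouts = Counter.builder("failsafe.timeout.exceeded")
                .tag("name", name)
                .register(registry);
        return Timeout.<R>builder(limit)
                .onFailure(e -> {
                    if (e.getException() instanceof TimeoutExceededException) {
                        timeouts.increment();
                    }
                })
                .build();
    }

    // Bulkhead occupancy: the policy exposes no "in use" count, so track how many
    // wrapped suppliers are currently running.
    public static final class InFlight {

        private final AtomicInteger running = new AtomicInteger();

        public InFlight(String name, MeterRegistry registry) {
            // Micrometer gauges hold a weak reference to their state object, so
            // this InFlight instance must stay strongly referenced by the caller.
            Gauge.builder("failsafe.bulkhead.running", running, AtomicInteger::get)
                    .tag("name", name)
                    .register(registry);
        }

        public <T> CheckedSupplier<T> wrap(CheckedSupplier<T> supplier) {
            return () -> {
                running.incrementAndGet();
                try {
                    return supplier.get();
                } finally {
                    running.decrementAndGet();
                }
            };
        }
    }
}
```

The gauge counts running suppliers, which matches bulkhead occupancy only when nothing else (such as a retry delay inside the bulkhead) holds the permit between attempts. Rejected executions (BulkheadFullException) are not counted here; that would still need a catch around the executor call.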

How would you approach the problem?
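One possible approach to the single-listener limitation, assuming Failsafe 3.x's dev.failsafe.event.EventListener interface: a hypothetical fan-out listener that the application registers once, and to which both application code and instrumentation add delegates. It does not remove the need for cooperation, since something still has to register the composite first.

```java
import dev.failsafe.event.EventListener;

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical fan-out listener (not a Failsafe API): register it once as an
// executor's onComplete/onSuccess/onFailure listener, then let each party add
// its own delegate instead of overwriting the others.
public final class CompositeEventListener<E> implements EventListener<E> {

    // Copy-on-write so delegates can be added while events are being dispatched.
    private final List<EventListener<E>> delegates = new CopyOnWriteArrayList<>();

    public CompositeEventListener<E> add(EventListener<E> delegate) {
        delegates.add(delegate);
        return this;
    }

    @Override
    public void accept(E event) {
        for (EventListener<E> delegate : delegates) {
            try {
                delegate.accept(event);
            } catch (Error error) {
                throw error; // never swallow JVM errors
            } catch (Throwable t) {
                // Isolate delegates: a failing listener must not starve the others.
                // A real implementation would probably log here.
            }
        }
    }
}
```

Registration would then look like Failsafe.with(policy).onComplete(composite), after which application code and a metrics binder can each call composite.add(...).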