eclipse / microprofile-metrics

Apache License 2.0

Delay for pulling measurable methods #576

Open oxaoo opened 4 years ago

oxaoo commented 4 years ago

Hi,

I need to collect a gauge metric for the result of a remote invocation, for instance:

import javax.inject.Inject;

import org.eclipse.microprofile.metrics.MetricUnits;
import org.eclipse.microprofile.metrics.annotation.Gauge;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Foo {
    private static final Logger LOGGER = LoggerFactory.getLogger(Foo.class);

    @Inject
    private RemoteInvoke remoteInvoke;

    @Gauge(name = "total_entries", unit = MetricUnits.NONE, absolute = true)
    public long totalEntries() {
        LOGGER.debug("Total entries is triggered");
        // for instance, a REST request to an external resource
        return this.remoteInvoke.requestTotalEntries();
    }
}

This operation is expensive and should be triggered every 5 minutes at most. Based on the logs produced by the totalEntries() method, I found out that it is triggered every second. I tried to find a specification for how measurable resources (methods annotated with the appropriate metric types) are polled, but without success. As a workaround I can use the following approach:

import javax.annotation.PostConstruct;
import javax.annotation.Resource;
import javax.ejb.ScheduleExpression;
import javax.ejb.Singleton;
import javax.ejb.Startup;
import javax.ejb.Timeout;
import javax.ejb.TimerConfig;
import javax.ejb.TimerService;
import javax.inject.Inject;

import org.eclipse.microprofile.metrics.MetricUnits;
import org.eclipse.microprofile.metrics.annotation.Gauge;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Singleton
@Startup
public class Bar {
    private static final Logger LOGGER = LoggerFactory.getLogger(Bar.class);

    @Inject
    private RemoteInvoke remoteInvoke;

    @Resource
    private TimerService timerService;

    private long totalEntries = 0L;

    @PostConstruct
    public void init() {
        // note: ScheduleExpression.hour defaults to "0", so it must be set
        // to "*" explicitly for the timer to fire every 5 minutes all day
        ScheduleExpression timeout = new ScheduleExpression().second("0").minute("*/5").hour("*");
        this.timerService.createCalendarTimer(timeout, new TimerConfig("bar_timer", false));
        LOGGER.debug("Timer is created");
    }

    @Timeout
    public void probe() {
        // for instance, a REST request to an external resource
        LOGGER.debug("Probe is triggered");
        this.totalEntries = this.remoteInvoke.requestTotalEntries();
    }

    @Gauge(name = "total_entries", unit = MetricUnits.NONE, absolute = true)
    public long totalEntries() {
        LOGGER.debug("Total entries is triggered");
        return this.totalEntries;
    }
}

Nonetheless, I would like an out-of-the-box solution. Please answer the following questions:

  1. Is it possible to specify a delay or a cron expression for polling measurable methods? If so, how?
  2. Can you point me to the specification of the metric aggregation/polling mechanism? How does it work under the hood?
  3. If the mechanism from point 1 isn't supported, could it be introduced in the near future?

Additional information: the Payara AS version is 5.194, and the fish.payara.extras.payara-embedded-all version is 5.194 (if I'm correct, it includes microprofile-metrics v2.0).

Best Regards, Alex

donbourne commented 4 years ago

@oxaoo , I think you're basically looking for functionality like this - https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/CachedGauge.html , correct?

oxaoo commented 4 years ago

@oxaoo , I think you're basically looking for functionality like this - https://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/CachedGauge.html , correct?

Hi @donbourne, according to the description it's something very close to that. But, unfortunately, it belongs to another library.
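For reference, the pattern behind DropWizard's CachedGauge can be sketched in a few lines of plain Java, with no extra library. The names below (CachedValue, the injectable clock) are illustrative only and are not part of MP Metrics or DropWizard:

```java
import java.util.function.Supplier;

/**
 * Illustrative sketch of a CachedGauge-style wrapper: the expensive loader
 * runs at most once per timeout window; all other reads hit the cache.
 * Hypothetical names, not an MP Metrics API.
 */
public class CachedValue<T> {

    private final Supplier<T> loader;   // the expensive computation, e.g. a remote call
    private final long timeout;         // freshness window, in the clock's units
    private final Supplier<Long> clock; // injectable clock (System::nanoTime in production)

    private T value;
    private long expiresAt = Long.MIN_VALUE;

    public CachedValue(Supplier<T> loader, long timeout, Supplier<Long> clock) {
        this.loader = loader;
        this.timeout = timeout;
        this.clock = clock;
    }

    /** Returns the cached value, reloading only once the timeout has elapsed. */
    public synchronized T get() {
        long now = clock.get();
        if (value == null || now >= expiresAt) {
            value = loader.get();       // runs at most once per timeout window
            expiresAt = now + timeout;
        }
        return value;
    }
}
```

A @Gauge method could then simply return cachedTotalEntries.get(), so the gauge can be scraped every second while the remote call happens at most once per window.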

donbourne commented 4 years ago

But, unfortunately, it belongs to another library.

True... but when we created the original MP Metrics it was based on DropWizard Metrics... we just removed a lot of the non-core functionality from DWM to provide something simpler.

Your proposed solution looks pretty good -- but I realize it's not as convenient as a gauge that just handles the caching for you would be.

oxaoo commented 4 years ago

But, unfortunately, it belongs to another library.

True... but when we created the original MP Metrics it was based on DropWizard Metrics... we just removed a lot of the non-core functionality from DWM to provide something simpler.

Your proposed solution looks pretty good -- but I realize it's not as convenient as a gauge that just handles the caching for you would be.

Does it follow that there is no intention to introduce such functionality into MP Metrics? I'd like to note that it's a very common need to check, for example, the liveness/readiness of a JMS/DB connection at the application level without overloading it with a large number of requests.

donbourne commented 4 years ago

Does it follow that there is no intention to introduce such functionality into MP Metrics?

Not necessarily. Just don't want to introduce it unless there's good need for it.

I'd like to note that it's a very common need to check, for example, the liveness/readiness of a JMS/DB connection at the application level without overloading it with a large number of requests.

That use case actually seems more like a use case for MP Health, no? Can you say a bit more about why you want to do that with metrics?

donbourne commented 4 years ago

@oxaoo as an aside, not directly related to what you're asking for in this issue: were you thinking you'd like to have metrics for health statuses? I.e., so that you could use Prometheus and Alertmanager, for example, to tap into application health? If so, we might want another issue to pursue that idea.

oxaoo commented 4 years ago

@donbourne, yes, the main idea is to collect metrics for the health status and configure alert rules in Prometheus.

donbourne commented 4 years ago

@oxaoo , I think that's a pretty good use case -- would you be interested in opening an issue over in https://github.com/eclipse/microprofile-health/issues for adding MP metrics for health checks?

donbourne commented 4 years ago

It also suggests that the thing you may want to cache, for that particular use case, is the result of the health check rather than the result from the Gauge. Keep in mind that Kubernetes will call your liveness/readiness probes with a configurable delay between calls, periodSeconds (default 10 seconds). I'm not saying there aren't other cases for cached gauges -- just that gauges representing the result of health checks might be better cached by the health check itself if you don't want the health check to do its work too often.

donbourne commented 4 years ago

@oxaoo , something I've seen recently is that sometimes we have more than one Prometheus instance scraping the same runtime. I'm wondering if it's better to have a caching Gauge or a caching exporter? The benefit of a caching exporter would be that I could cache the entire response (i.e. to /metrics) for a configurable period of time (e.g. 10 seconds).

WDYT?
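The caching-exporter idea can be sketched along the same lines: render the full exposition text at most once per cache period and serve that snapshot to every scraper in between. The class and names below are a hypothetical illustration, not a real MP Metrics or Payara API, assuming a renderer that produces the Prometheus text format:

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a caching exporter: the whole /metrics response is
 * rendered at most once per cache period, so several Prometheus instances
 * scraping the same runtime share a single rendered snapshot.
 */
public class CachingMetricsExporter {

    private final Supplier<String> renderer; // renders the whole exposition text
    private final long cachePeriod;          // in the clock's units, e.g. millis
    private final Supplier<Long> clock;      // injectable clock for testing

    private String snapshot;
    private long renderedAt;

    public CachingMetricsExporter(Supplier<String> renderer, long cachePeriod,
                                  Supplier<Long> clock) {
        this.renderer = renderer;
        this.cachePeriod = cachePeriod;
        this.clock = clock;
    }

    /** What a GET /metrics handler would return. */
    public synchronized String scrape() {
        long now = clock.get();
        if (snapshot == null || now - renderedAt >= cachePeriod) {
            snapshot = renderer.get();   // expensive: walks every registered metric
            renderedAt = now;
        }
        return snapshot;
    }
}
```

The trade-off versus a caching Gauge is granularity: one cache period for the whole response, but every gauge (cheap or expensive) is covered at once, and concurrent scrapers never trigger duplicate work.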