apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

Micrometer Observation Support #20845

Open marcingrzejszczak opened 1 year ago

marcingrzejszczak commented 1 year ago

Motivation

I'm a co-maintainer of Spring Cloud Sleuth and Micrometer projects (together with @shakuzen and @jonatan-ivanov).

Micrometer Observation is part of the Micrometer 1.10 release and Micrometer Tracing is a new project. The idea of Micrometer Observation is that you instrument code once but get multiple benefits out of it (e.g. you can get tracing, metrics, logging or whatever you see fit).

I was curious whether there's interest in adding Micrometer Observation support so that, in addition to metrics, spans could be created and tracing context propagation could happen too. Via Micrometer Tracing one can use the OpenTelemetry or OpenZipkin Brave tracer, but with the handler mechanism the possibilities are endless :)

Solution

If there is such interest, we could provide a PR that adds observability support based on Micrometer Observation.

Alternatives

There are none as far as I can see.

tisonkun commented 1 year ago

cc @asafm @michaeljmarshall @BewareMyPower do you have any input on this topic?

asafm commented 1 year ago

Hi @marcingrzejszczak. I'm working on revamping metrics for Pulsar. I'm currently at the discussion stage of my PIP: https://github.com/apache/pulsar/issues/20197.

I can see the value of Micrometer for application developers, but less so for Pulsar brokers and other components. You can see why the PIP chose OpenTelemetry, and read what we saw as issues with other libraries, under the "Why not other libraries?" section of the PIP. With that in mind, I think adding the Observation module for use inside Pulsar brokers is less relevant for Pulsar.

github-actions[bot] commented 1 year ago

The issue had no activity for 30 days, mark with Stale label.

marcingrzejszczak commented 9 months ago

Hi @asafm. Sorry for the huge delay in answering your comment. I can't find the "Why not other libraries?" section in the PIP. Can you point me to it again, please? Thank you.

marcingrzejszczak commented 9 months ago

I found it!

Thanks for creating such an extensive document! I will address parts of it below.

In this epic scope we will only use the Metrics API and SDK. Specifically we will use the Java SDK in Pulsar Broker, Proxy and Function Worker, but also the Go and Python SDK for the wrapper code which executes functions written in Go and Python.

With Micrometer Observation you can use 1 Java API to instrument for whatever observability pillars (metrics, traces etc.). With OTel you will need to use as many APIs as necessary to achieve the same goal.
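To make the "instrument once, plug in handlers" idea concrete, here is a minimal sketch in plain Java. It is not the Micrometer API: `SketchObservations`, `SketchHandler`, `CountingHandler` and `RecordingHandler` are hypothetical names invented for illustration. It only shows the pattern in which a single instrumentation call feeds both a metrics-like and a tracing-like handler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical miniature of the "instrument once, plug in handlers" idea.
// These types are NOT the Micrometer API; they only illustrate the pattern.
interface SketchHandler {
    void onStart(String name);
    void onStop(String name, long durationNanos);
}

// Turns each finished observation into a call count (the "metrics" pillar).
final class CountingHandler implements SketchHandler {
    final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    public void onStart(String name) { }

    public void onStop(String name, long durationNanos) {
        counts.computeIfAbsent(name, k -> new LongAdder()).increment();
    }
}

// Records start/stop events, standing in for span creation (the "tracing" pillar).
final class RecordingHandler implements SketchHandler {
    final List<String> events = new ArrayList<>();

    public void onStart(String name) { events.add("start:" + name); }

    public void onStop(String name, long durationNanos) { events.add("stop:" + name); }
}

final class SketchObservations {
    private final List<SketchHandler> handlers = new ArrayList<>();

    SketchObservations register(SketchHandler handler) {
        handlers.add(handler);
        return this;
    }

    // The single instrumentation point: application code calls only this.
    void observe(String name, Runnable work) {
        long start = System.nanoTime();
        for (SketchHandler h : handlers) h.onStart(name);
        try {
            work.run();
        } finally {
            long elapsed = System.nanoTime() - start;
            for (SketchHandler h : handlers) h.onStop(name, elapsed);
        }
    }
}
```

In this sketch the instrumented code calls only `observe(...)`; which signals come out of it depends entirely on the handlers that were registered.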

One very big breaking change (there are several described in the High Level Design, and also summarized in the Backward Compatibility section) is the naming. We are changing all metric names due to several reasons: Attributes names (a.k.a. Labels) will utilize the Semantic Conventions defined in OTel. OpenTelemetry defined an agreed upon attribute names for many attributes in the industry.

The status of semantic conventions in OTel is still mixed or experimental for many of its parts. In Micrometer, through the Micrometer Observation Convention mechanism you can gradually migrate from certain naming conventions to other ones. You can do the same just for metrics (through MeterFilter), just for tracing (through SpanFilter) or for all signals (via ObservationConvention or ObservationFilter).

OTel doesn’t force you to supply documentation. We will create a static code analysis rule failing the build if it finds instrument creation without description. We will optionally try to somehow create an automated way to export all metrics metadata - instrument name, documentation, in an easy to read format to be used for documentation purposes on Pulsar website.

Micrometer Docs Generator already provides this feature. You can have your code scanned and produce documentation out of it. Example for Spring Cloud Task.

It’s the new emerging industry-wide standard for observability and specific metrics, as opposed to just a library or a standard adopted and promoted by a single entity/company.

I would be cautious about statements like that following the history of OpenTracing (e.g. Couchbase went all in on OT and now they created their own facade).

With Micrometer, Micrometer Observation and Micrometer Tracing you have facades created in such a way that the user can define which “standard” they want to use and when there’s the “next standard” they will be able to migrate to it with minimal code changes.

Micrometer had the vision of becoming the industry standard API like SLF4J is for logging in the Java ecosystem. In reality, it didn’t catch on, as can be seen in the Maven Central statistics: It’s used by ~1000 artifacts, compared to slf4j-api which is used by 60k artifacts. As such, picking it as the standard for today seems like “betting” on the wrong project.

It sounds a little bit like comparing incomparable libraries. You can’t compare a logging library to a metrics library. We were stating that we are like Slf4j but for observability. We never claimed that we will be Slf4j nor that we will reach their download statistics. I’m pretty sure that vast majority of applications are doing logging. Not all of the applications are gathering observability data.

BTW, if you compare the same statistics for the micrometer-core library and, e.g., opentelemetry-api, you get around 1,800 artifacts for Micrometer Core (1.12.0) and 220 for opentelemetry-api (1.32.0).

Micrometer architecture relies heavily on the library to implement all target systems like Datadog, Prometheus, Graphite, OTLP, and more. OTel relies on the collector to implement that as it has more power and can contain the state if one of those systems goes down for some time. I think it’s a smarter choice, and more vendors will likely appear and maintain their exporter in OTel collector as we advance. This makes it easier for operators to have one exporter code base (say to Cortex) across different languages of micro-services, so it makes sense people will lean towards this framework and request it soon.

Micrometer has an OTLP MeterRegistry so it can export metrics in the OTLP format. You can check the code here.

OTel was built with instrumentation scope in mind, which gives a sort of namespace per library or section of the code (Called Meter in the API). For Pulsar, it can be used to have one per plugin. Micrometer doesn’t have that notion. It’s great especially if Pulsar and another plugin are using same library (e.g. Caffeine for caching), thus in Prometheus or other libraries the metrics will override each other, but in OTel the meter provides an attribute for name and version, thus provide a namespace.

I don’t understand this point. Can you please elaborate?

OTel by design has an instrument that you report measurements for a given attribute set, meaning it has that design of instrument = map(attributes→values). In Micrometer, it’s designed in a way that each (instrument, attributes) is a metric on its own. Less elegant and more confusing.

I don’t understand this point. Can you please elaborate?

asafm commented 8 months ago

Thanks for your comments. Before I continue, I must note that the PIP was approved, I think in September, concluding almost 1.5 years of research, along with a lot of effort put into improving the OTel Java SDK to meet the harsh memory allocation requirements (almost zero) of the Pulsar codebase. The latter is still in progress. So I don't see the community reverting that decision unless something very big disrupts it. I personally don't see Micrometer Observation being that. You are of course free to raise this on the Apache Pulsar DEV mailing list.

With Micrometer Observation you can use 1 Java API to instrument for whatever observability pillars (metrics, traces etc.). With OTel you will need to use as many APIs as necessary to achieve the same goal.

I don't see in that API how I would create a histogram or a counter. Can you explain?

It sounds a little bit like comparing incomparable libraries. You can’t compare a logging library to a metrics library. We were stating that we are like Slf4j but for observability. We never claimed that we will be Slf4j nor that we will reach their download statistics. I’m pretty sure that vast majority of applications are doing logging. Not all of the applications are gathering observability data.

SLF4J created one API for logging and hence became the most adopted. It would be wonderful to have the same for metrics across the industry. IMO, Micrometer attempted to be that for metrics but didn't catch on, popularity-wise. OTel is trying that now, and although it is still in its early stages, the number of companies involved, including those outside the Java ecosystem, is IMO a clear indicator that it will become one. Only time will tell.

OTel was built with instrumentation scope in mind, which gives a sort of namespace per library or section of the code (Called Meter in the API). For Pulsar, it can be used to have one per plugin. Micrometer doesn’t have that notion. It’s great especially if Pulsar and another plugin are using same library (e.g. Caffeine for caching), thus in Prometheus or other libraries the metrics will override each other, but in OTel the meter provides an attribute for name and version, thus provide a namespace.

In OTel, when you create a Counter, for example, you do it under an Instrumentation Scope, which has a name and a version. Without scopes, if different plugins use the same metric name, they collide. In OTel each counter is created under its own scope, exported as something like otel_scope_name=plugin1, so the names do not clash.
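The namespace argument can be illustrated with a small plain-Java sketch. This is not the OpenTelemetry SDK: `ScopeSketch` and `ScopedKey` are hypothetical names, and `plugin1`, `plugin2` and `cache.hits` are made-up values. It only contrasts a registry keyed by metric name with one keyed by (scope name, metric name).

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch, not the OpenTelemetry SDK: shows why keying
// instruments by (scope, name) avoids clashes that a name-only key has.
final class ScopedKey {
    final String scopeName;   // e.g. "plugin1" (assumed value)
    final String metricName;  // e.g. "cache.hits" (assumed value)

    ScopedKey(String scopeName, String metricName) {
        this.scopeName = scopeName;
        this.metricName = metricName;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ScopedKey)) return false;
        ScopedKey other = (ScopedKey) o;
        return scopeName.equals(other.scopeName) && metricName.equals(other.metricName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(scopeName, metricName);
    }
}

final class ScopeSketch {
    // Name-only registry: two plugins asking for the same name share one counter.
    final Map<String, LongAdder> byName = new ConcurrentHashMap<>();
    // Scope-aware registry: each plugin gets its own counter for the same name.
    final Map<ScopedKey, LongAdder> byScope = new ConcurrentHashMap<>();

    LongAdder unscopedCounter(String metricName) {
        return byName.computeIfAbsent(metricName, k -> new LongAdder());
    }

    LongAdder scopedCounter(String scopeName, String metricName) {
        return byScope.computeIfAbsent(new ScopedKey(scopeName, metricName), k -> new LongAdder());
    }
}
```

With the name-only map, two plugins that both register `cache.hits` end up incrementing the same counter; with the scoped map, each plugin keeps its own value under the same metric name.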

OTel by design has an instrument that you report measurements for a given attribute set, meaning it has that design of instrument = map(attributes→values). In Micrometer, it’s designed in a way that each (instrument, attributes) is a metric on its own. Less elegant and more confusing.

In OTel the data structure is something like:

instrument {
  name
  attributesStorage: Map<Attributes, Storage>
}

Storage is per instrument type. For a counter it may end up being:

{
    sum: LongAdder
}

From my understanding, in Micrometer every attribute set of an instrument stands on its own.
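The two shapes described above can be sketched side by side in plain Java. Neither class is the real OpenTelemetry or Micrometer implementation: `SketchCounterInstrument` and `SketchFlatRegistry` are hypothetical names, and the second class only reflects the understanding stated in this comment, not verified library internals.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of the two shapes discussed above; neither class is
// the real OpenTelemetry or Micrometer implementation.

// Shape 1: one instrument object owns a map from attribute set to storage.
final class SketchCounterInstrument {
    final String name;
    final Map<Map<String, String>, LongAdder> attributesStorage = new ConcurrentHashMap<>();

    SketchCounterInstrument(String name) {
        this.name = name;
    }

    void add(long value, Map<String, String> attributes) {
        attributesStorage.computeIfAbsent(attributes, k -> new LongAdder()).add(value);
    }

    long sum(Map<String, String> attributes) {
        LongAdder storage = attributesStorage.get(attributes);
        return storage == null ? 0L : storage.sum();
    }
}

// Shape 2: every (name, attributes) pair is registered as its own meter.
final class SketchFlatRegistry {
    final Map<String, LongAdder> meters = new ConcurrentHashMap<>();

    LongAdder counter(String name, Map<String, String> attributes) {
        // A sorted copy gives a stable id regardless of attribute iteration order.
        String id = name + new TreeMap<String, String>(attributes);
        return meters.computeIfAbsent(id, k -> new LongAdder());
    }
}
```

Both shapes hold the same numbers; the difference is whether the attribute sets hang off a single instrument object or each combination is looked up as a separate top-level entry.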