census-instrumentation / opencensus-specs

Apache License 2.0
188 stars 50 forks source link

Specify sanitization rules for exporters #35

Open rakyll opened 6 years ago

rakyll commented 6 years ago

OpenCensus tag and stats names are not as restrict as some of the backends we are uploading the data to. In order to comply with the restriction the backends enforce, we need to sanitize the names. Sanitization rules need to be consistent across different language optimizations.

We need to write a spec that contains the sanitization rules.

/cc @bogdandrutu @acetechnologist @dinooliva @g-easy @adriancole @odeke-em

codefromthecrypt commented 6 years ago

In cases of invalid utf8 decoding, I have seen '?' substituted. Underscore might be a safer choice as more likely a backend might support it in a name, but it is whack-a-mole as who knows what accepted characters of backends are. If we are choosing underscore because it is already a character we accept, then maybe note that.

@jkschneider do you have similar defence or insight from micrometer?

jkschneider commented 6 years ago

do you have similar defence or insight from micrometer?

@adriancole The rules are different for each backend and are captured in the NamingConvention of each registry implementation. Some monitoring systems have no restrictions at all and some are rather restrictive in specific ways (e.g. Atlas queries are sent as query params on an HTTP requests so quite limiting). Rather than trying to creating a sanitization rule that satisfies all of them at once, we leave it to the naming convention of each implementation.

Separately, the NamingConvention helps to represent a name in a canonical way on each backend. This is a concern that isn't captured in this issue, but probably should be. Our users have been very particular about canonical naming. Some systems prefer camel case, others snake case, some dot separated, Prometheus prefer all counters to end with _sum, etc. So in Micrometer you start out with a dot separated name and the role of the NamingConvention is to normalize it to this canonical form.

This can be more than just an idiomatic concern. For example, if Prometheus histogram time series aren't represented with a tag key of le, histogram_quantile doesn't function. The exact same data needs to be represented with a tag key of bucket in Atlas or :percentile doesn't function. As more monitoring systems implement query-based aggregable functions like this, the importance of names will continue to grow.

rakyll commented 6 years ago

We should allow each exporter to prefer a sanitization character rather than setting a hard rule. I am editing the list above.

semistrict commented 6 years ago

(Copying my comment from #50) IMO the right thing to do is to not permit invalid view name / exporter combinations. We could either throw an error or just log a warning if they were used together. It's usually going to be the same person registering views and exporters so they will have the opportunity to fix the problem if we tell them about it.

jkschneider commented 6 years ago

@ramonza I agree in theory, but in practice per-exporter sanitization rules have not resulted in a loss of data or readability for names/tags as implemented in Micrometer. I don't think that's a concern here either.

semistrict commented 6 years ago

I guess I'm struggling to see why we should bother with it at all. The view name string is totally under control of the application developer who registers the exporter and the view and so it seems like they are in a good position to choose an appropriate name and avoid the whole complicated mess of mapping.

jkschneider commented 6 years ago

@ramonza It's about the portability of instrumentation. If you choose a different monitoring system later, you don't want to have to go back through all your code and change . to _, etc. And suppose you choose to export your data to multiple monitoring systems today. There won't necessarily be one appropriate name that satisfies both.

semistrict commented 6 years ago

@jkschneider I don't really see views as instrumentation, more like configuration. Internally at Google they are defined in text protobuf files. We've been talking about having a similar configuration system for open source. Point is they will all be in one place (or a small number of places) not scattered throughout your codebase, so easy to update if needed.

For multiple exporters, what about having the Exporter take a map of "canonical" view/tag name to "exported" view/tag name?

Also, I also forgot about tag names in my previous comment. We currently don't have a way to change the tag name in the view definition. Discussed with @bogdandrutu, we think that the best way to handle this is to have a mechanism to rename tags at view definition time. Then the application developer can select appropriate aliases for all tag names that are part of any views they define. Internal census has something called a TagSelector that configures how tags are incorporated into views that we need some equivalent of in OpenCensus and this is probably something we could add there.

rakyll commented 6 years ago

One of the main goals of this project is to allow libraries provide views ready to be subscribed without much effort. Sanitization is a good tradeoff. As a user, I don't want to deal with the specifics of metric collection backends. I just want to plug my exporter and assume OpenCensus doing the right thing.

codefromthecrypt commented 6 years ago

One of the main goals of this project is to allow libraries provide views ready to be subscribed without much effort. Sanitization is a good tradeoff. As a user, I don't want to deal with the specifics of metric collection backends. I just want to plug my exporter and assume OpenCensus doing the right thing.

sounds nice. should we try to setup (perhaps a different issue) to validate that different exporters work in practice with sanitized data? Not saying they don't, but actively testing would be a refreshing change from other efforts which can drift without anyone noticing. Having done this sort of thing before, one of the issues is test accounts on services and shared goals. While somewhat ancillary this could be helpful in thinking through how to keep the promise https://github.com/Netflix/denominator/wiki/Adding-new-Providers

jkschneider commented 6 years ago

I'm going to step back from this conversation because now we are talking about views, a concept that I don't see described anywhere in the open, except in a git commit:

image

Seems to me that there is not much of an attempt to understand/listen, just shut down the discussion with language that necessitates an understanding of Google's internal mechanism.

bogdandrutu commented 6 years ago

@jkschneider can you take a look at this https://github.com/census-instrumentation/opencensus-specs/pull/51?

Also the concept is explained (probably not very well in the following docs): https://godoc.org/go.opencensus.io/stats/view

semistrict commented 6 years ago

@jkschneider we could certainly do a better job at documenting these concepts in the abstract. Here's the relevant code: https://github.com/census-instrumentation/opencensus-java/blob/7106a4737b44f4e55be6c4e397a4cf01e64ba220/api/src/main/java/io/opencensus/stats/View.java https://github.com/census-instrumentation/opencensus-go/tree/master/stats/view

It wasn't my intention to shut down the conversation. I felt like I addressed your concerns in my previous comment. If there's something I specifically missed, please let me know. This doesn't mean that I think I'm right and you're wrong. I think there are tradeoffs here that can go either way and I'm just expressing my opinion.

codefromthecrypt commented 6 years ago

to try and broker this conversation. I think jon is saying that sanitation rules can be something difficult to do with configuration or user defined keys as the rules are complex and require multiple inputs ex.

https://github.com/micrometer-metrics/micrometer/blob/master/micrometer-core/src/main/java/io/micrometer/core/instrument/config/NamingConvention.java https://github.com/micrometer-metrics/micrometer/blob/master/implementations/micrometer-registry-prometheus/src/main/java/io/micrometer/prometheus/PrometheusNamingConvention.java

There are a ton of these.

So maybe one way to proceed the discussion is to experiment and see if it is in fact possible to avoid a naming convention type in favor of a sanitation approach. This is a specs repo, and usually specs follow practice... more practice before proceeding a fair suggestion?

jkschneider commented 6 years ago

Thanks @adriancole, I agree. I didn't initially believe in the naming convention normalization concept, but real world application has a way of being very convincing. The more monitoring systems that we examine closely, I think the more apparent the need will become.

semistrict commented 6 years ago

I think there's a major difference between Micrometer and OpenCensus that is material to this discussion.

In Micrometer, Counters, Guages, etc. (Meter implementors) are given names that are ultimately what is recorded in the backend. So when you instrument e.g. an HTTP library, you choose a name that ultimately needs to turn up in backends (Prometheus, etc.) but you don't know which backend the application developer will select: hence the need for sanitization. This is IMO totally reasonable.

OpenCensus on the the other hand has just Measures (https://godoc.org/go.opencensus.io/stats#Measure) for instrumentation. These are abstract objects that just record data points, they don't aggregate at all. They have a name but the name is only used to refer to them in Views. It's the name of the View that gets written to the backend and might need to be sanitized.

Normally, the way it works is that a library will be instrumented with Measures but it's the application developer who will actually define Views over these Measures. The application developer is also the one who knows which backends they want to use so that's what I was saying earlier about them (the application developer) having the opportunity to select an appropriate View name.

As @rakyll points out though, it's common for the instrumented library to provide some convenience Views, e.g. https://godoc.org/go.opencensus.io/plugin/ocgrpc#pkg-variables . The application developer still needs to explicitly subscribe to these views for them to be exported but they don't get to name them: the names are provided by the instrumented library. And so in this case OpenCensus has a very similar problem to Micrometer and sanitization is one approach to solve this.

Another approach would be to allow the application developer to rename the View at subscription time. Then they are again in a position to choose a name that's compatible with their chosen backend(s). I think we should permit this regardless of whether we implement sanitization rules: it is a useful feature for those who want control over View names but still want the convenience of using pre-defined views.