cloudfoundry-community / stackdriver-tools

Stackdriver Nozzle for Cloud Foundry Loggregator, Host Monitoring Agents BOSH Release
Apache License 2.0
21 stars 13 forks source link

Use native Stackdriver Monitoring Protos #142

Open johnsonj opened 6 years ago

johnsonj commented 6 years ago

see discussion in #137. Use native protos at the point of translation instead of ushering them into an intermediary format.

fluffle commented 6 years ago

I started looking at this in more detail, because of @knyar 's comments on #156. The separation of Metric from MetricDescriptor in the API protos complicates matters somewhat, because we must derive the descriptors from firehose data.

I think (based on an hour or so of poking around, so don't take this as gospel) that the correct route forward is to respect this separation. Handwaving proto-design-doc discussion follows!

We should create a stackdriver.MetricRegistry (or DescriptorRegistry, or whatever) to encapsulate the creation and management of MetricDescriptors. This:

The Metric Type is the shared key tying a Metric to its Descriptor in the API; we can use that in the MetricRegistry for deduping and caching as is done in the MetricAdapter today. I don't think anything more than:

type MetricRegistry interface {
    // Checks descriptor.Type against internal cache
    // Returns error if cached and descriptors don't match
    // Creates if not cached
    // Adds to cache if successful
    // Returns error on failure
    // May not even need created bool, but probably useful for testing.
    Register(descriptor *metricpb.MetricDescriptor) (created bool, err error)
}

is needed, but I have not done enough due diligence to be 100% sure. An alternative API might create separate Register functions per MetricKind or maybe MetricKind+ValueType and take proto elements as args instead:

type MetricRegistry interface {
    RegisterGauge(metricType string, valueType MetricDescriptor_ValueType, labels []*LabelDescriptor,  otherStuff ...string) (created bool, err error)
    // ... or ...
    RegisterInt64Gauge(metricType string, labels []*LabelDescriptor, otherStuff ...string) (created bool, err error)
    // ... etc, etc.
}

I'm leaning towards the former since it allows more flexibility and is smaller, though it does push ~all of the creation code down into the metric sink. We could expose package-level functions outside of the interface to make some of the proto translation less of a chore, instead of having them in the registry interface.

The registry should populate its cache with all the currently-configured custom metric descriptors at nozzle startup so we can do client-side diffing, provide more meaningful errors, and possibly even forcibly override descriptors.

The nozzle's metric sink can derive both a metricpb.MetricDescriptor and a monitoringpb.TimeSeries from the firehose event, call metricRegistry.Register() with the descriptor, then pass the timeseries off to the metricAdapter for deduping / batching / writing to SD. The metric pipeline can dedupe using the metricpb.Metric proto from the timeseries, since that contains exactly the name and labelset we want to dedupe on.

Creating the TimeSeries in the metric sink also allows the possibility of creating MonitoredResource data for e.g. vm instances or container instances, if we can derive them in a consistent fashion.

Again, this is all very much handwaving and I'd love to hear your opinions on it!

Edit: Forgot to close my code tags :-O

fluffle commented 6 years ago

The logging side of this change looks less controversial. Creating the Entry in the log sink allows us to potentially put more detail into the logs: Entry has an HttpRequest field which we could (partially) fill out from HttpStartStop events, for example. There's also the potential of filling out some of the Operation and Resource structs.

One key point is that we have to break the top level dictum that "a firehose event is either a metric or a log, it can't be both", since we already know from the MetricAdapter side channel that that is not true.

johnsonj commented 6 years ago

I'm all for the MetricRegistry and think the single Register method is the way to go.

From my point of view the MetricAdapter is responsible for translating between any internal representations and the final Stackdriver representation so it's appropriate for it to create a monitoringpb.MetricDescriptor. It would be reasonable to extract the specific Metric/MetricEvent -> MetricDescriptor translation into something like:

type MetricDescriptorAdapter interface {
    // Translate provides a monitoringpb.MetricDescriptor for a given messages.Metric.
    // The MetricDescriptor is unique for the name and labels on a given Metric but does not
    // depend on the specific value. 
    Translate(messages.Metric) (monitoringpb.MetricDescriptor, error)
}

That way we can encapsulate caching and whatnot. I believe it's what you're eluding to as the package level function for translation.


Now breaking "a firehose event is either a metric or a log". I do think we still need an intermediary form from Firehose to Stackdriver given the different representations in Logging/Monitoring. But maybe we can drop messages.Log and have a single struct that represents both, goes through the exact same pipeline, and relies on MetricRouter to send it to the appropriate endpoint.

fluffle commented 6 years ago

The whole point of me recommending this change was that there should not be any internal representations ;-)

As I said in #137, the translation to Stackdriver representation should happen at the point of entry to the nozzle, because the data model Stackdriver uses is a known-good way of representing timeseries, metrics and logs, and it's what you have to use eventually anyway.

You need to derive two things from a given events.Envelope in the MetricSink: (1) a MetricDescriptor, and (2) a TimeSeries (with a Metric inside it). If you want to keep MetricAdapter around and have the functions that perform these transforms in there rather than in MetricSink, I think that could work.

Stealing @knyar 's diagram for the current state:

before

Here's the end state I am proposing this refactor aim towards.

                                                 +----------------+                                      +---------------------+
                                                 |                |                                      |                     |
                                                 | Log Adapter    |                                      | Stackdriver Logging |
                                                 |                |                                      |                     |
                                                 +-^--+-----------+                                      +-------^-------------+
                                                   |  |                                                          |
                                                   |  |                                                          |
                    +-----------------------+    +-+--v--------+                                               +-+-------------+
                    |                       |    |             |                                               |               |
              +-----> Filter Sink (Logs)    +----> Log Sink    +-----------------------------------------------> Log Client    |
              |     |                       |    |             |                                               |               |
              |     +-----------------------+    +-------------+                                               +---------------+
+----------+  |
|          |  |
|  Nozzle  +--+
|          |  |
+----------+  |
              |     +-----------------------+    +-------------+    +---------------+    +----------------+    +---------------+
              |     |                       |    |             |    |               |    |                |    |               |
              +-----> Filter Sink (Metrics) +----> Metric Sink +----> Metric Buffer +----> Metric Batcher +----> Metric Client |
                    |                       |    |             |    |               |    |                |    |               |
                    +-----------------------+    +-+--^------+-+    +---------------+    +----------------+    +-+-------------+
                                                   |  |      |                                                   |
                                                   |  |      |                                                   |
                                       +-----------v--+-+  +-v-------------------+                       +-------v-------------+
                                       |                |  |                     |                       |                     |
                                       | Metric Adapter |  | Descriptor Registry +-----------------------> Stackdriver Metrics |
                                       |                |  |                     |                       |                     |
                                       +----------------+  +---------------------+                       +---------------------+

(thanks, http://asciiflow.com)