GlobeBMG / GBMG.Monitoring

GBMG monitoring and telemetry abstraction
MIT License

Methods to record custom metrics #1

Closed chrisproud closed 8 years ago

chrisproud commented 8 years ago

Agreed, common methods are required to record custom metrics.

chrisproud commented 8 years ago

Some useful links on how custom metrics are implemented by providers:

msancho commented 8 years ago

Another nice tool for gathering custom metrics is StatsD (by Etsy).

https://github.com/etsy/statsd

They combine it with this storage and reporting app:

https://graphiteapp.org/

chrisproud commented 8 years ago

We need methods in the integration to log how long a method takes to run so we can start to isolate the bottleneck in the process.

I'm struggling to see how this is implemented, but I'm imagining a stacked chart with a bar for each part of the workflow pipeline so we can see which bit takes the longest. Anyone seen this anywhere?

A bit like the breakdown table in NewRelic.

msancho commented 8 years ago

A stacked chart looks like the proper way to display the total time and the times for sub-processes. It makes it easy to see at a glance if one sub-process has increased its processing time, etc.

I guess one would use steps like in MiniProfiler to measure the total and the sub-processes.
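
Roughly what that looks like with MiniProfiler's nested steps, as a sketch (it assumes the profiler has already been started for the current request, and the step names are just examples):

using StackExchange.Profiling;

var profiler = MiniProfiler.Current;
using (profiler.Step("Newsfeed"))
{
    using (profiler.Step("DB"))
    {
        // ... query the database ...
    }
    using (profiler.Step("Render"))
    {
        // ... render the page ...
    }
}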

chrisproud commented 8 years ago

We need MiniProfiler for Windows (Azure) services.

msancho commented 8 years ago

Here's an example of a StatsD implementation for C#, made by the folks at DataDog: https://www.datadoghq.com/blog/statsd-for-net-dogstatsd/
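
Usage with that DogStatsD C# client looks roughly like this (a sketch from memory of the StatsdClient package's API, so double-check against the post; the metric names are made up):

using StatsdClient;

// Point the client at the local StatsD/DogStatsD agent.
DogStatsd.Configure(new StatsdConfig
{
    StatsdServerName = "127.0.0.1",
    StatsdPort = 8125
});

DogStatsd.Increment("lexology.data.cache.misses");          // counter
DogStatsd.Gauge("lexology.stats.registered_users", 290000); // gauge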

chrisproud commented 8 years ago

Found this chap using MiniProfiler in a Windows service: https://kzhendev.wordpress.com/2015/05/18/measuring-application-performance-with-mini-profiler-and-splunk/

msancho commented 8 years ago

I don't think we're doing anything special, so maybe we can use these metric types as inspiration:

https://github.com/etsy/statsd/blob/master/docs/metric_types.md

chrisproud commented 8 years ago

@msancho, what metrics do you record in Lexology?

msancho commented 8 years ago

I think the basics: timings and counters.

Newsfeed email / home timings (with sub timings), event counting (registrations, interactions, fatal errors, etc).

P.S. Well, we don't measure these ourselves right now; the metrics I'm using come from GES or NewRelic.

chrisproud commented 8 years ago

In Azure App Insights, it looks like we would use timed events with a named metric: https://azure.microsoft.com/en-us/documentation/articles/app-insights-api-custom-events-metrics/#timed
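
A rough sketch of that pattern with the App Insights SDK (TrackEvent with a metrics dictionary; the event, property and metric names here are just examples):

using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.ApplicationInsights;

var telemetry = new TelemetryClient();
var stopwatch = Stopwatch.StartNew();

// ... run the piece of work we want to time ...

stopwatch.Stop();
telemetry.TrackEvent("ActivityExecuted",
    properties: new Dictionary<string, string> { { "activity", "ExampleActivity" } },
    metrics: new Dictionary<string, double> { { "durationMs", stopwatch.Elapsed.TotalMilliseconds } });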

chrisproud commented 8 years ago

Azure App Insights also has the idea of an operation context to group events together. It would be a nice way to group the activity executions of a single workflow run together: https://azure.microsoft.com/en-us/documentation/articles/app-insights-api-custom-events-metrics/#operation-context
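
Setting that on the client looks roughly like this (the SDK's Context.Operation properties; the workflow run id and name are hypothetical):

using System;
using Microsoft.ApplicationInsights;

var telemetry = new TelemetryClient();
var workflowRunId = Guid.NewGuid().ToString(); // hypothetical id for one workflow run

// Telemetry sent through this client now shares the same operation,
// so the activity events for that run group together in the portal.
telemetry.Context.Operation.Id = workflowRunId;
telemetry.Context.Operation.Name = "ExampleWorkflow";

telemetry.TrackEvent("ActivityStarted");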

chrisproud commented 8 years ago

NewRelic monitors service code by configuring custom instrumentation - you need to write method names into a config file and it will watch for execution. https://docs.newrelic.com/docs/agents/net-agent/instrumentation/net-custom-transactions

Nice in that you don't have to modify code. Explains why we could never get NewRelic to monitor lex jobs.

msancho commented 8 years ago

That looks cool, but I would keep this as generic and platform-free as possible; we can always have N implementations if we want Azure or NewRelic.

The simplest implementation I can imagine is:

Track.Time("Lexology.Pages.Newsfeed.DB", 1345); Track.Time("Lexology.Pages.Newsfeed.Backend", 129); Track.Time("Lexology.Pages.Newsfeed.Render", 56); Track.Time("Lexology.Data.DB.Queries.GetCountries", 12);


Track.Count("Lexology.Data.DB.Queries", 1);
Track.Count("Lexology.Data.Cache.Hits", 1);
Track.Count("Lexology.Data.Cache.Misses", 0);

Track.Gauge("Lexology.Stats.Logins", 10);
Track.Gauge("Lexology.Stats.RegisteredUsers", 290000);
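
For reference, with a StatsD backend those calls would translate into metric lines on the wire roughly like this (StatsD's plain-text protocol: |ms for timings, |c for counters, |g for gauges):

Lexology.Pages.Newsfeed.DB:1345|ms
Lexology.Data.DB.Queries:1|c
Lexology.Stats.RegisteredUsers:290000|g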

chrisproud commented 8 years ago

Do we want to force a format on the metric name by using multiple parameters, or maybe move this to a class responsible for generating consistent metric names?

Track.Time("Lexology", "Data", "DB.Queries.GetCountries", 12);

msancho commented 8 years ago

I think we have to force metric name formatting, as different implementations will have different storage strategies, and if the naming is consistent they will be easier to implement.

We can indeed ease this by adding common naming characteristics as parameters:

Track.Time(string product, string section, string[] path, int value).

danbrad commented 8 years ago

"product" is a very Globe-centric name

"Application" "ApplicationName"

It could also be passed as part of the settings on instantiation rather than at every logging call, given that it is not expected to change in your usage.

chrisproud commented 8 years ago

Should we consider using consts to avoid errors? I'm not sure of the exact pattern, but something like:

Track.Time(LexologyMetricNames.Data.Queries.GetCountries, int value)

LexologyMetricNames.Data.Queries.GetCountries resolves to a static string or struct that the time method can use to resolve the full metric string name.
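
A hedged sketch of what those consts might look like (purely illustrative nested static classes; the class and naming scheme are just assumptions):

// Hypothetical metric name constants so callers can't mistype the dotted path.
public static class LexologyMetricNames
{
    public static class Data
    {
        public static class Queries
        {
            public const string GetCountries = "Lexology.Data.DB.Queries.GetCountries";
        }
    }
}

// Usage: Track.Time(LexologyMetricNames.Data.Queries.GetCountries, 12);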

danbrad commented 8 years ago

Well, that simplifies this library by putting the onus back on the consumer to worry about naming conventions. I thought you were aiming to force the format of the name, but this is back to a free-for-all unless I'm missing something?

msancho commented 8 years ago

Agree with Dan,

"Product" is very Globe-centric, and "Application" is far better.

We could indeed set up the client on initialization (or with a web.config configuration section) so you only log your SubPath, as the application will remain the same.

I don't think that static resolver will help here; we have to push for a proper, clear convention for people to follow.

chrisproud commented 8 years ago

If we are going to push a clear convention, why bother forcing the app name and section as params? It feels like we're only doing half a job.

In fact, any enforcement of naming should probably be outside of this abstraction anyway, so maybe I've taken this off topic. We should just have Track.Time(string, value), which can be consumed by whichever naming tooling the app wishes to use? As long as it complies with the convention, of course.

msancho commented 8 years ago

As long as it complies with the convention, we can let every application generate the keys however they fancy. That will also keep the interface simpler and cleaner.

danbrad commented 8 years ago

I agree; naming decisions should be outside this lib.

msancho commented 8 years ago

Shall we also include an option to store Code Deploys?

I am proposing this because it's easier to interpret changes in trends on the charts if you can mark that something changed in the code base at a particular time.

Please check this: https://codeascraft.com/2010/12/08/track-every-release/

chrisproud commented 8 years ago

Should this lib contain just the interface or can it also contain helper classes that make logging metrics easier?

Helper classes we might use:

chrisproud commented 8 years ago

Yes to code deploy markers, but I'm not sure they live here. Maybe Octopus should pick up variables from the web.config.

msancho commented 8 years ago

Well,

The code-section timer should be generic and the same across all implementations, as they (I think) will only differ in how they send the data and how/where they store it.

I am thinking of something like:

using(new LogTimer(MyKey))
{
     //Code here
     using(new LogTimer(MyKey2))
     {
           //Code here
     }
}

So on dispose, it will stop the watch, and send the value.

So we could have a base implementation that does the timing and let the implementations override the "Send".
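
A minimal sketch of that idea, assuming a hypothetical abstract LogTimer base class where providers only override Send:

using System;
using System.Diagnostics;

public abstract class LogTimer : IDisposable
{
    private readonly string _metricName;
    private readonly Stopwatch _stopwatch;

    protected LogTimer(string metricName)
    {
        _metricName = metricName;
        _stopwatch = Stopwatch.StartNew();
    }

    public void Dispose()
    {
        _stopwatch.Stop();
        // Only this part differs per provider (StatsD, App Insights, etc.).
        Send(_metricName, (int)_stopwatch.ElapsedMilliseconds);
    }

    protected abstract void Send(string metricName, int milliseconds);
}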

msancho commented 8 years ago

I would store the code deploys the same way; for me it's just another metric. It's an event that affects the application, and it lets the reporting consume the data from the same "storage", so we don't have to create new reporting that mixes data from two different sources.

chrisproud commented 8 years ago

I think we are off topic again, let's go here: https://github.com/GlobeBMG/GBMG.Monitoring/issues/3

chrisproud commented 8 years ago

Moved helper method ideas here https://github.com/GlobeBMG/GBMG.Monitoring/issues/4

chrisproud commented 8 years ago

Do helper methods/classes live in a separate library from the interface? I.e. do we need two core libs or just one?

GBMG.Monitoring.Interfaces
GBMG.Monitoring.Helpers

or just

GBMG.Monitoring

msancho commented 8 years ago

I think GBMG.Monitoring is good enough, as it's just abstractions and common (invariant) functionality.

Then we could have GBMG.Monitoring.StatsD, GBMG.Monitoring.AzureMetrics or whatever implementation we think we need. :)

chrisproud commented 8 years ago

Looking at App Insights, there is a difference between Metrics and Events. Do the Stackify/StatsD providers differentiate?

From https://blogs.msdn.microsoft.com/visualstudioalm/2015/02/07/metrics-explorer-custom-metrics-events-and-properties/

When to use TrackMetric() and TrackEvent()

Use TrackEvent when:

  • Each specific event is important – you can view them in Diagnostic Search. Metrics are measured with each event, so that you can segment the metrics by event type and properties of the event.
  • Users/sessions centric metrics are important for the event for e.g. if you want to know the region from which users are uploading files (enables you to make a decision about getting a processing server in that region)

Use TrackMetric when:

  • The metric is independent of any specific event.
  • You want to measure things on specific intervals (such as CPU or queue length every minute).
  • You have a high volume of metrics and would like to collect and aggregate locally before sending it. We will talk about aggregation in depth in a later blog post.

msancho commented 8 years ago

I don't think it's a standard thing, but people are combining them and not storing them differently: https://blog.mozilla.org/webdev/2012/04/05/tracking-deployments-in-graphite/

I understand the differentiation between events and metrics in App Insights, and a code deploy could be understood (I think) as both. I think, though, that this event is a metric, so I would use Track.Count("Lexology.Deploy", 1).

chrisproud commented 8 years ago

In your app, do you want to be able to specify metric or event or should the provider implementation make this decision?

chrisproud commented 8 years ago

Let's try and nail down the abstraction interface.

We are interested in:

This gives us:

interface IMetricLogger
{
   void TrackEvent(string metricName, int? value);
   void TrackTimedEvent(string metricName, int value); // milliseconds
}

I was going to add IncrementMetric(string metricName, int? value), but I think this is a StatsD-only thing.

What do people think about the interface and method names?

@aeagle, does this cover what we need for the integration? Can we include workflow and activity details in the metric name?

@msancho, do you get everything you need here for Lex?

msancho commented 8 years ago

Well, counter increments are quite important. For example, if you want to track queries, cache misses, cache hits etc.

chrisproud commented 8 years ago

Interesting, Microsoft have a daemon that forwards StatsD metrics to App Insights. This actually supports gauges: https://github.com/Microsoft/ApplicationInsights-statsd

chrisproud commented 8 years ago

How would you use increments for cache misses etc.? Aren't they just events?

msancho commented 8 years ago

IMHO it's a counter: if you set the flush time to one minute, you increment Track.Count("Lexology.Data.Cache.Misses", 1) on each miss, so the chart will show all the misses per minute.

That's the problem with events vs. metrics: there is only a thin, subjective line between them... in the end, all of them are metrics...

chrisproud commented 8 years ago

Aha, this is how you do them with App Insights: https://github.com/Microsoft/ApplicationInsights-SDK-Labs/tree/master/AggregateMetrics

chrisproud commented 8 years ago

OK, so we now have this, which should be supported (ish) by StatsD, App Insights and Stackify:

interface IMetricLogger
{
   void TrackEvent(string metricName, int? value);
   void TrackTimedEvent(string metricName, int value); // milliseconds
   void IncrementMetric(string metricName, int? value);
}

Anything else we need?
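
For illustration, a rough sketch of what an App Insights-backed implementation might look like (the TrackEvent/TrackMetric calls are from the Application Insights SDK; the class itself and its mapping choices are just an assumption, not a final design):

using System.Collections.Generic;
using Microsoft.ApplicationInsights;

class AppInsightsMetricLogger : IMetricLogger
{
    private readonly TelemetryClient _telemetry = new TelemetryClient();

    public void TrackEvent(string metricName, int? value)
    {
        var metrics = value.HasValue
            ? new Dictionary<string, double> { { "value", value.Value } }
            : null;
        _telemetry.TrackEvent(metricName, properties: null, metrics: metrics);
    }

    public void TrackTimedEvent(string metricName, int milliseconds)
    {
        _telemetry.TrackEvent(metricName, properties: null,
            metrics: new Dictionary<string, double> { { "durationMs", milliseconds } });
    }

    public void IncrementMetric(string metricName, int? value)
    {
        // TrackMetric sends raw values; a real implementation might aggregate locally first.
        _telemetry.TrackMetric(metricName, value ?? 1);
    }
}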

msancho commented 8 years ago

I don't know, but TrackEvent and IncrementMetric look redundant to me.

msancho commented 8 years ago

This could be an inspiration for the events:

https://developers.google.com/analytics/devguides/collection/analyticsjs/events

chrisproud commented 8 years ago

IncrementMetric is for the gauges you want, which are aggregated and flushed every minute.

TrackEvent is to log that an event occurred when you might want granularity, like a code update or a user logon. There is an optional value you can store here.

TrackTimedEvent is for timing code, but could log the duration of anything you want to time.

I'm trying to keep the implementation as agnostic as I can while balancing that against the richer functionality we get in Stackify and App Insights.

Does that make more sense? Can you suggest something better that meets the requirements?

msancho commented 8 years ago

OK, just to clarify: a gauge sends the last value if nothing else has been added during the flush interval, but a counter is set back to 0 every time it gets flushed.

There are events that you may want to track as Counters and there are events that you want to count as Gauges.

For example, I would use a counter to track logins (as I am interested in logins per minute/hour) and a gauge to count registrations, as I am interested in seeing the evolution over time.
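
In terms of the Track.* sketch from earlier, that distinction looks something like this (the metric names and values are illustrative only):

// Counter: reset to 0 at every flush, so the chart shows logins per flush interval.
Track.Count("Lexology.Stats.Logins", 1);               // called once per login

// Gauge: keeps its last value between flushes, so the chart shows the running total.
Track.Gauge("Lexology.Stats.RegisteredUsers", 290000); // updated after each registration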

The links below have good explanations of how/where to use each metric type, and I am taking inspiration from them. In the end, we should put together a list of what we want to measure, try to cluster the items, and work out the best interface from there, instead of defining an interface up front and finding that it doesn't fit our use.

http://thenewstack.io/collecting-metrics-using-statsd-a-standard-for-real-time-monitoring/
http://statsd.readthedocs.io/en/v3.1/types.html

Things I want to track from Lexology:

Timings for page loads, login times, data generation/processing, etc.
Data-related counters (DB accesses, cache hits/misses)
Server values (CPU, RAM, IO)

Then there are the whole business intelligence measures: registrations, usage, interactions with content, clicks, etc.

So we can generate dashboards like this

chrisproud commented 8 years ago

In addition to your list, we need the execution duration for each activity in each workflow, and OData request times.

What would your interface abstraction look like?

msancho commented 8 years ago

@danbrad any other stuff you want to measure?

msancho commented 8 years ago

Track.Count("KeyValue", increment); Track.Time("KeyValue", milliseconds); Track.Measure("KeyValue", value);

chrisproud commented 8 years ago

I would prefer more descriptive method names that make the code easier to read. For example, Count is ambiguous when you are reading the code: how would I know that it increments a counter?

interface IMetricLogger
{
    void IncrementMetric(string metricName, int increment);
    void LogTimedEvent(string metricName, int milliseconds);
    void MeasureMetric(string metricName, int value);
}
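
For example, a caller might end up writing something like this (method names from the proposal above; the metric names and the timing pattern are just illustrative):

using System.Diagnostics;

void RecordExample(IMetricLogger metrics)
{
    // Counter: incremented on each occurrence, aggregated per flush interval.
    metrics.IncrementMetric("Lexology.Data.Cache.Misses", 1);

    // Gauge/measurement: the latest value is what matters.
    metrics.MeasureMetric("Lexology.Stats.RegisteredUsers", 290000);

    // Timing: measure a block of code and log the duration in milliseconds.
    var stopwatch = Stopwatch.StartNew();
    // ... run the workflow activity ...
    stopwatch.Stop();
    metrics.LogTimedEvent("Integration.Workflow.Activity.Duration", (int)stopwatch.ElapsedMilliseconds);
}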