Closed chrisproud closed 8 years ago
Some useful links to how custom metrics are implemented by providers
The other nice tool for custom metrics gathering is StatsD (by Etsy).
https://github.com/etsy/statsd
They combine it with this storage and reporting app
We need methods in the integration to log how long a method takes to run so we can start to isolate the bottleneck in the process.
I'm struggling to see how this is implemented, but I'm imagining a stacked chart with a bar for each part of the workflow pipeline so we can see which bit takes the longest. Anyone seen this anywhere?
A bit like the breakdown table in newrelic.
A stacked chart looks like the proper way to display the total time and the times for the sub-processes. It's easy to see at a glance if one sub-process has increased the processing time, etc.
I guess one would add steps, as in MiniProfiler, to measure the total and the sub-processes.
We need miniprofiler for windows (Azure) services.
Here's an example implementation of StatsD for C#, made by the folks at Datadog: https://www.datadoghq.com/blog/statsd-for-net-dogstatsd/
Found this chap using mini profiler in a windows service https://kzhendev.wordpress.com/2015/05/18/measuring-application-performance-with-mini-profiler-and-splunk/
I think we're not doing anything special, and maybe we can use these metrics as inspiration:
https://github.com/etsy/statsd/blob/master/docs/metric_types.md
@msancho, what metrics do you record in Lexology?
I think the basics, timings and counters.
Newsfeed email / home timings (with sub timings), event counting (registrations, interactions, fatal errors, etc).
P.S: Well, we don't measure it now, the metrics I am using are from GES or NewRelic
In Azure App Insights, it looks like we would use timed events with a named metric. https://azure.microsoft.com/en-us/documentation/articles/app-insights-api-custom-events-metrics/#timed
Azure App Insights also has the idea of an operation context to group events together. That would be a nice way to group activity executions in a single workflow together. https://azure.microsoft.com/en-us/documentation/articles/app-insights-api-custom-events-metrics/#operation-context
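The timed-event-plus-operation-context combination could be sketched roughly like this, assuming the Application Insights .NET SDK; `WorkflowTelemetry`, the metric name, and `workflowRunId` are hypothetical names invented for illustration:

```csharp
using System;
using System.Diagnostics;
using Microsoft.ApplicationInsights;

public class WorkflowTelemetry
{
    private readonly TelemetryClient _telemetry = new TelemetryClient();

    // Times one activity and groups it with the rest of the workflow run
    // by stamping every telemetry item with the same operation id.
    public void RunTimedActivity(string workflowRunId, Action activity)
    {
        _telemetry.Context.Operation.Id = workflowRunId; // same id for every activity in the run

        var watch = Stopwatch.StartNew();
        activity();
        watch.Stop();

        _telemetry.TrackMetric("Workflow.Activity.Duration", watch.ElapsedMilliseconds);
    }
}
```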
NewRelic monitors service code through configured custom instrumentation: you write method names into a config file and it watches for their execution. https://docs.newrelic.com/docs/agents/net-agent/instrumentation/net-custom-transactions
Nice in that you don't have to modify code. Explains why we could never get NewRelic to monitor lex jobs.
That thing looks cool, but I would keep this as generic and platform-agnostic as possible; we can always have N implementations if we want Azure or NewRelic.
The simplest implementation I can imagine is:
Track.Time("Lexology.Pages.Newsfeed.DB", 1345);
Track.Time("Lexology.Pages.Newsfeed.Backend", 129);
Track.Time("Lexology.Pages.Newsfeed.Render", 56);
Track.Time("Lexology.Data.DB.Queries.GetCountries", 12);
Track.Count("Lexology.Data.DB.Queries", 1);
Track.Count("Lexology.Data.Cache.Hits", 1);
Track.Count("Lexology.Data.Cache.Misses", 0);
Track.Gauge("Lexology.Stats.Logins", 10);
Track.Gauge("Lexology.Stats.RegisteredUsers", 290000);
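A minimal sketch of what could sit behind that Track facade, assuming a hypothetical IMetricProvider hook so the backend stays swappable (all names here are invented for illustration, not an agreed design):

```csharp
// Static facade: callers use Track.*, the provider behind it is pluggable.
public static class Track
{
    // Hypothetical provider hook (StatsD, Azure, NewRelic, ...).
    public static IMetricProvider Provider { get; set; }

    public static void Time(string key, int milliseconds) => Provider.Send(key, "ms", milliseconds);
    public static void Count(string key, int increment)   => Provider.Send(key, "c", increment);
    public static void Gauge(string key, int value)       => Provider.Send(key, "g", value);
}

// Hypothetical provider abstraction; each backend implements the transport.
public interface IMetricProvider
{
    void Send(string key, string type, int value);
}
```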
Do we want to force a format on the metric name by using multiple parameters, or maybe move this to a class responsible for generating consistent metric names?
Track.Time("Lexology", "Data", "DB.Queries.GetCountries", 12);
I think we have to force metric name formatting, as different implementations will have different storage strategies, and if the naming is consistent it will be easier to implement.
We can indeed ease this by adding common naming characteristics as parameters:
Track.Time(string product, string section, string[] path, int value).
"product" is a very Globe-centric name
"Application" or "ApplicationName"?
It could also be passed as part of the settings on instantiation, rather than at every logging call, given that it is not expected to change within a given usage.
Should we consider using consts to avoid errors? I'm not sure of the pattern, but something like:
Track.Time(LexologyMetricNames.Data.Queries.GetCountries, int value)
LexologyMetricNames.Data.Queries.GetCountries resolves to a static string or struct that the time method can use to resolve the full metric string name.
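A rough sketch of how those consts could resolve to full metric names, assuming nested static classes (the structure and names are hypothetical):

```csharp
// Nested static classes resolve to full metric names at compile time,
// so a typo becomes a compile error instead of an orphaned metric.
public static class LexologyMetricNames
{
    private const string Root = "Lexology";

    public static class Data
    {
        public static class Queries
        {
            public const string GetCountries = Root + ".Data.DB.Queries.GetCountries";
        }
    }
}

// Usage: Track.Time(LexologyMetricNames.Data.Queries.GetCountries, 12);
```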
Well, that simplifies this library by putting the onus back on the consumer to worry about naming conventions. I thought you were aiming to force the format of the name, but this is back to a free-for-all unless I'm missing something?
Agree with Dan,
Product is very Globe-centric, and "Application" is far better.
We could indeed set up the client on initialization (or with a web.config configuration section) so you only log your sub-path, as the application will remain the same.
I don't think that static resolver will help here; we have to push for a proper, clear convention for people to follow.
If we are going to push a clear convention, why bother forcing the app name and section as params? Feels like we're only doing half a job.
In fact, any forcing of naming should probably be outside of this abstraction anyway, so maybe I've taken this off topic. We should just have Track.Time(string, value), which can be consumed by whichever naming tooling the app wishes to use? As long as it sticks to the convention, of course.
As long as it complies with the convention, we can let every application generate the keys the way they fancy. That will also keep the interface simpler and cleaner.
I agree- naming decisions should be outside this lib.
Shall we also include an option to store Code Deploys?
I am proposing this because it's easier to interpret changes in trends on the charts if you can mark that something changed in the code base at a particular time.
Please, check this. https://codeascraft.com/2010/12/08/track-every-release/
Should this lib contain just the interface or can it also contain helper classes that make logging metrics easier?
Helper classes we might use:
Yes to code deploy markers, but I'm not sure they live here. Maybe Octopus should pick up variables from the web.config.
Well,
The code section timer should be generic and the same across all implementations, as they (I think) will only differ in how they send the data and how/where they store it.
I am thinking of something like:
using (new LogTimer(MyKey))
{
    // Code here
    using (new LogTimer(MyKey2))
    {
        // Code here
    }
}
So on dispose, it will stop the watch and send the value.
So we could have a base implementation that does the timing and let the implementations override the "Send" method.
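The base-class-with-overridable-Send idea could look something like this sketch (class and method names are suggestions, not settled API):

```csharp
using System;
using System.Diagnostics;

// The base class owns the Stopwatch; subclasses only decide how to
// ship the value (StatsD, App Insights, ...).
public abstract class LogTimer : IDisposable
{
    private readonly string _key;
    private readonly Stopwatch _watch = Stopwatch.StartNew();

    protected LogTimer(string key)
    {
        _key = key;
    }

    public void Dispose()
    {
        _watch.Stop();
        Send(_key, _watch.ElapsedMilliseconds);
    }

    // Implementations override this with their transport of choice.
    protected abstract void Send(string key, long milliseconds);
}
```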
I would store the code deploys the same way; for me it's just one more metric. It's an event that affects the application. Then the reporting consumes the data from the same "storage", so we don't have to create new reporting that mixes data from two different sources.
I think we are off topic again, lets go here https://github.com/GlobeBMG/GBMG.Monitoring/issues/3
Moved helper method ideas here https://github.com/GlobeBMG/GBMG.Monitoring/issues/4
Do helper methods/classes live in a separate library from the interface? I.e. do we need two core libs or just one?
GBMG.Monitoring.Interfaces
GBMG.Monitoring.Helpers
or just
GBMG.Monitoring
GBMG.Monitoring I think is good enough, as it's just abstractions and common (invariant) functionality.
Then we could have GBMG.Monitoring.StatsD, GBMG.Monitoring.AzureMetrics or whatever implementation we think we need. :)
Looking at App Insights, there is a difference between Metrics and Events. Do the Stackify/StatsD providers differentiate?
When to use TrackMetric() and TrackEvent()
Use TrackEvent when:
- Each specific event is important – you can view them in Diagnostic Search. Metrics are measured with each event, so that you can segment the metrics by event type and properties of the event.
- Users/sessions centric metrics are important for the event for e.g. if you want to know the region from which users are uploading files (enables you to make a decision about getting a processing server in that region)
Use TrackMetric when:
- The metric is independent of any specific event.
- You want to measure things on specific intervals (such as CPU or queue length every minute).
- You have a high volume of metrics and would like to collect and aggregate locally before sending it. We will talk about aggregation in depth in a later blog post.
I think it's not a standard thing, but people are using them combined and not storing them differently. https://blog.mozilla.org/webdev/2012/04/05/tracking-deployments-in-graphite/
I understand the differentiation between events and metrics in App Insights, and a code deploy could (I think) be understood as both. I think, though, that this event is a metric, so I would use Track.Count("Lexology.Deploy", 1).
In your app, do you want to be able to specify metric or event or should the provider implementation make this decision?
Lets try and nail down the abstraction interface.
We are interested in:
This gives us:
interface IMetricLogger
{
    void TrackEvent(string metricName, int? value);
    void TrackTimedEvent(string metricName, int value); // milliseconds
}
I was going to add void IncrementMetric(string metricName, int? value), but I think this is a StatsD-only thing.
What do people think about the interface and method names?
@aeagle, does this cover what we need for the integration? Can we include workflow and activity details in the metric name?
@msancho, do you get everything you need here for Lex?
Well, counter increments are quite important. For example, if you want to track queries, cache misses, cache hits etc.
Interesting: Microsoft have a daemon that forwards StatsD metrics to App Insights. This actually supports gauges. https://github.com/Microsoft/ApplicationInsights-statsd
How would you use increments for cache misses etc.? Aren't they just events?
IMHO it's a counter: if you set the flush time to a minute, you increment Track.Count("Lexology.Data.Cache.Misses", 1) and the chart will show all the misses per minute.
That's the problem between events and metrics: there is a thin, subjective line between them... in the end, all of them are metrics...
Aha, this is how you do them with app insights. https://github.com/Microsoft/ApplicationInsights-SDK-Labs/tree/master/AggregateMetrics
OK, so we now have this, which should be supported (ish) by StatsD, App Insights and Stackify:
interface IMetricLogger
{
    void TrackEvent(string metricName, int? value);
    void TrackTimedEvent(string metricName, int value); // milliseconds
    void IncrementMetric(string metricName, int? value);
}
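As a sanity check that this maps onto StatsD, here is a rough sketch of a StatsD-backed implementation over UDP, assuming the interface above and the plain statsd wire format name:value|type (the class name and defaults are assumptions, not an agreed design):

```csharp
using System.Net.Sockets;
using System.Text;

// Hypothetical StatsD-backed implementation of the IMetricLogger
// interface proposed in this thread. Each call becomes one UDP datagram.
public class StatsDMetricLogger : IMetricLogger
{
    private readonly UdpClient _client;

    public StatsDMetricLogger(string host, int port = 8125) // 8125 is the statsd default
    {
        _client = new UdpClient(host, port);
    }

    public void TrackEvent(string metricName, int? value) =>
        Send($"{metricName}:{value ?? 1}|c");   // counter increment

    public void TrackTimedEvent(string metricName, int milliseconds) =>
        Send($"{metricName}:{milliseconds}|ms"); // timing in milliseconds

    public void IncrementMetric(string metricName, int? value) =>
        Send($"{metricName}:{value ?? 1}|c");   // counter increment

    private void Send(string payload)
    {
        var bytes = Encoding.UTF8.GetBytes(payload);
        _client.Send(bytes, bytes.Length); // fire-and-forget, statsd style
    }
}
```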
Anything else we need?
I don't know, but TrackEvent and IncrementMetric look redundant to me.
This could serve as inspiration for the events:
https://developers.google.com/analytics/devguides/collection/analyticsjs/events
IncrementMetric is for the gauges you want aggregated and flushed every minute.
TrackEvent is to log that an event occurred when you might want granularity, like a code update or a user logon. There is an optional value you can store here.
TrackTimedEvent is for timing code, but could log the duration of anything you want to time.
I'm trying to keep the implementation as agnostic as I can while balancing against the richer functionality we get in stackify and app insights.
Does that make more sense? Can you suggest something better that meets the requirements?
OK, just to clarify: a gauge sends the last value if nothing else has been added within the flush time, but a counter is set back to 0 every time it gets flushed.
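That flush behaviour can be illustrated with a toy sketch (these classes are illustrative only, not part of the proposed lib):

```csharp
// Counter: reports the total accumulated since the last flush, then resets.
public class Counter
{
    private int _value;

    public void Increment(int by = 1) => _value += by;

    public int Flush()
    {
        var total = _value;
        _value = 0; // counter resets to 0 on every flush
        return total;
    }
}

// Gauge: keeps reporting the last value set, even across flushes.
public class Gauge
{
    private int _last;

    public void Set(int value) => _last = value;

    public int Flush() => _last; // last value persists if nothing new arrives
}
```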
There are events that you may want to track as counters and events that you want to track as gauges.
For example, I would use a counter to track logins (as I am interested in logins per minute/hour) and a gauge to count registrations, as I am interested in seeing the evolution over time.
These links have good explanations of how and where to use each metric type, and I am drawing inspiration from them. In the end, we should put together a list of what we want to measure and try to cluster the items, and from there sort out the best interface, instead of defining an interface first and finding that it doesn't fit our use.
http://thenewstack.io/collecting-metrics-using-statsd-a-standard-for-real-time-monitoring/ http://statsd.readthedocs.io/en/v3.1/types.html
Things I want to track from lexology.
Timings for page loads, login times, data generation/processing etc.
Data-related counters (DB accesses, cache hits/misses)
Server values (CPU, RAM, IO)
Then there is the whole business intelligence measures, track registrations, usage, interactions with content, clicks etc.
So we can generate dashboards like this
In addition to your list, we need execution duration for each activity in each workflow, and OData request times.
What would your interface abstraction look like?
@danbrad any other stuff you want to measure?
Track.Count("KeyValue", increment);
Track.Time("KeyValue", milliseconds);
Track.Measure("KeyValue", value);
I would prefer more descriptive methods that make the code easier to read. For example, Count is ambiguous when you are reading the code: how would I know that it increments a counter?
interface IMetricLogger
{
    void IncrementMetric(string metricName, int increment);
    void LogTimedEvent(string metricName, int milliseconds);
    void MeasureMetric(string metricName, int value);
}
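For what it's worth, a hypothetical usage sketch of that interface in a page load, assuming constructor injection (NewsfeedPage and the metric names are invented for illustration):

```csharp
using System.Diagnostics;

// Hypothetical consumer: times the page load and counts a DB access,
// with the concrete IMetricLogger supplied by the application.
public class NewsfeedPage
{
    private readonly IMetricLogger _metrics;

    public NewsfeedPage(IMetricLogger metrics)
    {
        _metrics = metrics;
    }

    public void Load()
    {
        var watch = Stopwatch.StartNew();

        // ... build the newsfeed ...
        _metrics.IncrementMetric("Lexology.Data.DB.Queries", 1);

        watch.Stop();
        _metrics.LogTimedEvent("Lexology.Pages.Newsfeed", (int)watch.ElapsedMilliseconds);
    }
}
```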
Agreed, common methods are required to record custom metrics.