bitprophet / descartes

Introspective dashboard for Graphite
MIT License

Graph/dash parameterization #1

Open bitprophet opened 11 years ago

bitprophet commented 11 years ago

The problem

The use case is to "plug in" external values when generating graphs or dashboards, instead of only working with 100% literal/static metric paths.

For example, say you have a cluster of servers s01, s02, s03, and s04, and you want all of their load averages in one graph.
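Concretely, the goal is to go from N hand-added literal paths to one parameterized target. A minimal sketch (the `servers.<host>.loadavg.05` layout is assumed from the example above, not taken from Descartes itself):

```python
# Sketch: the four literal metric paths you'd add by hand today, versus
# the single range-wildcard target Graphite itself would accept.
hosts = ["s01", "s02", "s03", "s04"]

# Literal per-host paths (what manual graph-building produces):
expanded = [f"servers.{h}.loadavg.05" for h in hosts]

# Equivalent single wildcard target:
wildcard_target = "servers.s0[1-4].loadavg.05"

print(expanded)
print(wildcard_target)
```

The parameterization problem is essentially: who knows how to produce that wildcard (or the expanded list) from a symbolic name like "my cluster"?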

Currently one must load up the Metrics screen, filter for the metric(s) you want, and add them to the in-progress graph until you're happy. So in the above example, to build a graph of your cluster's 5-minute load average, you would:

This scales really poorly for nontrivial cluster sizes, and is annoying at best for trivial ones.

Alternately:

This scales better for individual clusters, but requires its own annoying song and dance, and thus doesn't scale when you need to manage many clusters.

Possible solutions

  1. Subject metric paths to a templating solution. E.g. define a graph as servers.$cluster01.loadavg.05; the parse step looks in a config file or talks to an API, and expands $cluster01 into s0[1-4].
    • Reasonably straightforward
    • Requires some sort of UI change to account for "metrics" no longer being 1:1 mappings to real values in Graphite, and for how that affects the graph builder.
      • Perhaps an option to build graphs from hand-entered metric paths (like in Graphite's own composer), which could then be parse/expansion-aware. This would be a useful mode even without any expansion implemented, really.
    • Not the most elegant thing ever
    • If API-driven (vs. a config-managed conf file), we might want to piggyback on existing caching to avoid querying the external API constantly
  2. Create real models for Clusters/Services/Hosts etc, and have a "node-based" (in the Graphite sense) metric path concept. E.g. servers.<server>.loadavg.05, then Descartes loads up that path and shows you the graphs with all existing Cluster or Host values plugged in.
    • Similar approach to previous, but more organized
    • Allows for navigation concepts like "I want to view CPU for (all my hosts|just hosts in my prod env|just hosts in Cassandra Cluster 2|etc)", drill down, etc.
      • Technically we could apply that to the first option too, you'd just be choosing from the arbitraryish list of expandos.
    • Problem: starts pulling organization from your real truth database into Descartes' DB schema. The deeper you build out these concepts in Descartes, the higher chance for conflict with how your existing systems organize the same info.
    • Problem: by storing the external truth DB's info in Postgres, you open the door to sync problems you wouldn't have if simply caching "dumb" X=Y expansion maps.
  3. Have "dummy" model objects which are transient and wrap the remote DB's data. I.e. from a UI perspective they appear to be useful objects of certain classes, but from a data-storage perspective it's all API-driven (again, possibly with caching).
    • This is another mutation of its predecessor
    • Might be the best of both worlds: gives some form to the expansions that are going on, but doesn't try to keep a persistent copy of the data.
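The first option's parse/expand step can be sketched in a few lines. This is purely illustrative — the `$name` token syntax and the `EXPANSIONS` mapping are assumptions, not Descartes' actual API; in practice the mapping would be loaded from a config file or fetched (and cached) from the truth-DB API:

```python
import re

# Hypothetical expansion map; could come from YAML or an external API.
EXPANSIONS = {
    "cluster01": "s0[1-4]",
}

def expand_path(path, expansions=EXPANSIONS):
    """Replace each $name token in a metric path with its expansion."""
    def repl(match):
        name = match.group(1)
        try:
            return expansions[name]
        except KeyError:
            raise KeyError(f"no expansion defined for ${name}")
    return re.sub(r"\$(\w+)", repl, path)

print(expand_path("servers.$cluster01.loadavg.05"))
# -> servers.s0[1-4].loadavg.05
```

Options 2 and 3 differ mainly in where `EXPANSIONS` lives: real persisted models in option 2, transient API-backed wrappers in option 3.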
bitprophet commented 11 years ago

Talking to truth DB

Storage/organization on Descartes side

bitprophet commented 11 years ago

Digging into specifics:

Need to figure out:

bitprophet commented 11 years ago

Have basic interpolation working \o/ and it does indeed seem to work everywhere.
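If the interpolation values come from an external API (as discussed above), a small TTL cache keeps lookups cheap. A generic sketch, not Descartes code — the `fetch` callable and 300-second default are assumptions:

```python
import time

class TTLCache:
    """Tiny time-based cache for expansion maps fetched from an
    external truth-DB API. Purely illustrative."""
    def __init__(self, fetch, ttl=300):
        self.fetch = fetch  # zero-arg callable returning a fresh map
        self.ttl = ttl
        self._value = None
        self._stamp = 0.0

    def get(self):
        if self._value is None or time.monotonic() - self._stamp > self.ttl:
            self._value = self.fetch()
            self._stamp = time.monotonic()
        return self._value

calls = []
cache = TTLCache(lambda: calls.append(1) or {"cluster01": "s0[1-4]"}, ttl=60)
cache.get()
cache.get()
print(len(calls))  # -> 1 (second call served from cache)
```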

Now running into:

bitprophet commented 11 years ago

I also don't see a way to set aliases easily via Descartes, which is another reason the legend is so enormous for my graph above. Torn between it being a Descartes responsibility and being something I should stuff into Graphite (so now this mythical function combines nonNegativeDerivative, scaleToSeconds, and aliasByNode).
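For reference, the three real Graphite functions mentioned compose into a target like the one below. The metric path is a hypothetical placeholder, and the seconds/node arguments are just example values:

```python
def wrap(metric, seconds=1, node=1):
    """Build aliasByNode(scaleToSeconds(nonNegativeDerivative(m), s), n)
    as a Graphite render target string."""
    target = f"nonNegativeDerivative({metric})"
    target = f"scaleToSeconds({target},{seconds})"
    return f"aliasByNode({target},{node})"

print(wrap("servers.s0*.example.metric"))
# -> aliasByNode(scaleToSeconds(nonNegativeDerivative(servers.s0*.example.metric),1),1)
```

The "mythical function" idea is essentially baking this composition into a single named Graphite function so every graph doesn't repeat the nesting.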

bitprophet commented 10 years ago

Revisiting this. Outstanding issues: