emiller42 commented 7 years ago

The delimited namespace is really just a carryover from how metrics are sent to Graphite. There isn't any real reason we need to maintain that format.

It may be useful to instead break up the namespace into an array of values, which Splunk will see as a multivalued field. So something like:

production.webserver.loadbalancerA.host.responsetime would turn into something like:

key: [ 'production', 'webserver', 'loadbalancerA', 'host', 'responsetime']

Then in Splunk, instead of having to search with wildcards, which are order dependent (metricName="production.*.responsetime"), you could treat the segments like tags: (key="production" AND key="responsetime")

This might be much more friendly to things like Data Models within Splunk.
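To make the tag idea concrete, here is a minimal sketch of the split and of an order-independent match that mirrors the key="production" AND key="responsetime" search. The helper names toTags and hasAllTags are made up for illustration and are not part of the backend.

```javascript
// Hypothetical helpers, for illustration only.

// Split a delimited metric name into its segments.
function toTags(metricName, delimiter = ".") {
  return metricName.split(delimiter);
}

// True when every wanted tag is present, regardless of position.
// This mirrors (key="production" AND key="responsetime") on a multivalued field.
function hasAllTags(tags, wanted) {
  return wanted.every((tag) => tags.includes(tag));
}

const key = toTags("production.webserver.loadbalancerA.host.responsetime");
console.log(key);
console.log(hasAllTags(key, ["responsetime", "production"])); // true
```

Unlike the wildcard form, the order in which the tags are listed does not matter.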

emiller42 commented 7 years ago

So splitting on a delimiter is pretty trivial. (foo.split("."))

However, there are a couple of ways to approach this on data ingestion, and I'm curious whether there are performance considerations:

Option 1

Keep everything the same but split the key. This results in events like the following:

[Screenshot: screen shot 2017-02-03 at 10 39 35 pm]

So a metric sent with the name foo.bar.set gets turned into "key": ["foo", "bar", "set"], which Splunk sees as a field named key{}.
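A rough sketch of what Option 1 could look like in code. The function name buildOption1Event and the value field are assumptions for illustration; only key and metricType come from the discussion here.

```javascript
// Hypothetical sketch of Option 1: the split key stays inside the event body.
function buildOption1Event(metricName, value, metricType) {
  return {
    key: metricName.split("."), // array, so Splunk exposes it as key{}
    metricType: metricType,
    value: value, // assumed field name for the metric value
  };
}

const option1Event = buildOption1Event("foo.bar.set", 3, "set");
console.log(JSON.stringify(option1Event));
```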

Option 2

When sending data to the HEC, you can include a set of arbitrary fields outside of the event which are indexed. Like so:

{
  "time": 1486183538,
  "index": "main",
  "sourcetype": "_json",
  "fields": {
    "arbitrary_field": "foobar",
    "another_field": "barbaz"
  },
  "event": "the indexed event"
}

The contents of fields will be considered indexed fields, but will not show up in the event text. So using the same event above...

[Screenshot: screen shot 2017-02-03 at 10 48 03 pm]

Notice that metricType and key are indexed, but not part of the event itself. Depending on the details of index storage and license calculation, this could have an appreciable impact. Because the event payload is JSON data, its contents are turned into indexed fields anyway, but the event text itself is also indexed. Getting the metadata out of the event reduces that duplication.

There is also the minor detail of key{} vs key, with the latter looking cleaner.

This does have an impact on discoverability, if users are expecting to see all metadata in the search results.
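For comparison, a sketch of Option 2, where the metadata moves into the HEC fields object and only the value stays in the event. buildOption2Payload and the value field are hypothetical names, and this assumes the fields object accepts an array for a multivalued field.

```javascript
// Hypothetical sketch of Option 2: metadata goes in "fields", not in the event text.
function buildOption2Payload(metricName, value, metricType, timeSeconds) {
  return {
    time: timeSeconds,
    sourcetype: "_json",
    fields: {
      key: metricName.split("."), // assumed to be accepted as a multivalued indexed field
      metricType: metricType,
    },
    event: { value: value }, // assumed shape; only the measurement remains in the event
  };
}

const option2Payload = buildOption2Payload("foo.bar.set", 3, "set", 1486183538);
console.log(JSON.stringify(option2Payload, null, 2));
```

The trade-off is the one mentioned above: key and metricType no longer appear in the raw event text, so users browsing search results will not see them there.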

emiller42 commented 7 years ago

As an aside, if we split the metric name by delimiter, we can simply allow users to specify arbitrary keys to add to the array for all events, satisfying #1.

We could add configs to append global dimensions, as well as dimensions by metric type.
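Something along these lines, as a sketch only. The config option names globalDimensions and dimensionsByType, and the helper keysFor, are invented here to illustrate the idea; nothing like them is defined yet.

```javascript
// Hypothetical config shape; both option names are assumptions for illustration.
const exampleConfig = {
  globalDimensions: ["production"],
  dimensionsByType: { timer: ["latency"] },
};

// Split the metric name, then append global and per-type dimensions.
function keysFor(metricName, metricType, config) {
  const globalDims = config.globalDimensions || [];
  const typeDims = (config.dimensionsByType || {})[metricType] || [];
  return metricName.split(".").concat(globalDims, typeDims);
}

console.log(keysFor("webserver.responsetime", "timer", exampleConfig));
console.log(keysFor("webserver.hits", "counter", exampleConfig));
```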

emiller42 / splunk-statsd-backend

Break up namespace by delimiter #2
