jamieklassen closed this 3 years ago
There might be a way to do this more gracefully, by sending metrics to both datadog and wavefront simultaneously for a period of time. Converting to a draft until I can write a better description.
EDIT: before I forget, the general principle was to install an extra opentelemetry contrib collector as a sidecar in the web pods, built with a small patch to enable the datadog exporter, then have that collector scrape the prometheus endpoint and forward to datadog, using a config file somewhat like:
receivers:
  prometheus:
    config:
      global:
        scrape_interval: '5s'
        evaluation_interval: '5s'
      scrape_configs:
        - job_name: 'concourse'
          static_configs:
            - targets:
                - 'localhost:9391'
exporters:
  datadog:
    api:
      key: '<datadog api key from a secret... so this file should be generated by an init container>'
    metrics:
      namespace: 'concourse.ci'
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [datadog]
I lightly tested this and the behaviour is not identical to concourse's native datadog integration. A key difference is that the extra attributes from the CONCOURSE_METRICS_ATTRIBUTE parameter do not appear in the prometheus emitter's output, which ultimately means that the data reaching datadog lacks the environment: label. It may be possible to resolve this using the metrics transform processor available in the contrib collector.
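A rough, untested sketch of what that metrics transform step could look like, assuming the contrib collector build includes the `metricstransform` processor. The label value `hush-house` is only a placeholder, and whether datadog then shows the label the same way as the native integration's environment: tag is also an assumption that would need checking:

```yaml
# Sketch only: stamp every scraped metric with an environment label
# before it reaches the datadog exporter.
processors:
  metricstransform:
    transforms:
      - include: '.*'              # match every metric name
        match_type: regexp
        action: update             # modify the matched metrics in place
        operations:
          - action: add_label
            new_label: environment
            new_value: hush-house  # placeholder, not taken from the real deployment

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [metricstransform]   # runs between the scrape and the export
      exporters: [datadog]
```

The `processors` block would sit alongside the `receivers` and `exporters` blocks in the same config file, and the `service` section here would replace the one without processors.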
EDIT 2: also @vito pointed out that there may be some non-local effects of having the datadog helm chart installed here. In particular, it may be responsible for the existence of datadog agents on all the generic-1 nodes, and so hush-house might rely on it in order to export its metrics. So maybe we shouldn't be too hasty about removing the chart.
Interestingly, telegraf has output plugins for both datadog and wavefront, so I think I will experiment with it now.
EDIT: I'm feeling pretty good about the ability to use prometheus + telegraf + datadog output plugin to get data in datadog that is pretty much on par with concourse's native datadog integration, except that prometheus prefixes all the metric names with concourse_ since it also exposes non-application-specific meters, and the prometheus input plugin adds suffixes for some metrics (e.g. concourse's internal name is builds started, but prometheus exports it as builds_started_total). So I think we'll need to make some (hopefully not too difficult) changes to https://github.com/concourse/greenpeace/blob/master/terraform/dashboard/main.tf (sorry for the private repo!) in order for our existing datadog dashboards to work.
OK, I took a new approach that I like much better. Not totally sure if it will work, but @xtreme-sameer-vohra you can sanity check it and then we could try deploying and debugging/fixing things up together.
I haven't tested it; however, since reverting is quite straightforward, I am leaning towards being expedient and trying it out on CI as per your suggestion.
Closing this as hush-house metrics are now being forwarded to wavefront: https://vmware.wavefront.com/dashboards/concourse-team-temp
Accordingly, we should see metrics appearing in wavefront.