Closed jchauncey closed 7 years ago
will fix #153
I'm testing this as described above but running into a telegraf error:
$ kd logs -f deis-monitor-telegraf-kxpcn
Node Name set (minikube)
Node IP set (192.168.99.100)
Creating topic with URL: http://10.0.0.36:4151/topic/create?topic=metrics
Setting KUBERNETES_URL: http://192.168.99.100:10255
Building config.toml!
Finished building toml...
###########################################
...
# Set Service Input Configuration
[[inputs.nsq_consumer]]
server = "10.0.0.36:4150"
topic = "metrics"
channel = "consumer"
max_in_flight = 100
data_format = "influx"
###########################################
###########################################
/usr/bin/telegraf: line 1: syntax error near unexpected token `<'
/usr/bin/telegraf: line 1: `<?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Access denied.</Message><Details>Anonymous users does not have storage.objects.get access to object telegraf/telegraf.</Details></Error>'
It appears telegraf failed to be downloaded into the container:
root@7f3ec01de15b:/# which telegraf
/usr/bin/telegraf
root@7f3ec01de15b:/# cat /usr/bin/telegraf
<?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Access denied.</Message><Details>Anonymous users does not have storage.objects.get access to object telegraf/telegraf.</Details></Error>
root@7f3ec01de15b:/#
The calls in the telegraf Dockerfile should be using curl -fsSL instead of curl -sSL so that they return a non-zero exit code on HTTP errors:
curl -sSL https://storage.googleapis.com/telegraf/telegraf ; echo -e "\nExit code: $?"
<?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Access denied.</Message><Details>Anonymous users does not have storage.objects.get access to object telegraf/telegraf.</Details></Error>
Exit code: 0
curl -fsSL https://storage.googleapis.com/telegraf/telegraf ; echo -e "\nExit code: $?"
curl: (22) The requested URL returned error: 403 Forbidden
Exit code: 22
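The build could also guard against a bad download after the fact. The following is a minimal sketch, not what the Dockerfile actually does: `is_elf` and `fetch_binary` are hypothetical helper names, and the check assumes the expected file is a Linux ELF binary.

```shell
# Hypothetical helper: succeed only if the file starts with the ELF magic
# bytes (0x7f 'E' 'L' 'F'). Assumes the expected download is a Linux binary.
is_elf() {
  [ "$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')" = "7f454c46" ]
}

# Hypothetical wrapper: download with -f so HTTP errors such as 403 fail the
# command, then reject anything that is not an ELF file.
fetch_binary() {
  url=$1
  dest=$2
  curl -fsSL -o "$dest" "$url" || return 1
  is_elf "$dest" || { echo "not an ELF binary: $dest" >&2; return 1; }
  chmod +x "$dest"
}
```

Called as `fetch_binary https://storage.googleapis.com/telegraf/telegraf /usr/bin/telegraf`, this would return a failure for the 403 response shown above instead of leaving an XML error page at /usr/bin/telegraf.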
Tested as recommended, all dashboards show updating CPU/memory stats and open connections look stable:
$ netstat -tan | grep 10255
tcp 0 0 :::10255 :::* LISTEN
tcp 0 0 ::ffff:192.168.99.100:10255 ::ffff:172.17.0.9:55846 ESTABLISHED
Tags for data collected by the kubernetes plugin changed after the plugin was merged, so we needed to update the dashboards.
Test Steps:
cd telegraf && make build push upgrade
cd ../grafana && make build push upgrade
open http://grafana.mydomain.com
log in using admin/admin
It is also useful to verify that we are not leaking connections with this PR, since it ships the new telegraf binary that contains the fix.
To check for that, please SSH into one of your worker nodes that is running the telegraf daemonset and run the following command:
netstat -tan | grep 10255
You should see only 1 or 2 connections open, and that number should never grow.
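To make the leak check easier to repeat, the count can be extracted from the netstat output. This is a rough sketch: `count_established` is a hypothetical helper name, and it assumes the Linux `netstat -tan` column layout (local address in column 4, foreign address in column 5, state in column 6).

```shell
# Hypothetical helper: count ESTABLISHED connections that involve the given
# port (default 10255). Reads `netstat -tan` output on stdin.
count_established() {
  awk -v port="${1:-10255}" '
    # Match the port at the end of either the local or the foreign address.
    $6 == "ESTABLISHED" && ($4 ~ (":" port "$") || $5 ~ (":" port "$")) { n++ }
    END { print n + 0 }
  '
}
```

Running `netstat -tan | count_established` a few times over several minutes should keep printing 1 or 2 if connections are not leaking.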