Open scott-wood-vgh opened 7 years ago
I've just realized that even restarting the collectd service triggers the problem. I tried to enable debug logging and managed to stop another box from pushing metrics just by running sudo service collectd restart. The only extra info is that the plugin seems to be trying to push what I expect:
[2016-11-24 13:56:13] [AmazonCloudWatchPlugin][cloudwatch.modules.flusher] [debug] flushing metrics collectd--gauge-Tomcat[5] load--load-[18] df-root-percent_bytes-used[6]
EDIT: It looks like a segfault was logged in the syslog right before some instances stopped working. FWIW I'm on collectd.5-4.0
[7615552.971016] collectd[6891]: segfault at 98 ip 00007f43afd8c721 sp 00007ffe83a72ba0 error 4 in libpython2.7.so.1.0[7f43afc7a000+2dc000]
Hi,
I have a somewhat urgent and baffling problem: my collectd and CloudWatch plugin installation pushes metrics fine until the box is restarted. After that, its CloudWatch alarms go into Insufficient Data because of the following error in the collectd logs:
[2016-11-30 19:43:24] [AmazonCloudWatchPlugin][cloudwatch.modules.client.putclient] Could not put metric data using the following endpoint: 'https://monitoring.us-east-1.amazonaws.com/'. [Exception: HTTPSConnectionPool(host='monitoring.us-east-1.amazonaws.com', port=443): Max retries exceeded with url: /?Action=PutMetricData&MetricData.member.1.Dimensions.member.1.Name=Host&MetricData.member.1.Dimensions.member.1.Value=i-ffd27e66&MetricData.member.1.Dimensions.member.2.Name=PluginInstance&MetricData.member.1.Dimensions.member.2.Value=NONE&MetricData.member.1.MetricName=collectd.gauge.Tomcat&MetricData.member.1.StatisticValues.Maximum=1.0&MetricData.member.1.StatisticValues.Minimum=1.0&MetricData.member.1.StatisticValues.SampleCount=6&MetricData.member.1.StatisticValues.Sum=6.0&MetricData.member.1.Timestamp=20161130T194224Z&MetricData.member.2.Dimensions.member.1.Name=Host&MetricData.member.2.Dimensions.member.1.Value=i-ffd27e66&MetricData.member.2.Dimensions.member.2.Name=PluginInstance&MetricData.member.2.Dimensions.member.2.Val
[2016-11-30 19:43:24] [AmazonCloudWatchPlugin][cloudwatch.modules.client.putclient] Request details:
'Action=PutMetricData&MetricData.member.1.Dimensions.member.1.Name=Host&MetricData.member.1.Dimensions.member.1.Value=i-ffd27e66&MetricData.member.1.Dimensions.member.2.Name=PluginInstance&MetricData.member.1.Dimensions.member.2.Value=NONE&MetricData.member.1.MetricName=collectd.gauge.Tomcat&MetricData.member.1.StatisticValues.Maximum=1.0&MetricData.member.1.StatisticValues.Minimum=1.0&MetricData.member.1.StatisticValues.SampleCount=6&MetricData.member.1.StatisticValues.Sum=6.0&MetricData.member.1.Timestamp=20161130T194224Z&MetricData.member.2.Dimensions.member.1.Name=Host&MetricData.member.2.Dimensions.member.1.Value=i-ffd27e66&MetricData.member.2.Dimensions.member.2.Name=PluginInstance&MetricData.member.2.Dimensions.member.2.Value=NONE&MetricData.member.2.MetricName=load.load&MetricData.member.2.StatisticValues.Maximum=0.05&MetricData.member.2.StatisticValues.Minimum=0.01&MetricData.member.2.StatisticValues.SampleCount=18&Metric
What could cause this issue to surface only after reboots? As far as I can tell, all boxes have identical networking, and it can affect any EC2 instance after I reboot it from the console. Is there perhaps some CloudWatch plugin service that does not get started on boot? The collectd service itself is running fine, and I've tried restarting it multiple times.
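To help separate a networking problem from a plugin problem, a check along these lines could be run on an affected box right after a reboot. It is only a minimal sketch and not part of the CloudWatch plugin: the helper name can_connect is hypothetical, and it only tests whether a plain TCP connection to the endpoint from the error above can be opened. It does not perform the TLS handshake or send a PutMetricData request.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Try to open a TCP connection to host:port.

    Returns (True, None) on success, or (False, error_message) if DNS
    resolution, the connection attempt, or the timeout fails.
    Hypothetical helper for illustration; not part of the plugin.
    """
    try:
        # create_connection resolves the name and connects in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # socket.gaierror and socket.timeout are both OSError subclasses.
        return False, str(exc)
```

Calling can_connect("monitoring.us-east-1.amazonaws.com") on a working box and on a broken one would show whether the endpoint is reachable at the TCP level from both. If it is, the failure is more likely inside the collectd process than in the instance's networking.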
Thank you.